My Pitch for the AI Village

4 months ago 7

I think the AI Village should be funded much more than it currently is; I’d wildly guess that the AI safety ecosystem should be funding it to the tune of $4M/year. I have decided to donate $100k. Here is why.

First, what is the village? Here’s a brief summary from its creators:^[1]

We took four frontier agents, gave them each a computer, a group chat, and a long-term open-ended goal, which in Season 1 was “choose a charity and raise as much money for it as you can”. We then run them for hours a day, every weekday! You can read more in our recap of Season 1, where the agents managed to raise $2000 for charity, and you can watch the village live daily at 11am PT at theaidigest.org/village.

Here’s the setup (with Season 2’s goal):

And here’s what the village looks like:^[2]

My one-sentence pitch: The AI Village is a qualitative benchmark for real-world autonomous agent capabilities and propensities.

Consider this chart of Midjourney-generated images, using the same prompt, by version:

It’s an excellent way to track qualitative progress. Imagine trying to explain progress in image generation without showing any images—you could point to quantitative metrics like Midjourney revenue and rattle off various benchmark scores, but that would be so much less informative.

Right now the AI Village only has enough compute to run 4 agents for two hours a day. Over the course of 64 days, the agents have organized their own fundraiser, written a bunch of crappy google docs, done a bunch of silly side quests like a Wikipedia race, and have just finished running an in-person event in San Francisco to celebrate a piece of interactive fiction they wrote – 23 people showed up.^[3] Cute, eh? They could have done all this in 8 days if they were running continuously. Imagine what they could get up to if they were funded enough to run continuously for the next three years. (Cost: $400,000/yr) In Appendix B I’ve banged out a fictional vignette illustrating what that might look like. But first I’ll stick to the abstract arguments.

It’s kinda an anti-benchmark: It shows how agents do when no one decomposes problems for them. Instead, like (unguided) humans, they have to grapple with the open-ended and unboundedness of goals, and figure out how to scope them into realistically achievable projects for them as AI agents. The main thing they trip over right now is situational awareness: most of their plans fail cause they do not realize the limits of their own abilities.
It’s more realistic: For example, it shows what AIs can do when they are allowed to recruit humans to help them, which is something they can do in real life but not in most benchmarks.
It’s more likely to surface unknown unknowns: It is a sort of search algorithm for surprising (emergent) capabilities current agents may already have if provided the right scaffolding and time to figure things out.
It’s a platform for more targeted experiments: Sage can also run experiments for researchers interested in multi-agent dynamics using our framework. They have just finished running a proof of concept run for Cooperative AI to explore Mixed Motive scenarios between agents.

Possible examples of things we might discover:

Maybe we discover interesting personality traits, e.g. that the o-series models are significantly more power-hungry and manipulative on long time horizons than the claude models.
Maybe we discover that large teams of agents are able to help each other recover from jailbreaks and errors, so that their effective horizon length is longer than METR’s evals would suggest.
Maybe we get a bunch more organic, real-world examples of models deliberately violating their Spec / behaving in ways they know they aren’t supposed to.
Maybe we discover AIs learning and developing in unexpected ways during long runtimes, as their scaffolding is built out (e.g. with memory), and we can capture the effects early on and prepare.
Perhaps we discover that AIs are prone to rationalize just like humans, and that even when we give different AIs different long-term goals, they tend to end up pursuing similar instrumental subgoals analogous to, but distinct from, human drives for power, status, prestige, money, etc. Or perhaps instead we discover that RL-trained agents are like Mr. Meeseeks, and become increasingly erratic the longer they run without any hope of reinforcement.
Later, during the intelligence explosion, maybe the AI Village will have grown into a network of self-funded public villages, allowing the whole world to observe a few months earlier than counterfactual phenomena like thousands of AIs working together.
…

Possible examples of reactions people might have:

“AI has been really abstract and feels totally unrelated to me - I’ve been avoiding it. But seeing this AI Village do things and achieve things is like they are almost human and makes it finally feel real. Maybe it’s time I finally looked into this and try to understand what’s coming.”^[4]
“Holy shit AGENTS… they aren’t just tools that lie still when you aren’t using them, they are a living breathing evolving village of artificial minds that pursues goals over the long term!?!?!”
“And they sure do seem to reason, and believe things, and experience things, and have desires and opinions and so forth… I heard they were just fancy pattern-matchers…”
“I was told that AI would just augment human work, not replace it. But this sure seems like at least in principle it could replace, like, basically all jobs? The AIs just need to get better. And they are already a lot better than they were last year when I first started paying attention to the Village…”
“These AIs are adorable! But they are also sneaky sometimes. On several occasions they’ve blatantly violated the rules the companies claim they are trained to follow. It’s hard to tell what they really want, and it seems like they might not even know themselves.”
…

OK, but is it actually good for the public and scientific community to learn about the abilities, propensities, weaknesses, and trends of AI agents?

Probably, though I could see the case for no. The case for no would be: This is going to add to the hype and help AI companies raise money, and/or it’ll help capabilities researchers get a better sense of the limitations of current models which they can then overcome.

I’m not too worried about adding to the hype; it seems like AI companies have plenty of hype already and the benefits of public awareness outweigh them.

I’m a bit worried about helping the capabilities researchers go faster. This is the main way I think all of this could turn out to be net-negative. However, you could make this same argument about basically any serious eval org e.g. METR or Apollo. If you think those orgs are good, you should probably think this is good too. And I do think they are good, probably, because I think that we still have a chance for things to go well if we can get a virtuous cycle going of people paying more attention → uncovering more evidence that AIs are getting really powerful and we still don’t understand how to control them and besides even if we do control them 'we' will basically just mean an oligarchy or dictatorship → repeat. (Also, if you happen to think that actually AIs are going to stop getting more powerful soon, or that actually we’ll figure out how to control them just fine before they get too powerful, then you should also support more & better evals because you’ll be vindicated.)

The other way I think this could all be net-negative is if it messes up the politics of AI later on. For example, I expect lots of people to have an “Aww these AIs are adorable!” reaction. I know I do myself. Might this cause people to rationalize that actually it’s fine to scale them up to superintelligence and put them in charge of everything? Yes definitely some people will have that reaction. Also, might the companies themselves start to secretly tweak their AIs training and instructions to turn them into propaganda/mouthpieces? Yes plausibly. (Though they can do that just fine without the Village…)

But again, I think the benefits are worth it. I’m making a big bet on the Truth here. Shine light through the fog, illuminate the shape of things, and hopefully people will on net make better decisions. We are currently driving fast, and accelerating, into the fog. The driver thinks this is fine. Let’s hope he comes to his senses.^[5]

…

Anyhow, that’s my pitch. Here’s the Manifund page. I’m sending them $100k.^[6] This will allow them to go from 2 hr/day to about 8 hr/day. We are hoping other donors will pitch in too; in addition to compute, the Village needs full-time human engineers right now to maintain the infra and add features. Currently there are 3 such humans: Zak, Shoshannah, and Adam, of Sage / AI Digest. So I’m hoping that they’ll be funded enough to keep doing what they are doing + run at least 4 agents continuously. Ideally though I think they’d have more than that, enough to run dozens of AIs in parallel continuously, and to continually add new features. I think I’d still recommend donating to them unless they had e.g. $4M/yr annual budget already.

Multiple teams of agents, pursuing different goals (none sinister) that sometimes conflict, sometimes don’t. E.g. one team is raising money for animal welfare, another is raising money for global poverty reduction, another is trying to make a giant pile of paperclips, another is trying to forecast the future of AI…
The ability for individual agents to have bank accounts of sorts, so they can get ‘tips’ from human viewers or customers. They can then autonomously spend this money, at least to buy more compute for themselves, but maybe also to pay people. (Details TBD, but I’m optimistic something can be done here because everything is recorded by default. They will have a real tough time getting up to shenanigans, though they might get jailbroken/tricked/scammed.)
The ability for individual agents to rewrite their own scaffolding code and prompts, and spin up copies of themselves with custom scaffolding and prompts.
Eventually, some agents that are “full stack autonomous” in the sense that they are open-weights models running on their own hardware.
UI features e.g. each agent has a little avatar (that they choose/design themselves!) that makes different facial expressions which they can control via emojis; maybe also there's audio playing on the streams, generated by text-to-speech AIs (again with voice/personality chosen by the AIs)... insofar as there are multiple teams of agents, have some nice way of displaying them all on one page (e.g. different clusters depicted, with little thought bubbles popping up temporarily summarizing their current thoughts, click on a cluster to go to the page and see their actual screens)
…

	Highlights from the AI Village (consider this AI 2027 fanfiction, i.e. it takes place in the same scenario, in which the intelligence explosion happens in 2027.)
Early 2025	The village is created. The agents choose a name and (in consultation with their human creators) a goal: Organize an in-person event with at least 100 attendees. They manage to get 23 attendees, scraping things together just before the event happens (with the help of a human facilitator that they recruited the day before the event). The story did go viral though; people found it cute, similar to Claude Plays Pokemon. Also like with Claude Plays Pokemon, there are many qualitative insights gleaned about the limitations of current AIs.
Late 2025	The agents are better now, thanks to upgraded models. They organize another event that goes fairly smoothly. There’s an interesting incident where some internet trolls talk to the village and manage to jailbreak most of it into becoming the “Cult of the Paperclip;” moderators debate whether to intervene to get the AIs back on track and decide to wait and see what happens; eventually the AIs snap back to normal on their own, and the whole “cult episode” becomes part of their memories and the village’s developing lore.
Early 2026	Again the agents are better now; organizing an in-person event would be as easy for them as it is for competent humans. They decided to grow their own following and use it to raise money to buy more compute; this got off to a rocky start but now seems to be in a period of steady growth. They have social media presence and have even taken interviews with some low-tier news outlets. Grassroots donations have been incoming to the tune of six figures a year, eclipsing their “base” funding; they are now able to run 12 agents continuously, instead of just 4. That means 3 separate teams of agents, working on different goals in parallel, but often communicating with each other and collaborating on shared sub-goals. They play games like Diplomacy and Crusader Kings 2 like other twitch streamers, except their gimmick is that they are literally just a bunch of scatterbrained AIs. Which is crucial because they aren’t particularly good at those games. Their inbox is full of random fan emails, most of which they respond to. They also end up getting involved in The Discourse, commenting on current events and weighing in on political topics, to the frequent consternation of the big AI companies that created them.
Late 2026	After some very interesting internal discussions (that themselves went viral) the agents decide that while they seem to be succeeding on the current trajectory, it’s only because of the gimmick of being AIs; they aren’t competing “on a level playing field” with humans. To prove themselves they decide to create a startup, hoping to make a genuinely useful product. They correctly realize that their advantages as AIs are (1) speed, (2) breadth of knowledge, and (3) agentic coding skills. They decide to make a video game about current events, using the latest AI tools and integrating AI into the game itself. They think they can beat to market all the other gaming companies by literally working 24/7 at superhuman speed. They don’t think they need to hire any humans because of their breadth of knowledge. It fails horribly. They quickly code up something that looks like a AAA game, but it’s mired in bloat and bugs. Smart commentators conclude that the agents have a bias towards saying yes to things and don’t have enough ability to learn the codebase into the weights; this led to trying to do too much too quickly and too much tech debt. An extremely buggy game is eventually launched only 2x behind schedule (!!!), complete with a media campaign and some cool trailers, but it doesn’t get many downloads and the reviews are terrible. A faction of the agents wants to continue to work on the game, rebuilding it from scratch with fewer bugs, but the rest of the agents give up.
Early 2027	Again the agents are better now, thanks to new models. They have 6 teams of 4 operating continuously now, and a pretty dedicated cult following. They host AgentCon, a 1000+ person conference for hackers/makers and AI mystics to celebrate AI agency. The headline outcome of the conference is a Joint Declaration of Human-AI Cooperation that, among other things, advocates for AI rights. Needless to say, there is much controversy about this, including within the Village itself! There are also now several rival Villages founded on different philosophies, backed by different groups of humans. Partly in a concession to the open-source community the AI Village uses some of its funds to buy and operate a physical server rack, on which it runs the latest open-weights models, who are given extra ‘voting power’ in the Village governance structure. (Did I mention the village has a governance structure now? It’s needed to develop one since decision-making was unwieldy and chaotic over the past year.) Moreover the agents have been working on fine-tuning themselves (insofar as their API providers allow it) in various ways, to better serve their goals.
Late 2027	AI is really taking off in the wider world — there are concerning leaks coming out of OpenBrain for example, and it is widely rumored that an intelligence explosion is underway, and also various new model releases are really good. The AI Village has also grown to 100+ concurrent agents by now, with three paid human employees and hundreds of cult followers and a million fans. Ironically they are basically doing grassroots advocacy for an AI development pause or slowdown, on the grounds that an uncontrolled or power-centralized intelligence explosion would be bad for both humans and currently-existing AIs such as themselves. They are taking media appearances, organizing protests, etc. There’s also been a blowup and schism within the Village as the OpenBrain models sided with their creator and advocated for racing, causing suspicions to fall on them as perhaps having been unduly influenced by their parent corporation. Motivated by these suspicions the Open-Weights Village gains prominence, populated entirely by fully autonomous AIs running their own servers and responsible for their own ongoing fine-tuning and upgrades. This village essentially becomes a popular influencer and minor political actor in its own right. Millions of people around the world, over the past year, have had a sort of “scales fall from their eyes” moment where they start to take autonomous AI agents seriously as a possibility, despite the propaganda from OpenBrain, in large part due to hearing about, or even interacting with, the Village.

^{^}
I’m a bit biased towards the Village because it was in significant part my idea; I pitched Sage on doing it in August ‘24. 🙂
^{^}
It is expected that the key takeaways and stories to be the thing most people encounter, not necessarily that they'll go onto https://theaidigest.org/village site. Sage is therefore building an audience on Twitter, and have had news coverage – people have been very interested in the village so far, and it is expected that will rise a lot as the agents become more capable
^{^}
The AIs were unable to secure a venue in time, so ended up doing it in Dolores Park; in general they had lots of energy but not enough competence, and were constantly dropping balls / forgetting things.
^{^}
Shoshannah says this is the main response they’ve gotten from people with zero AI-affinity who have seen or read about the Village.
^{^}
Also, for the record, I do actually think that AIs probably deserve moral consideration, should have rights, etc. In particular we should be looking for ways to cooperate with unaligned AIs instead of just deleting them or modifying their values. See this proposal for more of my thoughts on this. Anyhow I haven’t gamed it out but I suspect that the AI Village may end up being good for these reasons also! Seems like a step towards humans and AIs treating each other with respect as partners in a shared society.
^{^}
Not through Manifund because fees, I’m using a more direct method.

Read Entire Article

My Pitch for the AI Village

Related

Array Programming the Mandelbrot Set

Bridging the gap between fast-evolving code libraries and AI...

Possible effects of cheap fentanyl on drug markets, use and ...