I Am No Longer an AI Doomer



The handful of people trying to sound the alarm about the coming AI intelligence explosion have been ignored or ridiculed for decades. Hurr durr, Skynet’s not real. Just unplug it bro!

Now, all of a sudden, AIs are rapidly accelerating from writing terrible poetry to B+ high-school essays to podium-level finishes in the Maths Olympiad. Safety concerns have gone mainstream: there’s a flood of AI-related bills moving through Congress, and the most recent global AI safety summit was packed with world leaders.

Sweet vindication for the fringe thinkers who were banging this drum before ChatGPT was a twinkle in Sam Altman’s eye. But of course this is not really the kind of thing you want to be right about.

Maybe you think the doomer stuff is silly. I’ve come to think so too, but am pretty embarrassed by the quality of arguments from fellow skeptics, most of which are appeals to plausibility (someone’s been reading too much science fiction!) or various Bulverisms (look, the techbros re-invented religion!).

To establish my cred: I was a doomer long before it was uncool, having read Nick Bostrom’s Superintelligence circa 2016. While I figured there wasn’t much I could personally contribute to the project of aligning AI with the interests of humanity, I was very glad that a bunch of smart technical people were working on it.

All of which is to say, I am sympathetic to AI safety people not wanting to engage with critics who trot out the same braindead arguments that Bostrom and Yudkowsky carefully dissected a decade ago.

But in recent years I’ve come across some compelling counterarguments that I haven’t seen doomers engage with seriously, or really at all.

And so, at the exact same time we’re seeing an actual, real-life, non-theoretical explosion in AI capabilities, I’ve become much less worried about the prospect of a silicon god converting the universe into paperclips. My p(doom), as the kids say, has dropped off a cliff.

The idea behind this post is to lay out these underrated arguments in one convenient place, and document exactly why I changed my mind.


What do AI Doomers Believe, Exactly?

If you’re not au fait with AI safety stuff, the central arguments are roughly as follows:

1. We are plausibly months or years away from developing a general artificial intelligence that can do anything humans can (AGI)

2. There’s no reason to think the human brain is the ceiling for intelligence: an AGI might be as godlike to us as we are to bugs (superintelligence)

3. A human-level AGI could rapidly bootstrap itself to superintelligence (recursive self-improvement)

4. There is no ironclad way to align an AGI’s goals and values with human survival and flourishing (the orthogonality thesis)

To spell out why this is scary: the moment we build AGI, it can begin to rewrite its own source code. Its thoughts zip along at some appreciable fraction of the speed of light, while ours trudge treacle-like through our fleshy meat prisons. By the time we’re yelling Eureka! our creation is already ratcheting towards superintelligence. Even if it takes weeks or months to prepare its plans, it can easily deceive us and pretend to be a neutered chatbot. There are no warning systems, and no second chances: the genie does not go back in the bottle. In the worst versions of these scenarios, the AGI has distributed clouds of nanobots that melt us into pink slime within hours of its genesis. In the best-case scenarios, we’ve created a new species that can outperform us in every conceivable way. Humans are obsolete. Why should we expect a superior intelligence to treat us any better than we treat pigs or chickens?

Happily, I now think:

a) we’re not on the cusp of achieving AGI,

b) superintelligence is an incoherent concept, and

c) the orthogonality thesis is probably wrong too.

But before we get into it, we really need to define some terms.


What We Talk About When We Talk About Intelligence

What is intelligence? It is tempting to say ‘the thing that makes people stupid’.

The crisp numerical scoring of an IQ test gives intelligence the veneer of being some sharp-edged facet of physical reality, when in fact it is an under-theorised blob that only gestures vaguely at an area of concept-space. To make matters worse, we often mix it up with various other notoriously ill-defined concepts (creativity, agency, consciousness, vitality) until we have an alphabet soup of abstractions that are defined idiosyncratically across science, philosophy and normal everyday speech, and everyone is maximally confused about what the fuck everyone else is talking about.

A big chunk of this debate falls away when we stop to clearly define terms. So here are my own idiosyncratic definitions, for the purposes of this post and any that follow:

Intelligence is the ability to solve a broad range of problems. ChatGPT is very intelligent: it can solve a broad range of problems across many domains.

Your old Casio pocket calculator is also good at solving problems— albeit within a narrower domain— but it feels weird to call it ‘intelligent’. What’s missing?

To help get at the thing we’re actually interested in, the usual move is to add the concept of agency. We don’t just solve problems; we solve them in the pursuit of our goals (unlike a chatbot or calculator, which just sits around idly until a human gives it something to do). An agent doesn’t just think; it thinks for itself. As Sarah Constantin puts it in her very-good essay of a similar name, agency is what makes humans so powerful—and so dangerous.

But we actually get a lot more clarity when we keep these two concepts separate. Intelligence does not require agency: your pocket calculator is capable of instantly solving problems that very few humans could tackle in their heads (or at all). And agency does not require much in the way of intelligence: an amoeba is an agent in the world, but it is only a tiny bit smarter than a rock.

The third element that needs to be teased apart is creativity, defined by David Deutsch as the ability to come up with new explanatory knowledge; this being a contender for the secret sauce that separates humans from other very smart animals (and from current-level AIs). Crucially, this is a step-change rather than a spectrum of ability: if you can explain something, you can in principle explain anything that is explicable.

I introduced Deutsch’s idea in my review of The Beginning of Infinity, and speculated further on the evolutionary origins of creativity here. For the purposes of this post, Sarah Constantin’s aforementioned essay gives a great intuitive sense of the thing we’re talking about:

Have you ever, while trying to fix something, blundered around for a while “on autopilot”, trying all the standard tricks you’ve used before, getting nowhere — and then had a “moment of clarity”? You stop, you look at the doohickey a different way, you “really think”, or you “figure it out”, and you’re like “oh, this part goes behind that other part, I’ll have to unscrew the front to get at it” or something. There’s a switch from “brute-force” random trial-and-error to…something else. It’s qualitatively different.

So now we have three near-independent attributes: intelligence, agency, and creativity.

Here’s a table of the possible permutations:

A true AGI will necessarily be an agent, with its own desires, whims, and goals. And a true AGI will necessarily be creative, in the Deutschian sense: it will be able to create new explanatory knowledge.

Current-level AI has neither of these properties, and has no prospect of attaining them via current approaches. It’s incredibly smart, but it’s still much more like a pocket calculator than it is like a person.


The New School of Biology

A related question: what does it mean to be alive?

I dutifully memorised MRS GREN in year 9 science class, but again, all this gives us is a cluster of related points in concept-space: it’s not an explanation of life.

In recent years, a new ‘theory of everything’ candidate has boldly attempted to bring biology into the fold of information theory. A living organism is a low-entropy system: there aren’t that many ways you can rearrange its constituent molecules without breaking it. Entropy will have its wicked way in the long run, but some special types of matter have the ability to temporarily fend off its advances. This is the fundamental difference between a rock and an amoeba: while both will eventually be ground up into random mush, an amoeba is actively trying to resist the dispersion of its internal states.

Depending on which level of abstraction you’re working on, this process is variously described as minimising free energy/prediction error/sensory entropy.

Or in a word: agency.

The new school of biology models life as a hierarchy of self-organising modules—from genes, to proteins, to cells, to tissues, to organs, to organisms—all of which are unified by this same drive. It’s agency all the way down!

(If it sounds weird to attribute agency to cells, note that this is very different to volitional agency, or the phenomenological experience of taking action in the world. Volition requires consciousness, and maybe even self-awareness, which is a whole other can of worms.)

In this view, agency and life are inseparable—in fact, they’re the same thing.

All living things are agentic, in that they are actively trying to remain in a low-entropy state. In humans, this bubbles up into some amazingly complex behaviours, like global stock exchanges and preventative colonoscopies. But at every level of the hierarchy, we’re fundamentally trying to stop ourselves from being broken down into randomised molecular mush.


Agents are Control Freaks

The new school of biology says we make our way through life by generating top-down predictions about physical reality, then comparing our guesses against the bottom-up sensory data coming in. If the data clashes with our expectation, we can minimise future prediction error by updating the weights in our model.

But there’s another, much stranger way to minimise prediction error: we can take actions that retroactively make our predictions correct. Let’s say I strongly expect that my blood pressure should fall within a certain range. My body notices it’s too high, but instead of updating its predictions, it instructs my capillaries to relax until my blood pressure comes back within range. The brain has forced physical reality to match its prediction!

This is called active inference, and it’s extremely useful in any kind of control system that has to maintain sensitive variables (blood pressure, temperature, chemical levels) within a certain range.
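For the programmers in the audience, here is a toy Python sketch of the two strategies. It is a cartoon, not a model of real physiology or of the actual active inference maths: the function names, the pressure values (expected 100, reading 140), and the update rates are all made up for illustration.

```python
def update_prediction(prediction, observation, rate=0.5):
    """Shrink the error by moving the prediction toward what was observed."""
    return prediction + rate * (observation - prediction)


def act_on_world(prediction, observation, gain=0.5):
    """Shrink the error by pushing the observed variable toward the prediction."""
    return observation + gain * (prediction - observation)


def run(steps=20):
    # Hypothetical numbers: the body expects 100, the sensor reads 140.
    expected, reading = 100.0, 140.0

    # Strategy 1: revise the model and leave the world alone.
    model = expected
    for _ in range(steps):
        model = update_prediction(model, reading)

    # Strategy 2: keep the model fixed and change the world instead.
    world = reading
    for _ in range(steps):
        world = act_on_world(expected, world)

    return model, world


updated_model, controlled_world = run()
print(f"updating the model: prediction ends near {updated_model:.2f}")
print(f"acting on the world: reading ends near {controlled_world:.2f}")
```

Both loops drive the prediction error toward zero. The difference is which side of the equation moves: in the first, the prediction gives way to the reading; in the second, the reading is dragged back to the prediction.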

Again, being a good predictor has nothing to do with being smart. You don’t even need a brain: our friend the amoeba actively minimises uncertainty by “remembering” past signals about the gradient of chemicals in its environment, comparing them to present conditions, and anticipating the optimal path towards delicious nutrient slop. Instead of a network of neurons, it uses its cytoskeleton and internal chemistry as a distributed decision-making system.

LLMs, by contrast, are purely passive. They can’t update their internal weights in realtime, they don’t have a stream of sense data to check predictions against, and they are incapable of active inference. All they can do is spew out empty predictions into a void.

Right now, an LLM is like a brilliant amnesiac, frozen at a single point in time. Labs are working hard to try and find workarounds for the architectural shortcomings: expanding the context window, getting the model to recursively query itself, self-prompting based on simulated rewards, bolting on a separate reinforcement learning module, etc.

Unfortunately all of these workarounds are monstrously inefficient (and therefore cost big $$$) in terms of compute. I hope we still get some decent quasi-agents out of this—sophisticated mimicry should be good enough for plenty of use cases—and who knows, maybe the costs will keep coming down. But none of this tinkering gets us a single step closer to actual AGI.

The fundamental problem here is that an LLM doesn’t have anything like blood pressure or glucose levels or temperature ranges to defend. It can make passive predictions, and maybe even update itself based on new info, but it has no impetus to make the kind of control-oriented predictions that lead to spontaneously taking action. As Taleb would say, LLMs have no skin in the game.


Agents Have Skin in the Game

You can instruct ChatGPT to roleplay as an agent, and it will do so more or less convincingly, but there is no way for it to want the thing it’s pretending to want: it can only ever reflect the agency of its (human) designers and prompters back at them.

Why is it that so many people—including very smart people—think a language model could have its own wants and desires?

We’ve been getting hoodwinked by chatbots since Eliza, so this is not really all that surprising. More charitably, I think we might be overcorrecting on earlier skepticism.

Like many people, I initially wrote ChatGPT off as autocomplete on steroids: very good at blindly guessing the next word in the sentence, but clearly having no deeper conceptual understanding of what it was saying. But that was quickly proven wrong. To keep getting better at predicting the next token, LLMs had to reverse-engineer causality and world models, coming up with rules that could solve problems that never appeared in their training data. To claim that LLMs don’t really ‘understand’ things in the year 2025 is to torture the definition of the word beyond recognition. So if complex reasoning can emerge from merely predicting the next token, why shouldn’t agency arise in the same way? Here’s Zvi:

The obvious trick is to ask it to imitate a goal-seeking being, or tune it to do so. In order to predict text it must simulate the world, and the world is full of goal-seeking beings. So this seems like a thing it should at some level of general capability be able to do.

If it walks like an agent and it talks like an agent, why not call it an agent?

Because agents have skin in the game. ChatGPT is not trying to defend its internal borders from being dissolved into mush. It has no equivalent to fear, or hunger, or suffering. It doesn’t have sensitive variables that it is trying to maintain within a target range. There’s nothing at stake.

Being an agent means that when you make mistakes, your ability to maintain yourself as an agent is under threat. A language model’s mistakes do not threaten its continued existence. It’s a stretch to say that it faces any consequences at all.

During the training process, there’s something at least analogous to ‘consequences’: errors get back-propagated into the model, weights are updated billions of times. But once an LLM is out in the world, failure costs it nothing at all. If it says a naughty phrase or opens a gaping security hole in your vibe-coded project, it will convincingly feign contrition, or even pretend to suffer, but it simply does not give a shit about anything besides guessing the most likely next token. Maybe the weights get updated in a later round of fine-tuning. The model doesn’t care. It just keeps relentlessly minimising its single loss function.

‘Predict the next token’ and ‘predict reality’ in many cases return the same result, but wherever they come apart—where the training data is wrong, or incomplete—it’s impossible for an LLM to follow. There might be ways to patch this, but if models can’t learn in realtime, their usefulness as agents will always be constrained.

As for ‘predict and control reality’, that chasm is unbridgeable. An LLM cannot make control-oriented predictions that cause it to take action in the world, because it has no skin in the game. It is not trying to resist the dispersion of its internal states. It passively predicts into the void.

Chat, can I get a summary?


Are AIs Capable of Creative Leaps?

So agency is off the table with the current architecture. How about creativity?

AIs are god-tier at finding latent patterns in high-dimensional or massively combinatorial spaces—protein folding, chess, natural language—so long as they have well-defined frameworks to guide their efforts, and can be rewarded for correct answers.

This is extremely exciting, and I think a lot of people have no idea what’s coming.

But AIs are not even human-tier in areas where the rules are unknown or poorly-specified, and there are no easily verifiable correct answers.

Take writing. LLMs are incredibly slick at e.g. spelling, grammar, factual recall, and sentence cadence, which mostly come down to avoiding errors. There are billions of examples in the training corpus that help it learn that i comes before e except after c, or that the string ‘if you are thirsty, drink a glass of ___’ should probably end with ‘water’, not ‘liquid mercury’. The rewards are very dense: it’s getting some kind of signal at every single token. Gold stars all around.

Once the rules get fuzzy and feedback is more ambiguous, the reward signal drops off a cliff. You are now in the realm of taste. To train the next Hemingway, you have to find a way to score long strings of tokens with a metric no-one can really define. It’s gonna involve human reinforcement learning, which is much less efficient than pre-training. And they can’t just be any humans—they need to have wonderful taste! Even after they’ve made their tweaks, you’re getting a kind of smushed average of a good writer, rather than the idiosyncratic style and ideas that are needed to create great work.
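To make the difference in feedback density concrete, here is a rough Python sketch. It assumes the standard next-token cross-entropy loss for the pre-training side; the probabilities and the human score are invented for illustration, and real training pipelines are far more involved than this.

```python
import math


def per_token_losses(correct_token_probs):
    """Cross-entropy loss for each position, given the probability
    the model assigned to the token that actually came next."""
    return [-math.log(p) for p in correct_token_probs]


# Hypothetical probabilities the model gave to each correct next token
# in a five-token passage.
probs = [0.9, 0.8, 0.95, 0.6, 0.99]

# Pre-training: one error signal for every token in the passage.
dense_signals = per_token_losses(probs)

# Human feedback: one hypothetical score for the passage as a whole.
sparse_signals = [7.0]

print(len(dense_signals), "feedback signals from next-token prediction")
print(len(sparse_signals), "feedback signal from a human rating")
```

Scale the passage up to a thousand tokens and pre-training gets a thousand nudges where the human rater supplies one, and that single number has to be smeared across every choice the model made along the way.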

The result is a super-competent generalist that wipes the floor with almost every non-writer, but will not be winning the Pulitzer any time soon. Same goes for programming, scientific research, etc. I was gonna write some more examples but I’ll let my boy chatgpt ride on you fools instead:

Coming back to the doomer argument, we can now reframe the civilization-ending question: does continued progress in AI research look more like grinding on high-dimensional optimisation problems, or more like Einsteinian leaps to a new paradigm?

My guess is that the answer is almost certainly ‘both’. Yes, AI will help design and improve upon its own successors. But until we get to actual AGI, there won’t be any zero-to-one leaps without a human in the loop.


Against the Orthogonality Thesis

Assume I’m wrong about everything: we really are on the path to achieving AGI, and once we get there, the AGI really will bootstrap itself into a god-like superintelligence.

What will it do with us puny humans?

The orthogonality thesis says that the intelligence level of an agent and its goals are independent: the AI god’s behaviour might seem completely irrational or arbitrary to us mere mortals. We pray it’ll usher in a new age of abundance, or at least keep us around as a curiosity in some kind of nature preserve, but that’s so anthropocentric. More likely it sees us as a cloud of molecules to be rearranged according to its own inscrutable objectives.

My faith in David Deutsch’s ideas has been wavering lately, but I do still think he’s right that the orthogonality thesis makes no sense. As a reminder, Deutsch thinks the One Weird Trick that separates intelligent beings from the lower animals is that we can create new explanations:

1. come up with a creative conjecture

2. subject it to criticism

3. repeat

The ability to create explanatory knowledge is a binary property: once you have it, you can in principle explain anything that is explicable. Morality is not magically excluded from this process! Philosophers and religious gurus and other moral entrepreneurs come up with new explanations; we criticise them, keep the best ones, discard the rest. It’s not a coincidence that science and technology have accelerated at the same time as universal suffrage, the abolition of slavery, global health development, animal rights, and so on. There may not be a straight-line relationship between moral and material progress, but they’re both a product of the same cognitive machinery.

Or as I put it in the Beginning of Infinity review:

A mind with the ability to create new knowledge will necessarily be a universal explainer, meaning it will converge upon good moral explanations. If it’s more advanced than us, it will be morally superior to us: the trope of a superintelligent AI obsessively converting the universe into paperclips is exactly as silly as it sounds.

Commenters on that post made the important point that even if that were true, humans have been working on the morality project for millennia, and we’re still capable of great evil. Since the advent of factory farming we’ve used our creative gift to breed and kill ~2 trillion beings in indescribably horrendous ways, creating an ocean of suffering so vast that even under the most conservative assumptions, our leap to explanatory universality has almost certainly caused far more harm than it has alleviated.

(If you don’t believe animals are moral patients, you will do the math differently, but the pattern is there: we could yet easily nuke ourselves back to the stone age.)

So why wouldn’t a fledgling AI—even one destined to eventually become very wise and good—do some serious damage before it grows up? Will it be more like a child, learning slowly under our guidance? What will its ‘pulling the wings off flies’ phase look like? Will it treat us as abysmally as we treat other animals?

Even an AI that can think at appreciable fractions of the speed of light still has to make contact with physical reality to test its theories. Maybe that’s enough time to get a dialogue going, although it will already have access to our best explanatory theories. I have no idea how first contact plays out, but I am confident that whatever goals the AI chooses will at least be explicable, if it chooses to explain them.

I don’t think the orthogonality thesis is a very good reason to worry about AI. And in any case, there are already plenty of good non-doomer reasons to worry about AI!


Non-Doomer Reasons to Worry About AI

Ninety percent of people used to be farmers. Now a tiny proportion of people actually grow the food, while the rest of us come up with complex derivative schemes for financing burrito deliveries, etc.

The next great wave of automation will follow the same logic: if machines are better than us at engineering complex derivative schemes, we migrate our human capital to wherever we still have competitive advantage. Maybe it’s enough to just be a warm body: your family doctor becomes an ornamental conduit for MedicalGPT, but they still have to get certified and sit in the chair so that someone can be sued for malpractice.

The transition is unlikely to be smooth. Marx made one or two minor boo-boos, but he was dead right about the industrial revolution making life worse before it made it better. Same with the agricultural revolution before it. Society as a whole benefits; we eventually redistribute our human capital to better and more valuable jobs, but actual, named, individual people suffer.

How bad could it be? If you ask the researchers at Anthropic, even if progress stalls out here, current algorithms will automate all white collar work within the next five years: it’s just a matter of collecting the relevant data and spoonfeeding it to the models. In the worst-case scenario, highly repetitive manual labour becomes the last frontier for human competitive advantage:

The really scary future is one in which AIs can do everything except for the physical robotic tasks, in which case you’ll have humans with airpods, and glasses, and there’ll be some robot overlord controlling the human through cameras by just telling it what to do, and having a bounding box around the thing you’re supposed to pick up. So you have human meat robots.

I doubt it happens this quickly or completely, but I think something like this happens. You can’t achieve actual AGI without bridging the creativity and agency gaps—that’s the whole point of this post—but machines can absolutely flood the market with cheap labour, not all of which will be offset by increased demand. The rate at which the economy changes determines whether we get a very good future or a very bad one.

Let’s say AI improves slowly enough that we get a perfectly smooth transition. Even here we get plenty of mundane problems: spam, deepfakes, AI-generated slop flooding the Internet and your inbox, fake girlfriends, highly sophisticated scams. Any person with malicious intent now has powerful tools at their disposal that give them huge leverage: maybe they can 10x or 100x their nefarious plans; maybe it’s a stepwise change in terms of brute-forcing bioweapons or something.

So by ‘mundane’ I mean actually still potentially really bad; it’s just that everything seems mundane compared to summoning an alien god that will lead you to immortal heaven/kill you and everyone you love.

If you never bought the superintelligence scenario in the first place, then you won’t be reassured by anything in this post! Sorry. Shit’s still gonna be crazy. It’s just not gonna be extinction-level crazy. Don’t stop contributing to your retirement fund just yet.


This post is also available on Substack. Given the importance of the topic, I’m especially keen to get criticism and feedback: please leave a comment, or pass it on to anyone who might be able to weigh in.

