Today Yudkowsky & Soares published their book If Anyone Builds It, Everyone Dies: Why Superhuman AI Would Kill Us All. I spent the day reading it.
Their core arguments (my paraphrase):
Knowing that a mind was evolved by natural selection, or by training on data, tells you little about what it will want outside of that selection or training context. For example, it would have been very hard to predict that humans would like ice cream, sucralose, or sex with contraception. Or that peacocks would like giant colorful tails. Analogously, training an AI doesn’t let you predict what it will want long after it is trained. Thus we can’t predict what the AIs we start today will want later when they are far more powerful, and able to kill us. To achieve most of the things they could want, they will kill us. QED.
Also, mind states that feel happy and joyous, or embody value in any way, are quite rare, and so quite unlikely to result from any given selection or training process. Thus future AIs will embody little value.
These arguments seem to me to prove way too much, as their structure applies to any changed descendants, not just AIs: any descendants who change from how we are today due to something like training or natural selection won’t be happy or joyous, or embody value, and they’ll kill any other creatures less powerful than they.
Let us break future creatures who change due to selection or training into any two categories of a small us vs a big them. As we can’t predict what they will want later, and they will be much bigger than us later, we can predict that they will kill us later. Thus we must prevent any changed big future them from existing. Except, as neither we nor they will be happy or joyous later, who cares?
Some I’ve talked to accept my summary above, but say that the difference with AI is that it might change faster than would other descendants. But culture-mediated non-AI value change should be pretty fast, and I’m not sure why I should care about clock time, relative to the rates of events experienced by key creatures. Others say that humans are just much less pliable in their desires than are AIs, but I see much less difference there; human culture makes us quite pliable.
We can reasonably doubt three strong claims above:
That subjective joy and happiness are very rare. They seem likely to be common to me.
That one can predict nothing at all from prior selection or training experience.
That all influence must happen early, after which all influence is lost. There might instead be a long period of reacting to and rewarding varying behavior.
Some relevant quotes:
AI companies won’t get what they trained for. They’ll get AIs that want weird and surprising stuff instead. …
The link between what the AI was trained for and what it ends up caring about would be complicated, unpredictable to engineers in advance, and possibly not predictable in principle. …
The link between what a creature is trained to do and what it winds up doing can get pretty twisted and complex, …
But the stuff that AIs really want, that they’d invent if they could? That’ll be weird and surprising, and will bear little resemblance to anything nice. …
There will not be a simple, predictable relationship between what the programmers and AI executives fondly imagine that they are commanding and ordaining, and (1) what an AI actually gets trained to do, and (2) which exact motivations and preferences develop inside the AI, and (3) how the AI later fulfills those preferences once it has more power and ability. …
The preferences that wind up in a mature AI are complicated, practically impossible to predict, and vanishingly unlikely to be aligned with our own, no matter how it was trained. …
it may act subservient while it’s young and dumb, but nobody has any idea how to avoid the eventuality of that AI inventing its own sucralose version of subservience if it ever gained the power to do so. …
Most alien species, if they evolved similarly to how known biological evolution usually works, and if given a chance to have things the way they liked them most, probably would not choose a civilization where all their homes contained a large prime number of stones. …
Similarly, most powerful artificial intelligences, created by any method remotely resembling the current methods, would not choose to build a future full of happy, free people. …
We predict the result will be an alien mechanical mind with internal psychology almost absolutely different from anything that humans evolved and then further developed by way of culture. …
Making a future full of flourishing people is not the best, most efficient way to fulfill strange alien purposes. …
It’s easy to imagine that the AI will live a happy and joyous life once we’re gone; that it will marvel at the beauty of the universe and laugh at the humor of it all. But we don’t think it will, any more than it will make sure that all its dwellings contain a “correct” number of stones. We think a mechanical mind could feel joy, that it could marvel at the beauty of the universe, if we carefully crafted it to have that ability. …
The endpoint of modern AI development is the creation of a machine superintelligence with strange and alien preferences.
Added 17Sep: I suspect Yudkowsky & Soares see non-AI-descendant value change as minor or unimportant, perhaps due to seeing culture as minor relative to DNA.