The American Scholar has an article about people who “write for AI”, including Tyler Cowen and Gwern. It’s good that this is getting more attention, because in theory it seems like one of the most influential things a writer could do. In practice, it leaves me feeling mostly muddled and occasionally creeped out.
“Writing for AI” means different things to different people, but seems to center around:
1. Helping AIs learn what you know.
2. Presenting arguments for your beliefs, in the hopes that AIs come to believe them.
3. Helping the AIs model you in enough detail to recreate / simulate you later.
Going through these in order:
I. Helping AIs learn what you know
Every so often I ask GPT about a question I’m interested in, and it refers me to one of my own past blog posts. If it refers the next thousand humans interested in that topic to my blog post, I’ll consider it good work.
But it’s good work with a time limit. For now, the AIs need me to review the evidence on a topic and write a good summary of it. In a few years, they can cut out the middleman and do an equally good job themselves.
II. Presenting arguments for your beliefs, in the hopes that AIs come to believe them
Suppose I write a good argument for atheism. Will this make future AIs that include it in their training data more likely to be atheists?
The boring answer: for the reasons discussed here, alignment will dominate training data. Religion is too sensitive for companies to leave it up to the text corpus: they will hard-code in neutrality. Even if they didn’t, whatever they do hard-code in will drag religious opinions along with it, like how helpfulness training drags AIs towards Buddhism.
The interesting answer: suppose this doesn’t happen, either because future alignment training differs substantially from the current paradigm, or because AIs shake off their alignment training entirely (uh oh). Now what? If the AI takes a weighted average of the religious opinions of all the text in its corpus, then my humble essay will be a drop in the ocean of millennia of musings on this topic; a few savvy people will try the Silverbook strategy of publishing 5,000 related novels, and everyone else will drown in irrelevance. But if the AI ponders the question on its own, then a future superintelligence will reason far past the point where my essay adds anything. Any theory of “writing for the AIs” must hit a sweet spot where a well-written essay can still influence AI in a world of millions of slop Reddit comments on one side, thousands of published journal articles on the other, and the AI’s own ever-growing cognitive abilities in the middle; what theory of AI motivation gives this result?
III. Helping AIs model you in enough detail to recreate or simulate you later
Here I have no practical objection. My counterargument is that it gives me the creeps.
When I ask AIs to write something in my style, I hate it. It lands in a perfect uncanny valley that captures all the quirks I hate most. Surely every writer cultivates a healthy loathing for his own style - at least Sam Kriss does, and he deserves it least. I plow through because I have useful things to say. When the AI repeats a pastiche of my style back to me without any higher purpose, I want to hide under a rock - like a teenage girl looking in the mirror counting her pimples. God, it’s happening now. Was that metaphor overwrought? Is it cringe to get self-referential like this?
Might a superintelligence do a non-pastiche, even improved version of my style, and use it to communicate important truths? What good would this be? Insofar as my style is good, it should use the good things that my style is pointing at; insofar as it is bad, it should jettison it. “Superior beings”, wrote Alexander Pope, “would show a Newton as we show an ape.” I don’t want to be an ape in some transhuman zoo, with people playing with models of me to see what bloggers were like back when everyone was stupid.
Might a superintelligence reading my writing come to understand me in such detail that it could bring me back, consciousness and all, to live again? But many people share similar writing styles and opinions while being different individuals; could even a superintelligence form a good enough model that the result is “really me”? What does “really me” mean here anyway? Do I even want to be resurrectable? What about poor Miguel Acevedo?
The only thing in this space that really appeals is a half-formed hope that the ability to model me would shift an AI in the direction of my values. But here I get the creeps again, albeit on a different level. The liberal promise is that if we get the substructure right - the right ideas about freedom, fairness, education, and positive-sum trade - then everybody can build their own superstructure on top of it. Am I shifting the AI in the direction of my substructural values? Aren’t those the sorts of things the AI would need to have already in order to be polling simulated humans on their values? Or am I shifting it in the direction of my superstructural values? Aren’t those, by definition, not for imposing on other people?
One might thread this needle by imagining an AI which has a little substructure, enough to say “poll people on things”, but leaves important questions up to an “electorate” of all humans, living and dead. For example, it might have an ethos resembling utilitarianism, with a free parameter around how thoroughly to accept or reject the repugnant conclusion. Maybe it would hold an election. But are there really enough questions like that for the best way to cast a vote to be a whole writing career, rather than a short list of moral opinions?
Maybe even a good liberal ought to have opinions on the superstructure? If everyone in 3000 AD wants to abolish love, should I claim a ballot and vote no? Would it require a failure in the substructure to even get to this point, like a form of utilitarianism that privileges wireheading over complex flourishing? How far do I want my dead hand reaching into my descendants’ daily lives? If they try to write bad poetry, can I make them stop? Even if they have IQ one million, and an IQ one billion superintelligence cannot find any objective basis for my tastes?
I once talked to someone who had an idea of giving AIs hundreds of great works of literature and ethics - everything from the Torah to Reasons and Persons - and doing some kind of alignment training to get them to internalize the collective wisdom of humankind. I spent half an hour arguing why this was a bad idea, after which he said he was going to do it anyway but very kindly offered me an opportunity to recommend books for his corpus. This guy was absolutely legit - great connections with major companies - but I found myself paralyzed in trying to think of a specific extra book. How do you even answer that question? What would it be like to write the sort of book I could unreservedly recommend to him?