Generative AI Is the Marshmallow Test


Long-time readers know that I have some misgivings about social psychology, almost all of which stem from its mode of reductionism in studies that involve laboratory experiments. Rather famously at this point, many of the important studies conducted in this fashion have proven difficult to replicate. Either the original sample was too small or too particular, the research design was too skewed towards producing the desired outcome, the interpretation of the data was selective, or various controls weren’t applied to the study outcomes and the significance of the findings was overstated.

My objections to some of these studies are less systematic. What I really dislike is the leap between something interesting seen in a lab and a generalization of what was seen in the lab used to massively reshape the way common social institutions operate. Most typically this involves placing extraordinary causal weight on a single variable or observed tendency, usually as a way of producing a manageable action available to a single institution while absolving it of responsibility for complexity that arises from a huge range of underlying and linked causes. I don’t care what a laboratory study showed, there is always something wrong with this “one simple trick!” application of an academic finding. Moreover, a fair number of social psychologists tend to forget that if their lab study has actually identified something that has that kind of causal significance, that kind of mapping of a variable to an outcome, they have just offered a prediction about overall social outcomes. Less or more of the variable should be less or more of the outcome at an observable level over time. It rarely turns out that way, and when it doesn’t, researchers quietly fold their tents and move on, saying that the data we have on the observable outcome is flawed or that it turns out there is another variable that needs measuring. In the meantime, institutions are left with policies and procedures that were adopted as a response to those findings that either do nothing measurable or sometimes perversely seem to make outcomes even worse, perhaps in part because the institution is now forgiven of the need to pay attention to actually existing causal complexity.

Given my attitude, you might be surprised to find that nevertheless I really value academic psychology. For one, it offers real empirical data on mind, selfhood, and behavior that seriously trouble forms of methodological individualism in economics and political science, which is a real service, and both disciplines have increasingly had to respond to that challenge. More importantly, I find that even when a particular experimental study turns out on replication to be less significant and explanatory than it appeared to be, social psychology nevertheless supplies a rich vocabulary of metaphorical framings that help us talk about human behavior in modern societies, that let us bring questions of mind, cognition, motivation and selfhood into view. In this much, I too am a reductionist: if we had to talk about the actions of every individual in terms that were 100% nominalist, that tried to match what a person visibly did to that person’s particular, peculiar interiority with all of its details and complexities in tow, we could not ever talk about patterns, trends, alignments in society. We could not talk about groups, about institutions, about collective action. We would live in a fog of presentist mystery and know only what we individually experience directly.

So sometimes I don’t care if a psychological study is replicable in the strict sense: it provides me a narrative framework, a metaphor-rich language, a way to interpret. It gives me a hook to hang my hat on.

Case in point: the “marshmallow test” that purported to demonstrate that children who learned how to delay gratification were better off in later life in a host of ways, which was then used by other interpreters to explain general inequality (the impulsive self-gratifiers trend towards poverty, the shrewd delayers show they understand how to accumulate wealth). That level of interpretation, unsurprisingly, has been dialed way back by subsequent studies. It is, like a lot of these sorts of one-variable-explains-everything claims, just a refashioning of folkloric wisdom (Aesop’s ant and grasshopper) but also the specific Weberian belief that capitalism’s foundation was the long-termism that lay inside the Protestant work-ethic, that mapped poverty and wealth to a “theodicy of fortune”—that if you are poor you must have done something wrong.

Disconnect the study from those kinds of claims and those kinds of absolutions of capitalism and modernity, though, and the basic narrative scenario of those experiments has some resonance. Most of us can relate to it emotionally. We can all think of times that we’ve taken the easy route and regretted it, that we’ve jumped at someone offering a simple solution to an otherwise long and painful process only to find that we end up doing the long and painful thing anyway and now it’s longer and more painful. Many of us enjoy a feeling of schadenfreude when we see someone else grab a proffered marshmallow that we were smart enough to refuse (and are enraged when the instant gratifier miraculously escapes with a better deal nevertheless).

Just speaking in this more metaphorical sense, it’s pretty plain generative AI is one of the biggest marshmallow tests in human history, and a lot of people are failing it. Rather as in the case of the experiment itself, it’s hard to say whether the moral fault for that is on the side of the people offering easy marshmallows while knowing that there’s a nest egg of marshmallows waiting for those who say no, or on the part of people incapable of resisting, people who walk right into what seems like an obvious trap, perhaps secure in the thought that everybody’s doing it, so why not.

Think about all the stories that have made the press, most of which seem like they have got to be the tip of the iceberg of actual practice. Judges using AI for their rulings and lawyers using AI in their briefs only to be exposed when it turns out that the precedents they cite don’t exist. Novelists using AI to write an entire formulaic genre work only to get noticed when they leave the prompts and AI responses in the text. Students getting caught when their bibliographies contain references that don’t exist. Scholars getting caught when it turns out a manuscript being peer-reviewed had tiny invisible text instructing AI peer reviewers to be ridiculously generous to the publication under review. Government officials confidently proclaiming that they’ve asked AI to undertake a sensitive review of government documents that the AI itself doesn’t (or shouldn’t) have access to. Journalists citing “facts” that turn out to be AI hallucinations like Woodrow Wilson’s nonexistent pardon of his nonexistent in-law Hunter D. Butts.

A lot of people are stuffing their gobs with easy marshmallows. A lot of marshmallow pushers are offering single marshmallows to people unable to refuse the temptation because it makes their expensive and reckless investment in generative AI look like it’s going to pay off the same way that the first generation of platform companies did, or even more so. Neither the companies nor their addicts want to wait for better outcomes.

Is there in fact a great store of marshmallows coming to those who wait? Yes, I think so. This is the point I keep circling back to that many other commenters have noted as well, which is that if you do the work now that involves developing your expertise, creativity, and skills, you may find generative AI to be a modestly useful, highly targeted tool that can nestle into your workflow, the same way that word processors and spell checkers and calculators did. The question is partly whether the marshmallow pushers are going to crash and burn off of cheap deals on the street corners of global life because their users have broken too many important parts of our existing world via AI shortcutting. The useful AI we all could benefit from may not survive the bad judgment of the companies that could have helped to create it.

There’s another lesson that we could all learn as well, a more unsettling one than “most people are suckers who can’t resist a single marshmallow”. Maybe the incentives to resist the offer require really believing that the work you’re doing matters, that you have to do what you’re doing the right way or everybody will suffer the consequences. You need to feel that you are doing something meaningful in a world that is built around the meaningfulness of individual lives.

Instead, well, sometimes the grasshopper is smarter than the ants because he knows the coming winter is going to kill everything. Or because he overheard the farmer calling for an ant exterminator. Sometimes the ants are being a bunch of idiots because there’s plenty of food in the winter, everywhere and all the time. Sometimes the grasshopper knows that his entire life is about mating and dying in a single summer, so the only thing to do is enjoy it right now. Eat that fucking marshmallow.

Sometimes the cake is a lie. Sometimes the marshmallow stash is going to slip out the door with the experimenters. Sometimes the entire world is a bunch of miserable kids sitting on the other side of a one-way mirror being watched by a group of sadistic observers who think the whole thing is totally hilarious.

Concretely, maybe what generative AI is proving is what David Graeber wrote about in Bullshit Jobs, what James Livingston wrote about in “Fuck Work”, that people are feeling more and more that what they do doesn’t matter. A recent scholarly response to Graeber argued that he was empirically wrong in that white-collar workers seemed to feel increasingly like their work mattered rather than less, though read carefully, the study also affirmed older findings that feelings about work are “episodic” and that many people do feel their work is meaningless, just not the people that Graeber thought felt that way. But maybe both Graeber and these critics are wrong in the sense that how people concretize their feelings in response to a survey or an interview and how they act in their work and towards their work provide different kinds of evidence, and we might have here a case of what economists call “revealed preference”. Revealed by the rapid spread of AI usage.

Or maybe 2025 and 2020 are differentiated first by the rise of absurdist incompetence to leadership in many private and public institutions in the United States and second by the rise of generative AI itself. The preference for single marshmallows, for using generative AI to shortcut tasks that are supposed to be the heart of many professions or responsibilities, is just showing us that for a lot of people, there’s no point to it all but that they didn’t know it until now.

If you’re a judge, why bother putting effort into your rulings if nobody cares whether your precedents are real or not? The Supreme Court of the United States of America is issuing some of its most consequential rulings without explanation now, and when its majority bothers with justifying its decisions, they apparently feel no shame about outright lying (Gorsuch in Kennedy v. Bremerton School District), ignoring precedent, or deciding suddenly that 17th Century English common law is a meaningful constraint on the constitutional government of a nation created out of a revolution against England a century later.

If you’re a lawyer, why bother investing in a well-constructed brief that properly cites precedent if most legal proceedings are going to be resolved by who has the most money, who has prior contractual advantage to compel the other party into unfavorable arbitrations, or reshaped to fit the preferences of political authorities? If journalists and their editors aren’t ever held accountable for omitting crucial information, for distorted citations of evidence that have been chosen to fit a prior framing, or for significant fabrication, why be a chump and spend the effort on quality reportage? If there’s a flood of garbage research filling up the world’s academic journals and for-profit publishers are extracting all the profits from them while ignoring the basic labor involved in producing good scholarship and peer-reviewing the work of others, why not just handle your output with one little marshmallow of AI use at a time?

Who needs a real person to write the script for the seventh Sharknado, whenever it gets made? Who would notice the difference? Who needs a real person to write the script for the next Hallmark movie? If you don’t need people for that, well, a lot of studio executives are going to think that perhaps you don’t need them for anything.

What happens when students accurately realize that faculty are disconcerted in some cases not because the quality of average undergraduate writing is lower under AI but because the average undergraduate no longer has to suffer enough long nights to write those mediocre essays? No wonder a lot of the students hit the button and take the marshmallow. We don’t live in a world where the promise that if you do the boring writing now, you’ll be rewarded for having become a strong writer later seems even remotely credible. Who cares about doing a good job with writing and researching when the President and his entire Cabinet lie almost every time they say anything and communicate like they’re auditioning for a part in Idiocracy 2: My Balls Get More Ouchy?

Some of the people who are pushing generative AI the hardest are arguing that what it will do is handle all the routinized bullshit so that people will have the breathing space to really think, to write only what really matters, to rediscover a life of meaning. Among other things, this argument insidiously proposes that none of that work ever mattered the way we thought it mattered. This much I’m sure about: the probity and care put into judicial rulings did matter in much of the 20th Century. The professional craft of legal writing by lawyers did matter. Being a skilled communicator who was judicious in how or when you lied did matter. Doing scholarly work carefully and slowly was important and meaningful.

If little of it matters any more, that is a failing that, I agree, precedes generative AI. What we should realize about our present epidemic of single-marshmallow consumption is that we stopped eating a more balanced meal a good while back. I am grateful that generative AI has provided the shock we need to recognize that.

However, I don’t think generative AI is going to handle all of the bullshit jobs so that we automatically rediscover work with dignity and meaning. That reminds me a lot of the pitch that manufacturers used to offer to middle-class women in the mid-20th Century, that domestic appliances would handle all of the drudgery and liberate them into lives of leisure and pleasant sociality. Or more directly, of the broken promises that each successive wave of information technologies has offered to us: the efficiencies of word processors meaning less time spent on painful retyping of routine textuality, the efficiencies of email meaning less time spent in meetings figuring out the basics, and so on. These promises are always lies because the people who own our labor will always want some new pound of flesh from us. Increased productivity is not liberation but just another turn of the screw. If nothing else, most of us will find ourselves redeployed as the supervisors of AI doing the bullshit jobs (badly): we won’t be writing the briefs and the rulings, we’ll just have to fix the hallucinations.

Which some people will doubtless eat a marshmallow on, and tell an AI to fix the AI, since a “life of greater meaning” hardly seems to arise from “changing the diapers of an AI after it poops outputs that nobody really cares about anyway”.

No, if we’re going to learn to delay gratification, then we’ll have to attend to making what we do in life and work gratifying. That is not a job for AI, it’s a job for human beings, and I can’t help but feel that the window of opportunity is closing, because it will take older human beings who have some ability to imagine both what was meaningful and some younger human beings who can still summon the ability to imagine that life could be meaningful in ways it never has been before. If they can, if we can, then maybe we will discover that what could be waiting for us is not just an unlimited supply of marshmallows but something far more precious and satiating.

Image credit: By Smith609 - Own work, CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=5884334

Image credit: "BURN BABY BURN MARSHMALLOW INFERNO" by GIALIAT is licensed under CC BY 2.0.
