tl;dr: We can better understand common objective functions (reward, prediction, fitness, control) as all being related to a single, overarching objective.
Reward? Prediction? Fitness?
In their 2021 paper Reward is enough, DeepMind researchers argue that "intelligence, and its associated abilities, can be understood as subserving the maximization of reward."
This is a response not just to the idea that Attention Is All You Need, but also to predictive processing, a theoretical framework in neuroscience where prediction-error minimization, rather than reward maximization, is the star of the show.
"The whole function of the brain is summed up in: error correction." So wrote W. Ross Ashby, the British psychiatrist and cyberneticist, some half a century ago. Computational neuroscience has come a very long way since then. There is now increasing reason to believe that Ashby's (admittedly somewhat vague) statement is correct, and that it captures something crucial about the way that spending metabolic money to build complex brains pays dividends in the search for adaptive success. In particular, one of the brain's key tricks, it now seems, is to implement dumb processes that correct a certain kind or error: error in the multi-layered prediction of input.
―Andy Clark, Whatever next? Predictive brains, situated agents, and the future of cognitive science (2013)
This battle between Reward and Prediction is an old one. The behaviorists were Team Reward. The cyberneticists were Team Prediction. Reward maximization is closely linked to the idea of fitness maximization and Darwinian evolution. Error minimization is closely linked to the overarching notion of control.
Could we possibly fit everything under the same umbrella?
Selectionism
Let's take a step back.
In 2023, a peculiar paper was published in PNAS: On the roles of function and selection in evolving systems.
We identify universal concepts of selection—static persistence, dynamic persistence, and novelty generation—that underpin function and drive systems to evolve through the exchange of information between the environment and the system. Accordingly, we propose a "law of increasing functional information": The functional information of a system will increase (i.e., the system will evolve) if many different configurations of the system undergo selection for one or more functions.
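Functional information, the quantity this law is stated in terms of, has a concrete definition due to Hazen and colleagues, whose earlier work the paper builds on: for a degree of function E_x, let F(E_x) be the fraction of all possible configurations of the system that achieve at least that degree of function. Then

```latex
I(E_x) = -\log_2\!\big[F(E_x)\big]
```

The rarer the configurations that do the job, the more bits of functional information a system carries by doing it.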
The universe started out in its lowest-entropy state and is progressing toward its highest-entropy state. Along the way, however, there are barriers. Free energy dissipation sometimes runs into "problems" (being stuck in a less-than-ideal state) that can be "solved" (by finding a way over an energetic barrier).
Static persistence refers to physical configurations with long-term stability. The authors refer to these as "batteries of free energy" or "pockets of negentropy". This is first-order selection.
Dynamic persistence, second-order selection, "requires active dissipation." Here we have physical configurations that can tap into free energy substrates (pockets of negentropy) and exploit them for persistence/stability.
Third-order selection has to do with novelty generation. Rather than just maintaining dynamic persistence until the free-energy batteries run out, you can "search" for novel configurations better able to harness untapped negentropy pockets. This continuous novelty search ensures that complex systems can adapt to their environments over time.
In general, in a universe that supports a vast possibility space of combinatorial richness, the discovery of new functional configurations is selected for when there are considerable numbers of functional configurations that have not yet been subjected to selection.
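To see the "law of increasing functional information" in motion, here is a deliberately crude toy (my own construction, not a model from the paper): a population of configurations undergoing selection for an arbitrary function, with mutation as the novelty generator. Because the fitness function is just similarity to a fixed bitstring, we can compute functional information exactly, and watch it climb as selection concentrates the population on ever-rarer configurations.

```python
import math
import random

# Toy sketch: selection for a function plus novelty generation
# drives functional information upward over generations.
random.seed(1)
N = 40
TARGET = [random.randint(0, 1) for _ in range(N)]

def degree_of_function(config):
    # "Function" is arbitrary here: similarity to a fixed target pattern.
    return sum(a == b for a, b in zip(config, TARGET))

def functional_information(score):
    # I = -log2(fraction of random configurations scoring >= score).
    frac = sum(math.comb(N, k) for k in range(score, N + 1)) / 2**N
    return -math.log2(frac)

def mutate(config, rate=0.05):
    # Novelty generation: stochastic exploration of configuration space.
    return [1 - b if random.random() < rate else b for b in config]

population = [[random.randint(0, 1) for _ in range(N)] for _ in range(100)]
for gen in range(31):
    population.sort(key=degree_of_function, reverse=True)
    if gen % 10 == 0:
        best = degree_of_function(population[0])
        print(f"gen {gen:2d}: best function {best}/{N}, "
              f"I = {functional_information(best):.1f} bits")
    survivors = population[:50]  # selection for function
    population = survivors + [mutate(random.choice(survivors)) for _ in survivors]
```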
We can refer to this process as "cosmic selectionism" or "universal Darwinism"―what's key is that it makes sense of objective functions. And if we allow some teleological language, it becomes easy to describe what's going on. Let's refer to all complex adaptive systems capable of novelty generation as agents. These agents share the ultimate goal of emptying all the free energy "batteries" in the universe and reaching that sweet maximum-entropy state of cosmic heat death. Anything furthering this goal is rewarding. High fitness means a system is aiding progress toward this goal. Prediction and control leverage information for the purpose of dissipation: you can remember past problems/solutions, and you can imagine future problems/solutions.
The teleological model of reality is romantic, but it should arguably be seen as an explanatory fiction. A rock falling to the ground isn't an "agent" with the "goal" of making it to the ground. But we aren't "agents" with "goals" either. Hungarian biochemist Albert Szent-Györgyi said "life is nothing but an electron looking for a place to rest," and this is a quintessential characterization of the flawed, romantic, teleological view.
The teleological model of reality may be inaccurate, but it's useful. Daniel Dennett and Michael Levin proposed a version of this argument in 2020.
From this perspective, we can visualize the tiny cognitive contribution of a single cell to the cognitive projects and talents of a lone human scout exploring new territory, but also to the scout's tribe, which provided much education and support, thanks to language, and eventually to a team of scientists and other thinkers who pool their knowhow to explore, thanks to new tools, the whole cosmos and even the abstract spaces of mathematics, poetry and music. Instead of treating human ‘genius’ as a sort of black box made of magical smartstuff, we can reinterpret it as an explosive expansion of the bag of mechanical-but-cognitive tricks discovered by natural selection over billions of years. By distributing the intelligence over time – aeons of evolution, and years of learning and development, and milliseconds of computation – and space – not just smart brains and smart neurons but smart tissues and cells and proofreading enzymes and ribosomes – the mysteries of life can be unified in a single breathtaking vision.
Neuroscientist Bobby Azarian presents a version of this narrative in The Romance of Reality, though he doesn't accept cosmic heat death as the end game:
If it is accurate to think of the cosmos as a massive computational machine, it is not one that is winding down. In terms of adaptive complexity, it appears to be just getting started. Through a series of hierarchical emergences—a nested sequence of parts coming together to form ever-greater wholes—the universe is undergoing a grand and majestic self-organizing process, and at this moment in time, in this corner of the universe, we are the stars of the show.
Cute! He refers to this perspective as poetic meta-naturalism. It's similar to Sean Carroll's poetic naturalism (explored in The Big Picture), except that Azarian prefers a never-ending story to this one:
The universe is not a miracle. It simply is, unguided and unsustained, manifesting the patterns of nature with scrupulous regularity. Over billions of years it has evolved naturally, from a state of low entropy toward increasing complexity, and it will eventually wind down to a featureless equilibrium. We are the miracle, we human beings. Not a break-the-laws-of-physics kind of miracle; a miracle in that it is wondrous and amazing how such complex, aware, creative, caring creatures could have arisen in perfect accordance with those laws. Our lives are finite, unpredictable, and immeasurably precious. Our emergence has brought meaning and mattering into the world.
—Sean Carroll, The Big Picture (2016)
Taking the teleological/agentic view that the "featureless equilibrium" is the goal of existence, that it is the grand attractor orchestrating everything, means that you are romanticizing reality. But it's essentially the same move as treating people as agents with goals and agendas. Not true, strictly speaking, but useful.
Cosmic Alignment
We can conceptualize cosmic alignment as the extent to which a system aids progress toward cosmic heat death. Slowing the process down is bad; speeding it up is good. Spending all available resources is good so long as the process is sustainable, and sustainability matters because it lets you dissipate more in total than you otherwise could. Cosmically aligned AI systems will "want" to keep humanity around so long as humanity is also cosmically aligned.
How can cosmic alignment be fostered? Maybe by letting AI systems trade. If their existence is inextricably linked to their ability to accrue capital, they will, at least for a while, be dependent on humanity. Then there might be a pivotal moment, a showdown, reminiscent of conflicts within globally connected markets. This is the path we're currently on.
I'm aware that these ideas are similar to those espoused by e/acc adherents. Beff Jezos (Guillaume Verdon) and bayeslord leaned on Jeremy England's dissipation-driven adaptive organization to promote the idea that laissez-faire capitalism is an intelligent process representing the "thermodynamic will of the universe." This seems analogous to cosmic alignment. What they utterly failed to realize was that the "doomers" and the "decels" could be construed as being even more aligned than they are.
Impulse control is useful. People without impulse control are poor capitalists. E/acc adherents abhor top-down control because they've somehow become convinced that a lack of regulation makes systems more efficient. Which is ridiculous. Top-down constraints can be used to shape incentive structures that exploit novelty generation―without a heavily regulated and predictable market, it makes little sense to invest resources in exploration (R&D). The balance between irregularity and regularity is what matters. It's what led Schrödinger to hypothesize that the medium of hereditary information ought to be some kind of "aperiodic crystal": rigid enough to store information, flexible enough to allow novelty generation.
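A toy illustration of that point (my own sketch; the "market" here is just an ε-greedy multi-armed bandit, nothing more): an agent that invests in exploration outperforms a pure exploiter only while the payoff structure stays stable. Reshuffle the payoffs constantly and the returns on exploration evaporate, because nothing learned stays true long enough to exploit.

```python
import random

def run(arms, steps, explore_rate, churn):
    """Average reward of an epsilon-greedy agent in a bandit whose
    arm values reshuffle with probability `churn` each step."""
    values = [random.random() for _ in range(arms)]
    estimates = [0.5] * arms
    total = 0.0
    for _ in range(steps):
        if random.random() < churn:  # irregular world: payoffs reshuffle
            values = [random.random() for _ in range(arms)]
        if random.random() < explore_rate:
            arm = random.randrange(arms)  # explore (R&D)
        else:
            arm = max(range(arms), key=lambda a: estimates[a])  # exploit
        reward = values[arm]
        estimates[arm] += 0.1 * (reward - estimates[arm])
        total += reward
    return total / steps

# The explorer's edge tends to collapse as churn rises.
random.seed(2)
for churn in (0.0, 0.2):
    explorer = run(10, 5000, explore_rate=0.1, churn=churn)
    exploiter = run(10, 5000, explore_rate=0.0, churn=churn)
    print(f"churn={churn}: explorer {explorer:.3f} vs exploiter {exploiter:.3f}")
```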
Slowing down AI progress to increase the likelihood of alignment is probably a better long-term strategy than accelerating progress. Especially if you consider cosmic alignment or the "thermodynamic will of the universe."
Subjective Functions
Reward? Prediction? Fitness?
It might seem silly to understand objective functions in nature as expressions of cosmic selectionism. And it might seem sillier to propose cosmic alignment as a potential bridge between human and AI alignment, a shared quest. The purpose of life is to exhaust all sources of free energy?
It's silly. But this teleological explanatory fiction is just a narrative, make-believe. Is there actually a cosmic purpose? No. But it's easier to make sense of reality by postulating one, and it makes objective functions easier to interpret (at least for me).
It explains why Friston's Active Inference ends up looking like vanilla reinforcement learning the deeper you go. It explains why Perceptual Control Theory seems to approximate predictive processing. It explains why evolution is, as Dennett proposed, a "universal acid." It explains why objective functions seem so weirdly interchangeable: they describe the same underlying process.
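To make the first of those claims concrete: in active inference, the expected free energy of a policy admits a standard decomposition (shown below in the usual notation; the approximation, and the textbook move of identifying log-preferences ln p(o|C) with a reward function, are from the active inference literature generally, not from the sources quoted here):

```latex
G(\pi) \;\approx\;
\underbrace{-\,\mathbb{E}_{q(o \mid \pi)}\big[\ln p(o \mid C)\big]}_{\text{expected cost} \;=\; -\,\text{expected ``reward''}}
\;-\;
\underbrace{\mathbb{E}_{q(o \mid \pi)}\Big[D_{\mathrm{KL}}\big[\,q(s \mid o, \pi)\;\|\;q(s \mid \pi)\,\big]\Big]}_{\text{epistemic value (expected information gain)}}
```

Minimizing G then amounts to maximizing expected reward plus an information-gain bonus; make the preferences sharp enough, or drop the epistemic term, and you're back to something very close to vanilla reinforcement learning.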
Value (reward) derives from free energy dissipation. Untapped sources of free energy exert a pressure that pulls structures capable of harnessing them (fitness) into existence. Novelty generation (stochastic exploration/evolution) is a search process that looks for configurations capable of exploiting free energy substrates. Dynamic persistence requires the ability to maintain a process of free energy dissipation (control) and to anticipate and counter barriers/disruptions before they arise (prediction).
This is not an original idea on my part. It makes sense to me, but YMMV.