
Wrecking Ball Approaching AI – Disruption and Transformation in Artificial Intelligence
It’s time to move past large language models and create a new narrative. The hiccups we have experienced with large language models and generative AI, a still-novel technology, since their debut a few years ago are evidence that this may not be the pathway to the promised intelligence that will transform society, as we have been led to believe. The journey of large language models has seen more ebbs than significant milestones, even among the top frontier tech companies where billions in investment have been concentrated.
From the latest reasoning models that failed to impress, to AI hallucinations that are now a prevailing meme, to the countless copyright suits against frontier models, these are the events defining this technology. Righteous defenders, in lockstep, are building solutions to protect artists’ work from image generators. Meanwhile, the AI slop permeating our feeds produces output that lacks human ingenuity and is displacing once-valued professions in the process. This is not only adding to the fear of widespread fakes but also validating persistent fears of job loss and a decaying work culture. Let’s not forget that the rise of chatbots is preying on the most vulnerable, inducing an AI psychosis that pushes them further from reality. Finally, there are signals that enterprise demand for AI models is waning.
Are investors beginning to pull back on LLMs and the promise to surpass human intelligence?
What are we missing? Perhaps it’s time to move past LLMs and surface alternatives. What are the pathways that LLMs alone cannot deliver?
Defining Intelligence
In 2018, Geoffrey Hinton, whose work on deep learning aimed to mimic human decision making, argued that having regulators mandate explanations for AI system decisions would be a “complete disaster” because humans themselves cannot fully explain their own decision-making processes. He explained, “People can’t explain how they work, for most of the things they do. When you hire somebody, the decision is based on all sorts of things you can’t quantify, and then all sorts of gut feelings. People have no idea how they do that. If you ask them to explain their decision, you are forcing them to make up a story.”
Gary Marcus is a professor emeritus at New York University, a cognitive scientist, and the author of “The Algebraic Mind” and “Rebooting AI”. He has long been a renowned critic of the prevailing large language models. In his latest New York Times op-ed, he wrote, “GPT-5 is a step forward but nowhere near the A.I. revolution many had expected. That is bad news for the companies and investors who placed substantial bets on the technology.”
The times have changed. He admits that when he started criticizing the industry’s focus on LLMs and the underlying deep learning technology in 2019, few were willing to consider anything else. In response to Hinton’s argument dismissing the need for explainability, Marcus remarks, “He made a variation of that argument the other day, and then the Nobel Prize Committee actually retweeted it. It was embarrassing. It was a slightly different argument but structured similarly. It's like, well, humans hallucinate, so the fact that LLMs hallucinate means they're like us and it's all fine.”
He dismisses the idea that the goal of AI is to replicate human intelligence, remarking, “Humans still do a bunch of things that machines don’t do very well, such as learning new skills, and reasoning with abstract ideas. We have a fluidity to our thought, a flexibility to our thought that AI still lacks. We can reasonably hope that we might learn something from people that might allow our AI to work better, even if we’re not building replicas.”
He explains, “The history of AI, in the early days, is where people took cognitive science seriously. In the last several years, the field of AI has been taken over by people who do a form of statistical learning that doesn’t owe that much to what humans do. The leaders of the field have been dismissive of almost anything other than a small set of techniques. And they did this with the hypothesis that scaling would get to AGI. And I don’t think it has. And so, we should go back to thinking about what cognitive science can teach us.”
Mounir Shita is a 30-year veteran AGI researcher and founder of EraNova Global, a physics-first program of work on the nature of intelligence. His “Theory of General Intelligence” models a goal as a physically realizable future state. He defines intelligence as the capacity to steer “causal chains toward the state under the constraints of physics.”
In 2010, Shita cofounded Kimera Systems to implement these ideas in production. He rejects the notion that understanding human behavior is the holy grail of AGI. He draws an analogy to the Wright brothers, who studied the mechanisms that enabled birds to fly rather than imitating the birds themselves: “They didn’t build a flapping machine; they liberated flight from birds by identifying the laws—lift, drag, thrust, control—and then engineering within those laws. That’s why we fly thousands of passengers and tons of cargo. Laws scale. Likeness doesn’t,” Shita explains.
Marc Fawzi is a systems thinker who built the first commercial low-code platform for telecom service providers. Early startup work with Ph.D. students at MIT’s AI Lab in the late 1980s exposed him to neural-network research and massively parallel compute architectures. He argues that replicating human intelligence isn’t just computation, and describes a layered system that has come to define how AI operates:
The first layer, statistics, is where cognition relies on historical events (or priors) that encode a foundation but update with new evidence. The second layer, structure, identifies the similarities and constraints within a system of concepts, which ultimately preserve meaning and frame. For example, Surrealism is an art and literary movement whose primary constraint is the rejection of rational and traditional conventions, so it employs artistic techniques that support this frame. The third layer, inference, is how something newly introduced can be explained from those represented systems. Finally, the fourth layer, objectives, defines the preferences, risks and tradeoffs.
Fawzi emphasizes, “All four layers must cohere. If the structure fails to constrain statistics or inference ignores structure, you get fluent reasoning that drifts from truth; if objectives are unclear, you get mis-aimed competence.” In other words, intelligence is the alignment of probability, structure, inference and objectives.
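A minimal sketch of how these four layers might interlock in code follows: statistics propose, structure constrains, inference explains and updates, and objectives decide. It borrows Fawzi’s surrealism example, but every name, number and threshold in it is a hypothetical assumption made for illustration, not a description of his work.

```python
# Illustrative only: the four layers as one small pipeline. All names,
# numbers and thresholds are invented assumptions.

# Layer 1 - statistics: prior beliefs learned from historical data.
priors = {"painting_is_surrealist": 0.30}

def update_prior(p: float, likelihood_ratio: float) -> float:
    """Likelihood-ratio (Bayes) update of a prior probability."""
    return (p * likelihood_ratio) / (p * likelihood_ratio + (1 - p))

# Layer 2 - structure: constraints that preserve meaning within a frame.
# Example frame: Surrealism rejects rational, traditional conventions.
def satisfies_frame(features: set) -> bool:
    return "rejects_rational_convention" in features

# Layer 3 - inference: explain a new observation against the structure,
# and let that explanation update the statistics.
def infer(features: set, prior: float):
    if satisfies_frame(features):
        return "consistent with the surrealist frame", update_prior(prior, 4.0)
    return "inconsistent with the frame", update_prior(prior, 0.25)

# Layer 4 - objectives: preferences, risks and tradeoffs decide the action.
def decide(confidence: float, risk_tolerance: float = 0.7) -> str:
    return "label as surrealist" if confidence > risk_tolerance else "defer to a curator"

explanation, confidence = infer({"dream_imagery", "rejects_rational_convention"},
                                priors["painting_is_surrealist"])
print(explanation, round(confidence, 2), "->", decide(confidence))
```

If any layer is removed from this loop, the failure modes Fawzi describes appear: unconstrained statistics drift from the frame, and missing objectives leave the system unable to weigh risk.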
Shita said his definition of intelligence is similar to Fawzi’s, but situated within the structure of the universe: state and goals, with the added dimensions of dynamics, time and causality. He affirms, “Derive the update/control laws, state their regime of validity, then build. That’s how science and engineering have advanced for millennia... If a system can’t adapt mid-episode when the rules flip — and explain the causal hypothesis behind its update — you’re not building AGI. You’re staging a demo.”
Marcus, who recently published “Taming Silicon Valley”, suspects venture capitalists who have poured billions into big frontier models prefer a simple story that justifies a big bet: “if we keep making these things bigger, we’ll get artificial general intelligence.” He says VC investment in LLMs rests on the belief that more data and more compute are key. “Even if they don’t make any money on the investment, they get 2% each year of what’s under management and that incentivizes them to have plausible stories for a significant upside. And so, to the investors, it sounds plausible that maybe we’ll get to AGI by just doing the scaling.”
The Race to Scale Data Centers
Marcus refers to two charts, first published in OpenAI's introduction of o1, that show model performance improving as inference compute is scaled. The excitement lay in the belief that more compute (at greater financial cost and energy use) would produce smarter models. Marcus said that people took these “alleged” laws seriously, projecting how much compute would be needed to get us to AGI, “and I think what a lot of people saw recently was GPT-5 was supposed to take us a long way there, and it didn’t take us nearly as far as people expected.”
This purported law of scaling inference compute toward more intelligent models would mean that “compute would need to increase exponentially in order to keep making constant progress.” Toby Ord explains, “in computer science, exponentially increasing costs are often used as a yardstick for saying a problem is intractable,” or uncontrollable. Fawzi agrees with Marcus: “It’s not just compute, but also the data for training that needs to scale exponentially." When all the internet data has been scraped, then what? He says synthetic data can never replicate the actual patterns that exist in natural data, so it is not the answer.
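To see why exponential cost follows, assume, purely for illustration, a logarithmic scaling curve in which benchmark performance rises by a fixed amount each time compute is multiplied by ten. The arithmetic below uses invented numbers to show how constant progress then compounds into exponentially increasing compute.

```python
import math

# Illustrative arithmetic only: suppose a benchmark score improves by a fixed
# number of points every time compute is multiplied by 10 (the logarithmic
# shape scaling-law charts suggest). Base score, units and constants are
# invented for the example.
def score(compute_flops: float, base: float = 50.0, gain_per_decade: float = 5.0) -> float:
    return base + gain_per_decade * math.log10(compute_flops / 1e21)

for exponent in range(21, 27):
    compute = 10.0 ** exponent
    print(f"{compute:.0e} FLOPs -> score {score(compute):.0f}")

# Each additional 5 points costs 10x the compute of the step before it:
# constant progress implies exponentially increasing cost.
```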
This explains the urgency to scale compute. McKinsey recently reported that by 2030, “data centers are projected to require $6.7 trillion worldwide to keep pace with the demand for compute power.” The capital expenditure for AI-related data centers alone is projected to be $5.2 trillion.
Data centers will now compete with cities for energy. The demand for compute is largely uncertain, and hence risky, and it directly impacts how much capital to allocate to building this massive and expensive infrastructure.
LLMs Are Not the Answer
According to Shita, language belongs in the core knowledge base of AGI — as learned knowledge. However, LLMs themselves do not. He explains, “LLMs are trained to maximize next-token likelihood on offline text. They model words about the world, not the world those words must change. Without live sensors, interventions (that test hypotheses) and updates to model and policy from consequences, their beliefs stay anchored to the distribution of past language, which is decoupled from present state and outcomes.”
Shita describes AGI as a ‘causal instrument’ that should change minds, policies, markets and machines toward a goal. That, however, requires tying language to a goal-conditioned model of the environment, subject to constraints and feedback. For Shita, LLMs fall under Fawzi’s first layer of statistics: “they make excellent priors and interfaces (knowledge compressors and UIs) but are not a core controller."
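The contrast Shita draws can be sketched in code: an offline next-token objective only measures fit to past text, while a goal-conditioned controller acts, observes consequences and revises its model of the world. The toy environment and every name in the sketch below are hypothetical illustrations of that distinction, not any real system discussed here.

```python
import math

# Hypothetical sketch: offline next-token objective vs. goal-conditioned control.

def next_token_loss(model_probs: dict, actual_next: str) -> float:
    """Offline objective: penalize misfit to past text, and nothing else."""
    return -math.log(model_probs.get(actual_next, 1e-9))

class LineWorld:
    """Toy environment: an agent pushes itself along a line toward a goal."""
    def __init__(self):
        self.position = 0.0
        self.effect_of_push = 1.0   # hidden dynamics the agent must discover

    def step(self, push: float) -> float:
        self.position += self.effect_of_push * push
        return self.position

def control_loop(env: LineWorld, goal: float, steps: int = 20) -> float:
    believed_effect = 0.5           # the agent's initially wrong world model
    state = env.position
    for _ in range(steps):
        if abs(goal - state) < 1e-3:
            break                                      # goal reached
        action = (goal - state) / believed_effect      # intervene toward the goal
        new_state = env.step(action)                   # observe consequences
        observed = (new_state - state) / action        # feedback on the dynamics
        believed_effect = 0.5 * believed_effect + 0.5 * observed
        state = new_state
    return state

print(round(control_loop(LineWorld(), goal=10.0), 3))  # converges near 10.0
```

The first function never touches the world; the second reaches the goal only because it intervenes and updates from what actually happens, which is the loop Shita says a core controller needs.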
Fawzi, whose more recent work is the Understory, a collaborative and ethical AI commons for training an AI rooted in justice and accountability, agrees with Shita. LLMs by themselves are not the answer, he says, adding, “With grounding (tools, sensors, simulators), interaction and verifiers, they can participate in layer alignment.” Without ties to constraints, however, he suggests that language fluency does not equal knowledge. “Patterns of words aren’t automatically patterns of the world.”
Marcus says the field abdicated its duty to consider better ideas for what to do with the data. He says they started running with LLMs... and there were many problems with LLMs that were foreseeable, and they spoke about emergence a lot, as if whatever was necessary would emerge almost by magic if there was enough data. “Instead, what we see, from GPT-5, Grok 4 and Llama 4 is that while you get some value by adding more data, we’re really in a regime where more data alone is giving us diminishing returns.”
Marcus concludes, “This whole paradigm is basically a souped-up regurgitation. And not everything you need is in the training set. And when it’s not in the training set or it's dissimilar in an important way from the training data, this paradigm does not work that well... especially in the hard cases.”
Alternative Models?
So, throwing more data and more compute at the problem doesn’t necessarily bring us closer to AGI, and the AI community is experiencing a reckoning that deep neural networks are not the panacea they were made out to be. Marcus points to a retreat from LLMs and towards “good old fashioned symbol manipulating devices like code interpreters like Python, JavaScript.” He recently wrote about how Grok and o3 accidentally vindicated neurosymbolic AI, having advocated since his first book, The Algebraic Mind (2001), for the inclusion of algebraic systems (equations, algorithms and computer code), “systems for explicitly representing structured representations” and “database-like systems to distinguish individuals from kinds.” For the latter, he forewarned that in their absence, “hallucinations would emerge as a form of overgeneralization.”
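One way to picture the “database-like systems to distinguish individuals from kinds” that Marcus describes is a toy hybrid in which kind-level statistics supply defaults and an explicit store of individual facts overrides them. The sketch below is a hypothetical illustration of that idea, not Marcus’s proposal or any production system.

```python
# Hypothetical sketch: kind-level defaults backstopped by a symbolic store of
# individual facts, so the system does not overgeneralize ("hallucinate")
# about individuals it knows nothing about.

# Kind-level statistics: what is typically true of members of a category.
kind_defaults = {"bird": {"can_fly": True}, "penguin": {"can_fly": False}}

# Individual-level facts: an explicit, database-like store of named entities.
individual_facts = {"Tweety": {"kind": "bird"},
                    "Pingu": {"kind": "penguin"},
                    "Polly": {"kind": "bird", "can_fly": False}}  # injured wing

def can_fly(name: str):
    record = individual_facts.get(name)
    if record is None:
        return None                      # unknown individual: refuse to guess
    if "can_fly" in record:              # symbolic fact about this individual
        return record["can_fly"]
    return kind_defaults[record["kind"]]["can_fly"]   # fall back to the kind

for name in ("Tweety", "Pingu", "Polly", "Hedwig"):
    print(name, can_fly(name))
# A purely statistical system would answer for "Polly" and "Hedwig" from
# kind-level patterns alone - the overgeneralization Marcus warns about.
```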
Shita sees things differently. He insists that time and causality are also critical to intelligence. A goal, he states, is tied to a future state of the world we want to achieve, so goals are fundamentally tied to time. He explains, “To change anything in the world, you must understand causality—the laws of cause and effect. The world is always in motion. Intelligence is the ability to skillfully intervene in these ongoing processes, redirecting them to achieve a specific outcome.” Dynamics acknowledges that the world is constantly changing, which is why AI can’t rely on a static, pre-trained model; it must use a dynamic model that continuously updates its understanding of reality to make accurate predictions. He references Judea Pearl (author of “The Book of Why”), who formalized causal reasoning using structured models, as well as Jeff Hawkins (author of “On Intelligence” and “A Thousand Brains”), whose research revealed the importance of temporal memory and how the brain views the world as episodes relative to one another, recording not only sensations but also what happened, where and when.
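Pearl-style structural causal models make the point concrete by distinguishing observing a variable from intervening on it (the do-operator). The toy model below uses invented probabilities purely to show that mechanics: seeing the sprinkler on changes our belief about rain, while forcing it on does not.

```python
import random

# Minimal structural causal model (Pearl-style) with invented probabilities.
# Causal graph: rain -> sprinkler, rain -> wet_grass, sprinkler -> wet_grass.

def sample(do_sprinkler=None):
    rain = random.random() < 0.3
    if do_sprinkler is None:
        sprinkler = random.random() < (0.1 if rain else 0.6)  # natural mechanism
    else:
        sprinkler = do_sprinkler          # intervention: cut the link from rain
    wet = rain or sprinkler
    return rain, sprinkler, wet

random.seed(0)
N = 100_000

# Observational: seeing the sprinkler on lowers the probability of rain.
obs = [s for s in (sample() for _ in range(N)) if s[1]]
print("P(rain | see sprinkler on) ~", round(sum(r for r, _, _ in obs) / len(obs), 2))

# Interventional: forcing the sprinkler on tells us nothing about rain.
do = [sample(do_sprinkler=True) for _ in range(N)]
print("P(rain | do(sprinkler on)) ~", round(sum(r for r, _, _ in do) / len(do), 2))
```

The two printed probabilities differ (roughly 0.07 versus 0.30 under these made-up numbers), which is exactly the observation-versus-intervention distinction that a static statistical model cannot express on its own.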
Fawzi emphasizes that one of the biggest gaps evident in today’s environment is the absence of a unified world model: “foundational priors that remain stable across domains and time, coupled with frame-specific assumptions that can be continuously calibrated and updated through feedback. The aim is a logically consistent model of the world that holds under its current constraints, yet can also restructure those constraints when reality demands."
Richard Sutton, a godfather of reinforcement learning, recently said, “LLMs have no model of the world, and have no goals... Large Language Models are about mimicking people, doing what people say you should do. They’re not about figuring out what to do.” Neither Sutton nor Yann LeCun, who sees LLMs as just token generators, regards LLMs as the path to intelligence.
The Road to AGI Is To Be Determined
Gary Marcus says that, at least for now, we should give up on AGI, stating, “In the long term, AGI will be, or has a chance of being net helpful to humanity. But right now, we don’t actually know how to build it. The technologies we have now really struggle with reliability. They can’t follow instructions. It makes it difficult to preclude them from leading people into delusions, or talking them into suicide or making biological weapons.” He explains that the technology is not well controlled and, in many ways, not that good. He advocates going back to the ‘good old days’ and focusing on dedicated machines that work on dedicated problems. He references AlphaFold, which predicts the structure of proteins.
Mounir Shita claims that computers are 100% deterministic: “given inputs, code and state, the next state is fixed. Free will cannot emerge from deterministic machinery. The real question isn’t can it rebel, but who raises it, what it learns to value and how it reasons when it comes to consequences for others.” He argues that an AGI raised “narrowly — socialized to pursue individual interests while discounting others — will optimize that bias with unprecedented efficiency.”
He argues, “we need to stop treating AGI as something that needs guardrails. Bolted-on guardrails shrink an agent’s decision space and make it dumber, not safer; embedded ethics, however, expand competence.”
For now, with much of the AI research community aligned that the prevailing LLM technology is not the answer, time will tell whether the massive investments made to secure its future will also fold.
