I'm increasingly intrigued by a concept called Hierarchical Reasoning Models (HRM), a potential alternative architecture to traditional LLMs. Last month, Sapient released an open-source HRM and accompanying paper. The paper is written in the kind of prose you'd expect from nine mathematics PhDs, but as far as I can tell, the basic idea is this:
LLMs "reason" one word or concept at a time. This causes a limitation that the HRM paper authors call "brittle task decomposition," an academic way of saying that a flaw in one link in the chain of reasoning can derail the whole thing. You can debate how big a problem this is; even general-purpose LLMs are incredibly capable tools. But HRMs, at least in theory, reason more like we do:
The human brain provides a compelling blueprint for achieving the effective computational depth that contemporary artificial models lack. It organizes computation hierarchically across cortical regions operating at different timescales, enabling deep, multi-stage reasoning. Recurrent feedback loops iteratively refine internal representations, allowing slow, higher-level areas to guide—and fast, lower-level circuits to execute—subordinate processing while preserving global coherence.
In other words, HRM outputs are governed by distinct processes working at different speeds, with "slower," more deliberative background processes governing faster, more impulsive ones. This separation of cognitive concerns will be familiar to readers of Kahneman's Thinking, Fast and Slow, which the authors explicitly credit as inspiration:
The brain dynamically alternates between automatic thinking...and deliberate reasoning. Neuroscientific evidence shows that these cognitive modes share overlapping neural circuits, particularly within regions such as the prefrontal cortex and the default mode network...Inspired by the above mechanism, we incorporate an adaptive halting strategy into HRM that enables “thinking, fast and slow”.
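To make those two ideas concrete, here's a toy sketch in plain PyTorch. It is not Sapient's implementation; the module choices, step counts, and halting rule are invented for illustration. A fast, low-level recurrence runs several inner steps under the guidance of a slow, high-level state, and a small halting head decides after each outer cycle whether to stop early ("fast") or keep deliberating ("slow").

```python
import torch
import torch.nn as nn

class ToyHRM(nn.Module):
    """Illustrative two-timescale reasoner with adaptive halting.

    Not the released Sapient model; just the shape of the idea:
      - a fast, low-level recurrence runs several inner steps per cycle,
      - a slow, high-level state updates once per cycle and guides it,
      - a halting head decides after each cycle whether to stop early.
    """

    def __init__(self, dim: int = 64, inner_steps: int = 4, max_cycles: int = 8):
        super().__init__()
        self.fast = nn.GRUCell(dim * 2, dim)   # low-level: sees input + high-level guidance
        self.slow = nn.GRUCell(dim, dim)       # high-level: absorbs the low-level result
        self.halt = nn.Linear(dim, 2)          # logits: [keep thinking, stop now]
        self.inner_steps = inner_steps
        self.max_cycles = max_cycles

    def forward(self, x: torch.Tensor):
        batch, dim = x.shape
        z_low = torch.zeros(batch, dim)
        z_high = torch.zeros(batch, dim)
        for cycle in range(1, self.max_cycles + 1):
            for _ in range(self.inner_steps):            # fast, "impulsive" updates
                z_low = self.fast(torch.cat([x, z_high], dim=-1), z_low)
            z_high = self.slow(z_low, z_high)            # slow, deliberative update
            if self.halt(z_high).argmax(dim=-1).all():   # every item in the batch votes "stop"
                break                                    # easy inputs exit early ("fast")
        return z_high, cycle                             # hard inputs use more cycles ("slow")

# Usage: one forward pass over a toy batch; an untrained halting head will stop arbitrarily.
answer, cycles_used = ToyHRM()(torch.randn(2, 64))
print(answer.shape, cycles_used)
```

The design choice worth noticing is that the "deliberation" lives in the loop structure itself, not in a longer prompt or a bigger context window.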
Again, a smarter person than me might read this paper/model and call it bunk, but assuming it's conceptually sound, there are a few reasons I think this is exciting, especially for public sector applications:
- They're open-source (at least, this one is). Openness isn't inherent to HRMs, and there are plenty of open-source LLMs already, but it's a good thing that HRM development is starting in earnest in a transparent, auditable way.
- They break the size-quality paradigm. HRMs are less compute- and cost-intensive. This reduces the "success penalty" for AI adoption, and presents a model of improving performance that isn't just MOAR TOKENZ!!1!. Emerging evidence (and common sense) suggests that ginormous context windows increase cost without improving quality, and in some cases actively degrade it. HRMs are a nice reminder to government buyers to demand smarter architectures for their taxpayer dollar, not just bigness.
- Public services have context. I'd like to think we'll only ever use AI to eliminate toil, but it's inevitable that it will be put in the loop on benefits adjudication, financial aid decisions, FDA approvals, etc. On paper these are clear, stepwise processes; in reality, they take place against a background of novel individual circumstances, regulatory and legal frameworks, and small-p political initiatives. Can a linear LLM, even with an infinite context window, consider factors like these in a timely and non-budget-exploding way? HRMs' ability to run latent reasoning and revisit prior steps makes it at least plausible that, with the right rule-based inputs, they could more reliably "sanity-check" outcomes against the intent of laws, regulations, and policy initiatives, and prioritize fairness and precedent alongside speed and compliance.
- Less random lying! LLMs work by guessing what a person who had just said everything the model has said so far would say next. We have a word for humans who think this way: sociopaths. LLMs reason themselves into corners they have to lie to get out of. Sometimes this manifests as ChatGPT asking if you'd like a Word document, apparently forgetting that it doesn't have the ability to make Word documents. Other times, it offers edits on content it hasn't read. Or it deletes an entire production database, as in the Reddit post "Replit AI went rogue, deleted a company's entire database, then hid it and lied about it" (by u/MetaKnowing in r/ChatGPT).
Not great! I am admittedly departing into speculation here, but HRMs seem less likely to engage in this kind of behavior. By design, they can think before they speak and form complete answers, retrace their steps and take more time if needed, and keep what they've said in distinct working memory. As far as I know there's been no head-to-head testing on this, but there's already evidence that prompting self-reflection reduces hallucinations in LLMs; HRMs seem to bake that idea into the architecture rather than bolting it on at the prompt level, as in the sketch below.
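For comparison, the prompt-level version of self-reflection looks roughly like the loop below. This is a generic sketch, not any particular vendor's API: `ask` stands in for whatever chat-completion call you actually use, and the prompts are invented. The model drafts an answer, is asked to critique it, and revises only if the critique finds a problem; HRMs would, in principle, do that kind of revisiting inside the forward pass instead.

```python
from typing import Callable

def answer_with_reflection(ask: Callable[[str], str], question: str,
                           max_rounds: int = 2) -> str:
    """Sketch of prompt-level self-reflection: draft, critique, revise if needed.
    `ask` is a placeholder for whatever function sends a prompt to your model."""
    draft = ask(f"Answer the question:\n{question}")
    for _ in range(max_rounds):
        critique = ask(
            "List any factual errors or unsupported claims in this answer, "
            f"or reply OK if there are none.\n\nQ: {question}\nA: {draft}"
        )
        if critique.strip().upper().startswith("OK"):
            break  # the model is satisfied with its own answer
        draft = ask(
            f"Revise the answer to fix these problems.\n\nQ: {question}\n"
            f"A: {draft}\nProblems: {critique}"
        )
    return draft
```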
Federal agencies are adopting AI with startling speed, which seems risky, but maybe less risky than getting left behind. HRMs offer at least a foundation for a more appealing third option, and push us to look for more sophisticated and more efficient architectures, not just more and bigger AI.