How do experts make decisions? One theory is that they generate a set of options, estimate the costs and benefits of each, and then choose the optimal one. The psychology researcher Gary Klein developed a very different theory of expert decision-making, based on his studies of experts in domains such as firefighting, nuclear power plant operations, aviation, anesthesiology, nursing, and the military. Under Klein’s theory of naturalistic decision-making, experts use a pattern-matching approach to make decisions.
Even before Klein’s work, humans were already known to be quite good at pattern recognition. We’re so good at spotting faces that we tend to see faces in things that aren’t actually faces, a phenomenon known as pareidolia.
(Image credit: Wout Mager/Flickr/CC BY-NC-SA 2.0)

As far as I’m aware, Klein used the humans-as-black-boxes research approach of observing and talking to domain experts: while he was metaphorically trying to peer inside their heads, he wasn’t doing any direct measurement or modeling of their brains. But if you are inclined to take a neurophysiological view of human cognition, you can see how the architecture of the brain provides a mechanism for doing pattern recognition. We know that the brain is organized as an enormous network of neurons, which communicate with each other through electrical impulses.
The psychology researcher Frank Rosenblatt is generally credited with being the first researcher to run computer simulations of a model of neural networks in order to study how the brain works. He called his model a perceptron. In his paper “The Perceptron: A Probabilistic Model for Information Storage and Organization in the Brain,” he noted pattern recognition as one of the capabilities of the perceptron.
While perceptrons may have started out as a model for psychology research, they became one of a competing set of strategies for building artificial intelligence systems. The perceptron approach to AI was dealt a significant blow by the AI researchers Marvin Minsky and Seymour Papert in 1969 with the publication of their book Perceptrons. Minsky and Papert demonstrated that there were certain cognitive tasks that perceptrons were not capable of performing.
However, Minsky and Papert’s critique applied only to single-layer perceptron networks. It turns out that if you create a network out of multiple layers, and you add non-linear processing elements to the layers, then these limits on the capabilities of a perceptron no longer apply. When I took a graduate-level artificial neural networks course back in the mid-2000s, the networks we worked with had on the order of three layers. Modern LLMs have many more layers than that: the “deep” in deep learning refers to the large number of layers. For example, the largest GPT-3 model (from OpenAI) has 96 layers, the larger DeepSeek-LLM model (from DeepSeek) has 95 layers, and the largest Llama 3.1 model (from Meta) has 126 layers.
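To make the multi-layer point concrete, here’s a minimal sketch of a two-layer network computing XOR, one of the functions Minsky and Papert showed a single-layer perceptron cannot compute. The weights are hand-chosen by me purely for illustration, not learned:

```python
import numpy as np

def step(x):
    # Threshold nonlinearity: 1 if input is positive, else 0.
    return (x > 0).astype(float)

def xor_mlp(x1, x2):
    """Two-layer perceptron network computing XOR with hand-chosen weights."""
    # Hidden layer: h[0] fires if at least one input is on, h[1] only if both are.
    h = step(np.array([x1 + x2 - 0.5, x1 + x2 - 1.5]))
    # Output layer: fire for "at least one" but not "both".
    return step(np.array([h[0] - h[1] - 0.5]))[0]

for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(a, b, "->", int(xor_mlp(a, b)))
```

No single layer of weights can do this, because XOR is not linearly separable; it’s the hidden layer plus the nonlinearity that makes it possible.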
Here’s a ridiculously oversimplified conceptual block diagram of a modern LLM.

There’s an initial stage which takes text and turns it into a sequence of vectors. Then that sequence of vectors gets passed through the layers in the middle. Finally, you get your answer out at the end. (Note: I’m deliberately omitting discussion of what actually happens in the stages depicted by the oval and the diamond above, because I want to focus on the layers in the middle for this post. I’m not going to talk at all about concepts like tokens, embeddings, attention blocks, and so on. If you’re interested in those sorts of details, I highly recommend the video But what is a GPT? Visual intro to Transformers by Grant Sanderson.)
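The three stages of the diagram can be sketched in a few lines of Python. Everything here is a placeholder of my own invention (a four-word vocabulary, random matrices, a trivial output head); real LLMs use tokenizers, attention, and trained weights, none of which are modeled:

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB = ["the", "cat", "sat", "mat"]
DIM, N_LAYERS = 8, 4

# Stage 1: text -> sequence of vectors (stand-in for tokenization + embedding).
EMBED = rng.normal(size=(len(VOCAB), DIM))
def text_to_vectors(text):
    return np.stack([EMBED[VOCAB.index(w)] for w in text.split()])

# Stage 2: pass the vectors through a stack of layers.
LAYERS = [rng.normal(size=(DIM, DIM)) for _ in range(N_LAYERS)]
def run_layers(vectors):
    for w in LAYERS:
        vectors = np.tanh(vectors @ w)  # linear map + nonlinearity per layer
    return vectors

# Stage 3: turn the final vector back into a word (stand-in for the output head).
def vectors_to_word(vectors):
    scores = EMBED @ vectors[-1]        # score each vocabulary word
    return VOCAB[int(np.argmax(scores))]

print(vectors_to_word(run_layers(text_to_vectors("the cat sat"))))
```

The output is meaningless here, since nothing has been trained; the point is only the shape of the pipeline: text in, vectors through the layers, a word out.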
We can imagine the LLM as a system that recognizes patterns at different levels of abstraction. The first and last layers deal directly with representations of words, so they have to operate at the word level of abstraction; let’s think of that as the lowest level. As we initially go deeper into the network, we can imagine each layer dealing with patterns at a higher level of abstraction, which we might call concepts. Since the last layer deals with words again, layers towards the end would be back at a lower level of abstraction.

But, really, this talk of encoding patterns at increasing and decreasing levels of abstraction is pure speculation on my part; there’s no empirical basis for it. In reality, we have no idea what sorts of patterns are encoded in the middle layers. Do they correspond to what we humans think of as concepts? We simply have no idea how to interpret the meaning of the vectors that are generated by the intermediate layers. Are the middle layers “higher level” than the outer layers in the sense that we understand that term? Who knows? We just know that we get good results.
The things we call models have different kinds of applications. We tend to think first of scientific models, which are models that give scientists insight into how the world works. Scientific models are a type of model, but not the only one. There are also engineering models, whose purpose is to accomplish some sort of task. A good example of an engineering model is a weather prediction model that tells us what the weather will be like this week. Another good example of an engineering model is SPICE, which electrical engineers use to simulate electronic circuits.
Perceptrons started out as a scientific model of the brain, but their real success has been as an engineering model. Modern LLMs contain within them feedforward neural networks, which are the intellectual descendants of Rosenblatt’s perceptrons. Some people even refer to these as multilayer perceptrons. But LLMs are not engineering models that were designed to achieve a specific task, the way that weather models or circuit models are. Instead, these are models that were designed to predict the next word in a sentence, and it just so happens that if you build and train your model the right way, you can use it to perform cognitive tasks that it was not explicitly designed to do! Or, as Sean Goedecke put it in a recent blog post (emphasis mine):
Transformers work because (as it turns out) the structure of human language contains a functional model of the world. If you train a system to predict the next word in a sentence, you therefore get a system that “understands” how the world works at a surprisingly high level. All kinds of exciting capabilities fall out of that – long-term planning, human-like conversation, tool use, programming, and so on.
This is a deeply weird and surprising outcome of building a text prediction system. We’ve built text prediction systems before: Claude Shannon was writing about probability-based models of natural language back in the 1940s, in the famous paper that gave birth to the field of information theory. But it’s not obvious that once these models got big enough, we’d get results like we’re getting today, where you can ask the model questions and get answers. At least, it’s not obvious to me.
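In that spirit, here’s a tiny Shannon-style character model of my own devising: count how often each character follows each other character in a corpus, then predict the most frequent successor. Scale the same next-symbol-prediction idea up by many orders of magnitude, and replace counting with a trained neural network, and you have the modern setup:

```python
from collections import Counter, defaultdict

def train_bigrams(corpus):
    # Count how often each character follows each other character.
    counts = defaultdict(Counter)
    for prev, nxt in zip(corpus, corpus[1:]):
        counts[prev][nxt] += 1
    return counts

def predict_next(counts, char):
    # Return the most frequently observed successor of `char`.
    return counts[char].most_common(1)[0][0]

counts = train_bigrams("the theory there then the thing")
print(predict_next(counts, "t"))  # 'h' follows 't' every time in this corpus
```

Nobody looking at this counting scheme would predict that its billion-fold-scaled descendants could answer questions, which is exactly the surprise.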
In 2020, the linguistics researchers Emily Bender and Alexander Koller published a paper titled Climbing towards NLU: On Meaning, Form, and Understanding in the Age of Data. This is sometimes known as the octopus paper, because it contains a thought experiment about a hyper-intelligent octopus eavesdropping on a conversation between two English speakers by tapping into an undersea telecommunications cable, and how the octopus could never learn the meaning of English phrases through mere exposure. This seems to contradict Goedecke’s observation. They also note how research has demonstrated that humans are not capable of learning a new language through mere exposure to it (e.g., through TV or radio). But I think the primary thing this illustrates is how fundamentally different LLMs are from human brains, and how little we can learn about LLMs by making comparisons to humans. The architecture of an LLM is radically different from the architecture of a human brain, and the learning processes are also radically different. I don’t think a human could learn the structure of a new language by being exposed to a massive corpus and then trying to predict the next word. Our intuitions, which work well when dealing with humans, simply break down when we try to apply them to LLMs.
The late philosopher of mind Daniel Dennett proposed the concept of the intentional stance: a perspective we take for predicting the behavior of things that we consider to be rational agents. To illustrate it, let’s contrast it with two other stances he described, the physical stance and the design stance. Consider the following three scenarios, in each of which you’re asked to make a prediction.
Scenario 1: Imagine that a child has rolled a ball up a long ramp set at a 30-degree incline. I tell you that the ball is currently rolling up the ramp at 10 metres per second, and I ask you to predict what its speed will be one minute from now.
(Image: a ball that has been rolled up a ramp)

Scenario 2: Imagine a car driving up a hill at a 10-degree incline. I tell you that the car is currently moving at a speed of 60 km/h, and that the driver has cruise control enabled, also set at 60 km/h. I ask you to predict the speed of the car one minute from now.
(Image: a car with cruise control enabled, driving uphill)

Scenario 3: Imagine another car, on a flat road, going at 50 km/h and about to enter an intersection just as the traffic light turns yellow. One more bit of information: the driver is heading to an important job interview and is running late. Again, I ask you to predict the speed of the car one minute from now.

In the first scenario (ball rolling up a ramp), we can predict the ball’s future speed by treating it as a physics problem. This is what Dennett calls the physical stance.
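As a sketch of what the physical stance buys us (ignoring friction and the ball’s rotational inertia, simplifying assumptions of mine): gravity decelerates the ball at g·sin(30°), so it stops in about two seconds and rolls back down, long before the minute is up.

```python
import math

g = 9.81                         # gravitational acceleration, m/s^2
incline = math.radians(30)
v0 = 10.0                        # initial speed up the ramp, m/s

decel = g * math.sin(incline)    # deceleration along the ramp, ~4.9 m/s^2
t_stop = v0 / decel              # time until the ball stops, ~2.04 s
print(f"decelerates at {decel:.2f} m/s^2, stops after {t_stop:.2f} s")
```

What happens over the remaining 58 seconds depends on the ramp’s length, but the key point is that nothing about the ball’s goals or design enters the prediction, only its physics.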
In the second scenario (car with cruise control enabled), we view the car as an artifact that was designed to maintain its speed when cruise control is enabled. We can easily predict that its future speed will be 60 km/h. This is what Dennett calls the design stance. Here, we are using our knowledge that the car has been designed to behave in certain ways in order to predict how it will behave.
In the third scenario (driver running late who encounters a yellow light), we think about the intentions of the driver: they don’t want to be late for their interview, so we predict that they will accelerate through the intersection, and that their future speed will be somewhere around 60 km/h. This is what Dennett calls the intentional stance. Here, we are using our knowledge of the desires and beliefs of the driver to predict what actions they will take.
Now, because LLMs have been designed to replicate human language, our instinct is to apply the intentional stance to predict their behavior. It’s a kind of pareidolia: we’re seeing intentionality in a system that mimics human language output. Dennett was horrified by this.
But the design stance doesn’t really help us with LLMs either. Yes, the design stance enables us to predict that an LLM-based chatbot will generate plausible-sounding answers to our questions, because that is what it was designed to do. But beyond that, we can’t really reason about its behavior.
Generally, operational surprises are useful in teaching us how our systems work by letting us observe circumstances in which they are pushed beyond their limits. For example, we might learn about a hidden limit somewhere in the system that we didn’t know about before. This is one of the advantages of doing incident reviews, and it’s also one of the reasons that psychologists study optical illusions. As Herb Simon put it in The Sciences of the Artificial: “Only when [a bridge] has been overloaded do we learn the physical properties of the materials from which it is built.”
However, when an LLM fails from our point of view by producing a plausible but incorrect answer to a question, this failure mode doesn’t give us any additional insight into how the LLM actually works. Because, in a real sense, that LLM is still successfully performing the task that it was designed to do: generate plausible-sounding answers. We aren’t capable of designing LLMs that only produce correct answers; we can only design ones that produce plausible answers. And so we learn nothing from what we consider LLM failures, because the LLMs aren’t actually failing: they are doing exactly what they were designed to do.