Topographic Neural Networks Help AI See Like a Human


AI vision models have improved dramatically over the past decade. Yet these gains have led to neural networks that, though effective, don’t share many characteristics with human vision. For example, convolutional neural networks (CNNs) are often better at noticing texture, while humans respond more strongly to shapes.

A paper recently published in Nature Human Behaviour has partially addressed that gap. It describes a novel All-Topographic Neural Network (All-TNN) that, when trained on natural images, developed an organized, specialized structure more like human vision. The All-TNN better mimicked human spatial biases, like expecting to see an airplane closer to the top of an image than the bottom, and operated on a significantly lower energy budget than other neural networks used for machine vision.

“One of the things you notice when you look at the way knowledge is ordered in the brain, is that it’s fundamentally different to how it is ordered in deep neural networks, such as convolutional neural nets,” said Tim C. Kietzmann, professor at the Institute of Cognitive Science in Osnabrück, Germany, and cosupervisor of the paper.

All-TNN networks learn humanlike spatial biases

Most machine vision systems used today, including those found in apps like Google Photos and Snapchat, use some form of convolutional neural network. CNNs replicate identical feature detectors across many spatial locations (which is known as “weight sharing”). The result is a network that, when mapped, looks like a tightly repeating fractal pattern.
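To make that concrete, here is a toy PyTorch snippet (the layer sizes are arbitrary, not taken from the paper) showing what weight sharing means in practice: one small kernel is reused at every position of the image, so the layer’s parameter count stays tiny no matter how many locations it covers.

```python
# Toy illustration of CNN weight sharing (sizes are assumptions, not from the paper).
import torch

# A single 3x3 kernel bank, reused at every spatial position of the input.
conv = torch.nn.Conv2d(in_channels=3, out_channels=8, kernel_size=3, bias=False)

x = torch.randn(1, 3, 64, 64)          # one 64x64 RGB image
print(conv(x).shape)                    # torch.Size([1, 8, 62, 62]) -> 62*62 positions covered
print(sum(p.numel() for p in conv.parameters()))  # 216 weights total, shared across all positions
```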

The structure of an All-TNN network is quite different. It instead appears smooth, with related neurons organized into clusters but never replicated. Images that map the spatial relationships in the All-TNN network look like the topographic view of a hilly region, or a group of microorganisms viewed under a microscope.

This visual difference is more than just a comparison of pretty pictures. Kietzmann said the weight sharing used by CNNs is a fundamental deviation from biological brains. “The brain can’t, when it learns something in one location, copy that knowledge over to other locations,” he said. “While in a CNN, you can. It’s an engineering hack to be a bit more efficient at learning.”

The All-TNN network avoided that characteristic through a fundamentally different architecture and training approach.

Instead of weight sharing, the researchers gave each spatial location in the network its own set of learnable parameters. Then, to prevent this from creating chaotic, unorganized features, they added a “smoothness constraint” in training that encouraged neighboring neurons to learn similar (yet never identical) features.
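The paper’s own code isn’t reproduced here, but the idea can be sketched in PyTorch: a locally connected layer that gives every output location its own kernel, plus a penalty that nudges neighboring kernels toward similar (but not identical) values. The class name, layer sizes, and penalty weight below are illustrative assumptions.

```python
# Minimal sketch of the All-TNN idea (not the paper's code): per-location weights
# plus a smoothness penalty between neighboring locations. Sizes are illustrative.
import torch
import torch.nn.functional as F

class LocallyConnected2d(torch.nn.Module):
    """A conv-like layer without weight sharing: one kernel per output location."""
    def __init__(self, in_ch, out_ch, in_size, kernel=3):
        super().__init__()
        self.kernel = kernel
        self.out_size = in_size - kernel + 1  # no padding, stride 1
        # One weight tensor per spatial position: (H_out*W_out, out_ch, in_ch*k*k).
        # For 3->8 channels on a 32x32 input this is ~194k weights, versus 216 for
        # the shared conv above -- mirroring how the All-TNN is much larger than a CNN.
        self.weight = torch.nn.Parameter(
            torch.randn(self.out_size * self.out_size, out_ch, in_ch * kernel * kernel) * 0.01
        )

    def forward(self, x):
        # Extract every k x k patch: (batch, in_ch*k*k, H_out*W_out).
        patches = F.unfold(x, self.kernel)
        # Apply a *different* kernel at each location.
        out = torch.einsum("bcl,loc->bol", patches, self.weight)
        return out.reshape(x.size(0), -1, self.out_size, self.out_size)

    def smoothness_loss(self):
        # Encourage neighboring locations to learn similar, yet not identical, kernels.
        w = self.weight.view(self.out_size, self.out_size, -1)
        dh = (w[1:, :] - w[:-1, :]).pow(2).mean()   # vertical neighbors
        dw = (w[:, 1:] - w[:, :-1]).pow(2).mean()   # horizontal neighbors
        return dh + dw

layer = LocallyConnected2d(in_ch=3, out_ch=8, in_size=32)
x = torch.randn(4, 3, 32, 32)
task_loss = layer(x).pow(2).mean()                  # stand-in for a real training objective
loss = task_loss + 0.1 * layer.smoothness_loss()    # 0.1 is an arbitrary penalty weight
loss.backward()
```

In a setup like this, the smoothness term plays the organizing role that weight sharing plays in a CNN: it keeps nearby detectors related without forcing them to be copies of one another.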

To test whether that translated to machine vision with more humanlike behavior, the researchers asked 30 human participants to identify objects flashed briefly in different screen locations. While the All-TNN still wasn’t a perfect analog for human vision, it proved three times as strongly correlated with human vision as the CNN.

Zejin Lu, a coauthor on the paper, said the All-TNN’s improved correlation with human vision was driven by how the network learned spatial relationships. “For humans, when you detect certain objects, they have a typical position. You already know the shoes are usually at the bottom, on the ground. The airplane, it’s at the top,” he explained.

Members of the team working on all-topographic neural networks theorized that their approach would result in more humanlike machine vision. [Photo: Simone Reukauf]

Humanlike behavior doesn’t mean better performance, but it does reduce energy use

The stronger correlation between the All-TNN network and human vision shows how machines can be taught to see the world in a manner more similar to humans, but it doesn’t necessarily result in a network that’s better at image classification.

CNNs remained the king of image classification, with an accuracy of 43.2 percent. The All-TNN achieved classification accuracies ranging from 34.5 to 36 percent, depending on the network’s configuration.

What it lacked in accuracy, though, it gained in efficiency. The All-TNN consumed significantly less energy than the tested CNN, which used over ten times as much energy in operation. Remarkably, this was achieved even though the All-TNN was roughly 13 times the size of the CNN (roughly 107 million parameters for the All-TNN versus roughly 8 million for the CNN).

The All-TNN’s efficiency is thanks to the network’s novel structure. Though larger overall, the network can focus on the most important parts of images rather than processing everything uniformly. “You’ve got a whole lot of different neurons that could respond, but only a fraction is responding,” said Kietzmann.

The All-TNN’s efficiency could have implications for machine vision on low-power devices. However, Kietzmann and Lu emphasized that energy efficiency wasn’t their primary goal or what they found most interesting in the paper’s results. Instead, they hope that new network architectures, like the All-TNN, will provide a more complete framework for understanding intelligence—both artificial and human.

Kietzmann remarked that the pursuit of scale seems inconsistent with what’s known about how real brains develop, given that brains learn from far less data and run on far less energy. Networks that try to mimic humanlike behaviors could provide an alternative to pursuing scale at all costs through ever more training data and ever larger models with more parameters.

“There’s this trend, a feeling that scale is just too boring of an answer to the fundamental question of how cognition comes about,” said Kietzmann.
