Focusing on sound production instead of word choice makes for a flexible system.
The participant's implant gets hooked up for testing. Credit: UC Regents
Stephen Hawking, a British physicist and arguably the most famous man suffering from amyotrophic lateral sclerosis (ALS), communicated with the world using a sensor installed in his glasses. That sensor used tiny movements of a single muscle in his cheek to select characters on a screen. Once he typed a full sentence at a rate of roughly one word per minute, the text was synthesized into speech by a DECtalk TC01 synthesizer, which gave him his iconic, robotic voice.
But a lot has changed since Hawking died in 2018. Recent brain-computer-interface (BCI) devices have made it possible to translate neural activity directly into text and even speech. Unfortunately, these systems had significant latency, often limited the user to a predefined vocabulary, and did not handle nuances of spoken language like pitch or prosody. Now, a team of scientists at the University of California, Davis has built a neural prosthesis that can instantly translate brain signals into sounds: phonemes and words. It may be the first real step we have taken toward a fully digital vocal tract.
Text messaging
“Our main goal is creating a flexible speech neuroprosthesis that enables a patient with paralysis to speak as fluently as possible, managing their own cadence, and be more expressive by letting them modulate their intonation,” says Maitreyee Wairagkar, a neuroprosthetics researcher at UC Davis who led the study. Developing a prosthesis that ticks all these boxes was an enormous challenge because it meant Wairagkar’s team had to solve nearly all the problems BCI-based communication solutions have faced in the past. And there were quite a lot of those problems.
The first issue was moving beyond text. Most successful neural prostheses developed so far have translated brain signals into text, so the words a patient with an implanted prosthesis wanted to say simply appeared on a screen. Francis R. Willett led a team at Stanford University that achieved brain-to-text translation with around a 25 percent error rate. “When a woman with ALS was trying to speak, they could decode the words. Three out of four words were correct. That was super exciting but not enough for daily communication,” says Sergey Stavisky, a neuroscientist at UC Davis and a senior author of the study.
Delays and dictionaries
One year after the Stanford work, in 2024, Stavisky’s team published its own research on a brain-to-text system that bumped the accuracy to 97.5 percent. “Almost every word was correct, but communicating over text can be limiting, right?” Stavisky says. “Sometimes you want to use your voice. It allows you to make interjections, it makes it less likely other people interrupt you—you can sing, you can use words that aren’t in the dictionary.” But the most common approach to generating speech relied on synthesizing it from text, which led straight into another problem with BCI systems: very high latency.
In nearly all BCI speech aids, sentences appeared on a screen after a significant delay, long after the patient finished stringing the words together in their mind. The speech synthesis part usually happened after the text was ready, which caused even more delay. Brain-to-text solutions also suffered from a limited vocabulary. The latest system of this kind supported a dictionary of roughly 1,300 words. When you tried to speak a different language, use more elaborate vocabulary, or even say the unusual name of a café just around the corner, the systems failed.
So, Wairagkar designed her prosthesis to translate brain signals into sounds, not words—and do it in real time.
Extracting sound
The patient who agreed to participate in Wairagkar’s study was codenamed T15 and was a 46-year-old man suffering from ALS. “He is severely paralyzed and when he tries to speak, he is very difficult to understand. I’ve known him for several years, and when he speaks, I understand maybe 5 percent of what he’s saying,” says David M. Brandman, a neurosurgeon and co-author of the study. Before working with the UC Davis team, T15 communicated using a gyroscopic head mouse to control a cursor on a computer screen.
To use an early version of Stavisky’s brain-to-text system, the patient had 256 microelectrodes implanted into his ventral precentral gyrus, an area of the brain responsible for controlling vocal tract muscles.
For the new brain-to-speech system, Wairagkar and her colleagues relied on the same 256 electrodes. “We recorded neural activities from single neurons, which is the highest resolution of information we can get from our brain,” Wairagkar says. The signal registered by the electrodes was then sent to an AI algorithm called a neural decoder that deciphered those signals and extracted speech features such as pitch or voicing. In the next step, these features were fed into a vocoder, a speech synthesizing algorithm designed to sound like the voice that T15 had when he was still able to speak normally. The entire system worked with latency down to around 10 milliseconds—the conversion of brain signals into sounds was effectively instantaneous.
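The team has not published its decoder as a simple code listing, and the real decoder is a trained neural network, so the sketch below is only an illustration of the data flow described above: short chunks of neural activity go into a decoder that estimates speech features, a vocoder turns those features into audio, and the loop repeats fast enough to keep latency near 10 milliseconds. The chunk size, the feature set, and the decode_speech_features and vocoder placeholders are assumptions for illustration, not the study's implementation.

```python
import numpy as np

# Hypothetical constants; the study reports roughly 10 ms end-to-end latency,
# so the loop below assumes short, fixed-size chunks of neural data.
CHUNK_MS = 10          # duration of one processing step
N_CHANNELS = 256       # number of microelectrodes recorded in the study
SAMPLE_RATE = 22_050   # audio sample rate for the synthesized voice (assumed)

def decode_speech_features(neural_chunk: np.ndarray) -> dict:
    """Stand-in for the neural decoder: maps a chunk of neural activity to
    speech features such as pitch and voicing. A real decoder is a trained
    model; this placeholder just returns dummy values."""
    return {
        "pitch_hz": 120.0,                        # fundamental frequency estimate
        "voiced": bool(neural_chunk.mean() > 0),  # voiced vs. unvoiced frame
        "spectral_env": np.zeros(32),             # coarse spectral envelope
    }

def vocoder(features: dict, n_samples: int) -> np.ndarray:
    """Stand-in for the personalized vocoder: turns decoded speech features
    into a waveform meant to sound like the participant's pre-ALS voice."""
    t = np.arange(n_samples) / SAMPLE_RATE
    tone = np.sin(2 * np.pi * features["pitch_hz"] * t)
    return tone if features["voiced"] else np.zeros(n_samples)

def stream_brain_to_voice(neural_stream):
    """Decode and synthesize one short chunk at a time, so audio comes out
    while the person is still trying to speak."""
    samples_per_chunk = int(SAMPLE_RATE * CHUNK_MS / 1000)
    for neural_chunk in neural_stream:            # shape: (N_CHANNELS, samples)
        features = decode_speech_features(neural_chunk)
        yield vocoder(features, samples_per_chunk)

# Synthetic data standing in for recorded neural activity:
fake_stream = (np.random.randn(N_CHANNELS, 30) for _ in range(5))
for audio_chunk in stream_brain_to_voice(fake_stream):
    pass  # in a real system, each ~10 ms chunk would be played immediately
```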
Because Wairagkar’s neural prosthesis converted brain signals into sounds, it didn’t come with a limited selection of supported words. The patient could say anything he wanted, including pseudo-words that weren’t in a dictionary and interjections like “um,” “hmm,” or “uh.” And because the system was sensitive to features like pitch and prosody, he could also ask questions by saying the last word in a sentence with a slightly higher pitch, and he could even sing a short melody.
But Wairagkar’s prosthesis had its limits.
Intelligibility improvements
To test the prosthesis’s performance, Wairagkar’s team first asked human listeners to match a recording of synthesized speech from patient T15 with one transcript out of a set of six candidate sentences of similar length. Here, the system performed flawlessly, achieving 100 percent intelligibility.
The issues began when the team tried something a bit harder: an open transcription test in which listeners had to work without any candidate transcripts. In this second test, the word error rate was 43.75 percent, meaning participants identified a bit more than half of the recorded words correctly. That was certainly an improvement over the intelligibility of T15’s unaided speech, which had a word error rate of 96.43 percent in the same test with the same group of listeners. But the prosthesis, while promising, was not yet reliable enough to use for day-to-day communication.
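Word error rate here is the standard speech-recognition metric: the number of word substitutions, insertions, and deletions needed to turn a listener's transcript into the reference sentence, divided by the number of words in the reference. A minimal sketch of that calculation (not the study's evaluation code, and the example sentences are made up) might look like this:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word error rate: (substitutions + insertions + deletions) divided by
    the number of words in the reference, via word-level edit distance."""
    ref, hyp = reference.split(), hypothesis.split()
    # Dynamic-programming table for Levenshtein distance over words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
    return d[len(ref)][len(hyp)] / len(ref)

# Toy examples, not data from the study:
print(word_error_rate("i would like some water", "i would like some water"))  # 0.0
print(word_error_rate("please bring my glasses", "please ring my glasses"))   # 0.25
```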
“We’re not at the point where it could be used in open-ended conversations. I think of this as a proof of concept,” Stavisky says. He suggested that one way to improve future designs would be to use more electrodes. “There are a lot of startups right now building BCIs that are going to have over a thousand electrodes. If you think about what we’ve achieved with just 250 electrodes versus what could be done with a thousand or two thousand—I think it would just work,” he argued. And the work to make that happen is already underway.
Paradromics, a BCI-focused startup based in Austin, Texas, wants to go ahead with clinical trials of a speech neural prosthesis and is already seeking FDA approval. “They have a 1,600 electrode system, and they publicly stated they are going to do speech,” Stavisky says. “David Brandman, our co-author, is going to be the lead principal investigator for these trials, and we’re going to do it here at UC Davis.”
Nature, 2025. DOI: 10.1038/s41586-025-09127-3