For years, scientists have been trying to create a system that can generate synthetic speech from mental activity, and a team of researchers from the University of California, San Francisco has finally cracked it. While the technology still needs some fine-tuning, it could one day be used to artificially restore the voices of people who have lost the ability to speak as a result of brain injuries, strokes, or neurodegenerative conditions like Parkinson's disease.
At present, the best options available to people with speech disabilities merely allow them to spell out their thoughts letter-by-letter using small muscular movements to control an interface – such as that famously used by Stephen Hawking. However, researchers have been busy developing new devices that can detect the linguistic content of people’s thoughts and read them out loud.
The team behind this latest breakthrough, by contrast, abandoned that approach, focusing instead on decoding the brain activity that coordinates the movements of the mouth and voice box during speech.
This change of tack was inspired by previous research revealing that the brain's speech centers don't directly encode sounds or words, but instead choreograph the vocal apparatus that produces them.
The team recruited five epilepsy patients who already had electrodes implanted in their brains to monitor the neural activity surrounding their seizures, and recorded the activity in their speech centers as they read out set phrases.
Describing their work in the journal Nature, the study authors explain how they first decoded the brain activity that maneuvers the tongue, lips, jaw, and voice box during speech. By correlating these movements with the sounds actually produced, the researchers were able to create a computer simulation of each person's vocal tract.
When speech-related brain activity patterns are fed into the simulator, it synthesizes the same sounds that would be produced by that person’s actual vocal anatomy.
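To make the two-stage idea concrete, here is a minimal sketch of how such a pipeline could be wired up in Python with PyTorch. The class names, layer sizes, and feature counts (electrodes, articulatory traces, acoustic frames) are illustrative assumptions, not the authors' actual model or data.

```python
import torch
import torch.nn as nn

class ArticulatoryDecoder(nn.Module):
    """Stage 1: map recorded neural activity to vocal-tract movement traces."""
    def __init__(self, n_electrodes=256, n_articulators=33, hidden=128):
        super().__init__()
        self.rnn = nn.LSTM(n_electrodes, hidden, batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hidden, n_articulators)

    def forward(self, neural):                 # neural: (batch, time, n_electrodes)
        h, _ = self.rnn(neural)
        return self.out(h)                     # (batch, time, n_articulators)

class AcousticSynthesizer(nn.Module):
    """Stage 2: map articulator movements to acoustic features (e.g. a spectrogram)."""
    def __init__(self, n_articulators=33, n_acoustic=80, hidden=128):
        super().__init__()
        self.rnn = nn.LSTM(n_articulators, hidden, batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hidden, n_acoustic)

    def forward(self, kinematics):
        h, _ = self.rnn(kinematics)
        return self.out(h)                     # acoustic frames, turned into audio downstream

# Dummy data standing in for real recordings: 200 time steps from 256 electrodes.
neural = torch.randn(1, 200, 256)
kinematics = ArticulatoryDecoder()(neural)     # decode movements from brain activity
acoustics = AcousticSynthesizer()(kinematics)  # synthesize sound features from movements
print(acoustics.shape)                         # torch.Size([1, 200, 80])
```

The key design point the sketch illustrates is the intermediate articulatory representation: rather than jumping straight from brain activity to audio, the decoder first predicts how the vocal tract is moving, and a second model turns those movements into sound.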
As the video reveals, the system is capable of generating fluid speech, although certain sounds are not clearly audible. Study author Josh Chartier said in a statement: “We still have a ways to go to perfectly mimic spoken language… We're quite good at synthesizing slower speech sounds like 'sh' and 'z' as well as maintaining the rhythms and intonations of speech and the speaker's gender and identity, but some of the more abrupt sounds like 'b's and 'p's get a bit fuzzy.”
“Still, the levels of accuracy we produced here would be an amazing improvement in real-time communication compared to what's currently available," he added. Given that current solutions allow a maximum of only around 10 words a minute, any device that allows whole sentences to be synthesized would massively improve the lives of countless people suffering from speech disabilities.