Scientists at Columbia University have devised a clever way of converting thoughts into speech using a nifty combination of speech synthesizers and artificial intelligence (AI).
The technology effectively connects to and "listens" to the brain, detecting patterns of activity it can then "translate" into words. As of right now, its abilities are relatively basic but, as the researchers note in Scientific Reports, the possibilities are huge. Not only could it offer a new way to communicate with computers, it may one day provide life-changing solutions for people with speech-limiting conditions – for example, those who have had a stroke or are living with amyotrophic lateral sclerosis (ALS), like the late great Stephen Hawking.
The process hinges on the tell-tale patterns of activity that light up our brains when we speak or even just think about speaking. Similarly, when we listen to someone else speak (or imagine doing so), other distinct patterns of activity appear in the brain.
But while previous attempts to "read" brain activity relied on computer models that analyzed spectrograms and proved largely unsuccessful, this new technique uses the same kind of technology adopted by Apple for Siri and Amazon for Alexa – an AI-enabled vocoder.
A vocoder is a computer algorithm that can synthesize speech, but it first has to be trained on recordings of people talking. For this particular study, led by Nima Mesgarani, a principal investigator at Columbia University's Mortimer B. Zuckerman Mind Brain Behavior Institute, the vocoder was trained with the help of five epilepsy patients, chosen because they were already undergoing brain surgery. While the patients listened to speech from a variety of different speakers, the researchers monitored their brain activity.
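To make the training step more concrete, here is a minimal sketch of the general idea: a model learns to map recorded neural activity onto the speech parameters a vocoder needs. The array shapes, electrode counts, and the simple ridge-regression model are illustrative assumptions for demonstration, not the study's actual methods or code.

```python
# Illustrative sketch only: learn a mapping from recorded neural features to
# vocoder speech parameters. Shapes, feature names and the model choice are
# assumptions for demonstration, not the study's actual code.
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)

# Hypothetical training data gathered while patients listened to speech:
# each row is a time window of electrode activity, paired with the vocoder
# parameters describing the audio heard in that same window.
neural_activity = rng.normal(size=(5000, 128))   # time windows x electrodes
vocoder_params = rng.normal(size=(5000, 32))     # time windows x speech parameters

# Fit a simple linear mapping from brain activity to speech parameters.
decoder = Ridge(alpha=1.0)
decoder.fit(neural_activity, vocoder_params)

# Later, new brain recordings can be converted into parameters that a
# vocoder could resynthesize as audio.
new_recording = rng.normal(size=(100, 128))
predicted_params = decoder.predict(new_recording)
print(predicted_params.shape)  # (100, 32)
```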
Then, the experiment really began. To test whether or not the algorithm was now able to "read" the participants' brainwaves, the researchers played recordings of the same speakers reeling off sequences of digits between 0 and 9. The brain signals of the epilepsy patients were recorded and run through the vocoder. The vocoder's output was then analyzed and "cleaned up" by neural networks, a form of AI. Finally, a robotic-sounding voice repeated the sequence of numbers.
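For readers curious how such a pipeline fits together, the sketch below mirrors the sequence of steps described above: record brain signals, decode them with a vocoder, then clean up the result with a neural network before synthesizing audio. Every function here is a hypothetical stand-in; the real system used intracranial recordings and deep neural networks rather than these placeholder operations.

```python
# Illustrative pipeline sketch for the digit-listening test described above.
# All functions are hypothetical stand-ins for the study's actual recording,
# vocoder and neural-network components.
import numpy as np

def record_brain_signals(n_windows=100, n_electrodes=128):
    """Stand-in for the intracranial recordings made while patients
    listened to spoken digits (0-9)."""
    return np.random.default_rng(1).normal(size=(n_windows, n_electrodes))

def vocoder_decode(signals):
    """Stand-in for the vocoder stage: turn neural activity into a rough
    estimate of speech parameters (here, just a fixed random projection)."""
    projection = np.random.default_rng(2).normal(size=(signals.shape[1], 32))
    return signals @ projection

def neural_net_clean_up(params):
    """Stand-in for the neural-network stage that sharpens the vocoder's
    output before it is rendered as audible, robotic-sounding speech."""
    return np.tanh(params)  # placeholder non-linearity

signals = record_brain_signals()
rough_speech = vocoder_decode(signals)
clean_speech = neural_net_clean_up(rough_speech)
print(clean_speech.shape)  # speech parameters ready for audio synthesis
```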
To determine how accurate (or not) the AI-enabled vocoder had been, the researchers asked participants to listen to the recording and say what they heard.
"We found that people could understand and repeat the sounds about 75 percent of the time, which is well above and beyond any previous attempts," Mesgarani said of the result in a statement.
"The sensitive vocoder and powerful neural networks represented the sounds the patients had originally listened to with surprising accuracy."
The next step will be to work on more complicated sequences – for example, actual sentences such as "I need a glass of water". While there is clearly some way to go and the technology still has its limitations, the implications could be ground-breaking.
"Our voices help connect us to our friends, family and the world around us, which is why losing the power of one's voice due to injury or disease is so devastating," Mesgarani added.
"With today's study, we have a potential way to restore that power. We've shown that, with the right technology, these people's thoughts could be decoded and understood by any listener."