A Chinese company claims it can clone someone’s voice from just a one-minute audio recording of the person speaking.
Voice mimicry is nothing new, but the technology behind it has evolved rapidly over the past few years. VoCo, an audio editing and generation tool that Adobe unveiled in 2016, can duplicate a voice from about 20 minutes of recorded speech, roughly the length of an average episode of The Simpsons.
Last year, a Canadian start-up called Lyrebird released a similar tool that cut the amount of audio needed to replicate your voice in digital form to just 60 seconds. Then there’s Google’s WaveNet, which also promises to mimic “any human voice”, though it is vaguer about how much raw material it needs.
The neural voice cloning system developed by Chinese tech company Baidu is the latest to extract an individual speaker’s speech patterns from snippets of audio. It uses this information to create a digital copy of the speaker, which can “read” aloud whatever text is fed into the program. What you choose to do with this new power is up to you.
Baidu’s digital duplicates aren’t perfect yet, but they are convincing enough to trick voice recognition systems, and one voice sample even scored 3.16 out of 4 for naturalness from human judges. With 100 snippets, the program can create a credible mimic that sounds like the original speaker on a dodgy phone line.
If you think this sounds more than a little creepy, you are not alone. When VoCo first appeared, it raised many eyebrows over the security risks and blackmail opportunities this type of technology could bring. For example, many banks and businesses now use voice recognition checks as a verification tool, and a convincing clone could fool them.
Then there is the issue of fake news. AI is learning to manipulate not just audio but images and video too. Cloning technology could be used to undermine public trust in the news or to spread false information.
But the technology has positive uses, too: parents could use the software to "read" to their children when they can't make bedtime, people who have lost the ability to speak could use it to create a digital duplicate of their voice, and we could all have David Attenborough answering our voicemail.
To quote Futurama: "Amy, technology isn't intrinsically good or evil. It's how it's used."