New AI Can Figure Out What You Look Like Just From The Sound Of Your Voice


Ben Taub

Freelance Writer

Jun 12 2019, 15:00 UTC

Artificial intelligence is helping to put faces to voices.

From the guy who does the voice-over for movie trailers to the announcers on the subway, our lives are full of faceless voices. And while most of us are content to build a mental image of these disembodied orators, a group of researchers from MIT has gone a step further by creating an artificial intelligence system that can reconstruct people’s faces just by listening to their voice.

The application, called Speech2Face, is a deep neural network that was trained to recognize the correlation between voices and facial features by observing millions of YouTube videos of people talking. In doing so, it learned to associate different aspects of the audio waveform with a speaker’s age, gender, and ethnicity, as well as certain cranial features such as the shape of the head and the width of the nose.


When the researchers then fed the system audio recordings of people’s voices, it was able to generate an image of each speaker’s face with reasonable accuracy.

Speech2Face is able to ascertain characteristics such as age, gender, ethnicity, and head shape just from the sound of a person's voice. MIT CSAIL/IEEE Xplore

Obviously, characteristics like hairstyle, facial hair, and certain other elements of physical appearance are impossible to predict from a person’s voice, so the developers insist that their goal was “not to predict a recognizable image of the exact face, but rather to capture dominant facial traits of the person that are correlated with the input speech.”

In a paper published on IEEE Xplore, the researchers say this technology could one day find a range of useful applications, such as generating faces for video calls without the need for cameras.


However, some improvements are clearly still needed. While the images created by Speech2Face are generally a good match for face type, they often bear only a general resemblance to the speaker. The system is also prone to the occasional error: roughly 6 percent of the faces it created were of the wrong gender, and some were of the wrong ethnicity.

Nevertheless, faceless voices are one step closer to becoming a thing of the past, which should have major implications for prank callers at least.
