Technology, Future
Published February 4, 2026

Say Hello To EMO, A Robot That's Learnt To Mimic Human Lip Movements By Watching YouTube And Gazing In The Mirror

Could this finally bridge the Uncanny Valley?

Tom Hale

Tom has a Master's degree in Journalism. His editorial work covers anything from archaeology and the environment to technology and culture.

Senior Journalist

Edited by Holly Large

Copy Editor & Staff Writer

Holly has a degree in Medical Biochemistry from the University of Leicester. Her scientific interests include genomics, personalized medicine, and bioethics.

The researchers hope EMO's life-like lip movements could help robots interact more effectively with humans.

Image credit: Jane Nisselson/Columbia Engineering


Watching hours of YouTube videos isn’t usually considered the most prestigious form of education, but it might be a surprisingly effective tool for robots learning to communicate like humans.

Researchers at Columbia Engineering have created EMO, a robot capable of learning to lip sync to speech, mimicking the complex and nuanced movements of the human mouth.

Using artificial intelligence (AI), the system learns to map audio signals directly to precise lip and facial movements, allowing it to mimic speech patterns and singing without any pre-programmed rules. Rather than a mechanical mouth that simply opens and shuts, the robot features soft silicone lips that move with 10 degrees of freedom, powered by 26 motors.

The learning process starts with the robot gazing at itself in a mirror and experimenting with its own facial expressions, like a kid pulling faces at its own reflection. To gather more information on how to interact convincingly with humans, the robot then watches hours of people speaking and singing in YouTube videos, carefully analyzing how their lips and faces move as they utter different words and sounds.

Together, these two stages let EMO translate audio in multiple languages directly into coordinated lip and facial movements, effectively teaching itself how to speak and sing.

“Something magical happens when a robot learns to smile or speak just by watching and listening to humans. I’m a jaded roboticist, but I can’t help but smile back at a robot that spontaneously smiles at me,” Hod Lipson, study author and director of Columbia’s Creative Machines Lab, said in a statement.

All of this represents a meaningful leap forward in humanoid robotics. Traditionally, robot mouths operate like “puppets,” with stiff, clunky movements following pre-set scripts, which often feel unnatural or lifeless. The new system is dynamic, allowing the robot to adapt its facial expressions in real time, making its speech and even its singing feel far more natural and humanlike.

At least, that's the idea. One major hurdle with robots is the Uncanny Valley, an effect where human-like entities can make people feel creeped out and uneasy. Life-like robots can often evoke this response because they superficially appear human, but something very subtle is “off”: it might be an odd lip movement, a strange look behind their eyes, or a tiny detail that we barely consciously register. If humankind and robotkind are going to have a fruitful relationship in the future, it's a problem that researchers need to iron out.

Watching EMO speak and sing can still feel a little unnerving, if we’re being totally honest, but the team is confident the chatty robot will improve with more practice and data.

“The more it interacts with humans, the better it will get,” said Lipson.

“When the lip sync ability is combined with conversational AI such as ChatGPT or Gemini, the effect adds a whole new depth to the connection the robot forms with the human,” explained Yuhang Hu, who led the study for his PhD. “The more the robot watches humans conversing, the better it will get at imitating the nuanced facial gestures we can emotionally connect with.”

To further flex EMO's skills, the researchers have also released “hello world_”, an AI-generated debut album created by the system, featuring a host of bangers like “Don’t Call Me Clanker” and “Why Are You (Humans) So Complicated?”

The study is published in the journal Science Robotics.

