When you strike up a conversation with a stranger in a crowded room, you’re able to focus on what they’re saying, filtering out the chatter coming from dozens of other people talking all at once. Scientists who study hearing call this the "cocktail party problem." It's a problem because they’re still trying to figure out exactly how our brains and ears perform this trick – and how to build sensors and machines that can do the same. But now, researchers say they’ve created a listening device that surpasses those that have come before by using only a single sensor and futuristic acoustic metamaterials to isolate where a sound is coming from in a room with multiple audio sources. Their work is published in Proceedings of the National Academy of Sciences this week.
Researchers have tried to build devices that mimic this extraordinary listening ability in two primary ways. The first focuses on the words themselves: analyzing the content and patterns in language to separate conversations coming from different sources. The second uses spatial detection – that is, figuring out which direction each sound is coming from. That works, but it requires multiple sensors and makes the setup more complex.
To get down to a single-sensor setup, a team led by Duke’s Steven Cummer used acoustic metamaterials that can modulate the frequency of sound waves. In practice, their listener is donut-shaped, with a single sensor in the middle surrounded by 36 "waveguides" that encode the incoming sound, each stamping it with a distinct frequency signature depending on the direction it arrived from. The result is one mixed signal containing audio from every source in the room, which the team then runs through an inversion algorithm to determine both what each sound is and where it's coming from. This is how the listening apparatus separates the mixture back out into its individual sources and reconstructs the audio.
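The localization idea can be sketched as a linear inverse problem. Suppose the waveguide ring imprints a known spectral fingerprint on sound arriving from each direction; then the single sensor's spectrum is a linear mixture of per-direction amplitudes, and a least-squares inversion recovers which directions are active. This is a toy illustration under that assumed model, with made-up dimensions and a random signature matrix – not the paper's actual algorithm or calibration data.

```python
import numpy as np

rng = np.random.default_rng(7)

n_bins = 36   # frequency bins in the recorded spectrum (illustrative)
n_dirs = 12   # candidate source directions around the ring (illustrative)

# Hypothetical signature matrix: column d is the spectral fingerprint
# the waveguide ring imprints on sound arriving from direction d.
# In the real device this would be measured once during calibration.
A = rng.normal(size=(n_bins, n_dirs))

# Ground truth: sound arrives from directions 2 and 9 only.
x_true = np.zeros(n_dirs)
x_true[[2, 9]] = [1.0, 0.7]

# What the single sensor records, plus a little measurement noise.
y = A @ x_true + 0.01 * rng.normal(size=n_bins)

# Inversion: least-squares estimate of the per-direction amplitudes,
# then keep the strongest entries as the detected sources.
x_hat, *_ = np.linalg.lstsq(A, y, rcond=None)
active = sorted(np.argsort(x_hat)[-2:].tolist())
print(active)
```

Because the system is overdetermined (36 measurements, 12 unknowns), the inversion is well-posed and the two true directions stand out clearly above the noise floor.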
In their experiments, the researchers surrounded the listener with three speakers arranged in a triangle around it. They played overlapping sounds from all three, with each "conversation" built from a vocabulary of 40 synthesized pulses meant to represent different words. Nearly 97% of the time, the setup could correctly determine the source of the audio and reconstruct the content.
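The content-decoding side of the experiment can be sketched the same way: if the sensor's response to each (direction, pulse) pair is known, the overlapping recording is a linear combination of those responses, and inverting it reveals which of the 40 pulses each speaker played. Everything here – the random response matrix, the signal length, the noise level – is an illustrative stand-in, not the team's measured setup.

```python
import numpy as np

rng = np.random.default_rng(42)

n_samples = 256   # length of the recorded snippet (illustrative)
n_dirs = 3        # the three speakers forming a triangle
n_pulses = 40     # vocabulary of synthesized pulses ("words")

# Hypothetical response dictionary: column (d * n_pulses + p) is what
# the single sensor records when pulse p is played from direction d.
# The metamaterial ring makes these responses direction-dependent.
B = rng.normal(size=(n_samples, n_dirs * n_pulses))

# Ground truth: each speaker plays one pulse, all overlapping in time.
true_pulse = rng.integers(n_pulses, size=n_dirs)
c_true = np.zeros(n_dirs * n_pulses)
for d, p in enumerate(true_pulse):
    c_true[d * n_pulses + p] = 1.0

# Single-sensor recording of the overlapping mixture, with noise.
y = B @ c_true + 0.01 * rng.normal(size=n_samples)

# Inversion: least-squares estimate of the coefficients, then pick
# the strongest pulse for each direction.
c_hat, *_ = np.linalg.lstsq(B, y, rcond=None)
decoded = [int(np.argmax(c_hat[d * n_pulses:(d + 1) * n_pulses]))
           for d in range(n_dirs)]
print(decoded)
```

Under this toy model the decoding is essentially always correct; the real device's ~97% figure reflects physical noise, reverberation, and imperfect direction-dependent responses that this sketch leaves out.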
The key to this design is its simplicity: The listener uses no electronic components other than the microphone and doesn't rely on linguistic models that use a lot of computational power to determine what's being said in every conversation. The researchers imagine a single-sensor device like this one being used for speech recognition in electronic devices or even in hearing aids and ultrasound machines – devices that require precise analysis of incoming sound.