If robots are ever to interact socially with humans, they will first need to develop the capacity for Theory of Mind (ToM), which entails the ability to empathize with others. While the development of artificial intelligence (AI) systems with such advanced cognition remains some way off, researchers from Columbia University have succeeded in creating a robot with what they call “visual theory of behavior”. Describing their work in the journal Scientific Reports, the study authors explain that this trait may well have arisen in animals as an evolutionary precursor to ToM, and could represent a major step towards the creation of AI with complex social capabilities.
Theory of Mind is a major hallmark of human cognition and is thought to arise in most children around the age of three. It allows us to comprehend the needs and intentions of those around us, and therefore facilitates complex social activities such as playing games that have fixed rules, competing in business, and even lying to one another.
Typically, ToM relies on symbolic reasoning, whereby the brain explicitly analyzes inputs in order to predict the future actions of another person, generally using language. This feat requires impressive neural machinery such as a prefrontal cortex, something all humans possess but which remains far too advanced to replicate in robots.
However, the study authors hypothesize that some of our evolutionary ancestors may have developed an ability to implicitly predict the actions of others by simply visualizing them in their mind’s eye, long before the capacity for explicit symbolic reasoning ever emerged. They label this faculty “visual theory of behavior”, and set about recreating it in an AI system.
To do so, they programmed a robot to continually move towards one of two green spots in its visual field, always opting for whichever it deemed to be the closer of the two. At times, the researchers prevented the robot from being able to see the closest green spot by obscuring it with a red block, causing the gadget to move towards the spot that was furthest away.
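The actor robot’s decision rule can be sketched in a few lines of Python. This is purely our illustration, not the study’s code: the function name and the way the scene is encoded (spot coordinates plus a set of occluded indices) are our own assumptions.

```python
import math

def choose_target(robot, spots, occluded):
    """Pick the index of the green spot the actor moves towards.

    robot: (x, y) position of the actor; spots: list of (x, y) spot
    positions; occluded: indices of spots hidden by the red block.
    """
    visible = [i for i in range(len(spots)) if i not in occluded]
    if not visible:
        return None  # nothing in view, so nowhere to go
    # Head for the nearest spot the actor can actually see.
    return min(visible, key=lambda i: math.dist(robot, spots[i]))

# With the near spot (index 0) hidden by the red block,
# the actor settles for the far one.
choose_target((0.0, 0.0), [(1.0, 0.0), (5.0, 0.0)], {0})  # returns 1
```

The point of the toy rule is that the actor is entirely reactive: its “goal” is fully determined by what it can see, which is what makes its behavior learnable by an observer.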
A second AI spent two hours observing the first robot as it repeatedly completed this task. Crucially, this observer robot had a bird’s-eye view of the scene and could therefore always see both of the green spots. Eventually, it learned exactly what was going on and developed the ability to predict what the first robot would do, just by looking at the arrangement of the green spots and the red block.
The observer AI was able to forecast the goal and actions of the first robot with 98.45 percent accuracy, despite lacking the ability for symbolic reasoning.
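The observation-then-prediction loop can be caricatured in code. The sketch below is entirely our own: the paper trains an image-based neural network on raw camera frames, whereas here a simple nearest-neighbour lookup over remembered scenes stands in for that learned model. The observer records what the actor did in each scene during a “watching” phase, then predicts by recalling the most similar remembered scene.

```python
import math

class Observer:
    def __init__(self):
        self.memory = []  # (scene features, observed target) pairs

    @staticmethod
    def _features(robot, spots, occluded):
        # Bird's-eye-view summary of a scene: each spot's distance to
        # the actor, and whether the red block hides it from the actor.
        return [v for i, s in enumerate(spots)
                  for v in (math.dist(robot, s), float(i in occluded))]

    def watch(self, robot, spots, occluded, observed_target):
        # Observation phase: record the scene and what the actor did.
        self.memory.append(
            (self._features(robot, spots, occluded), observed_target))

    def predict(self, robot, spots, occluded):
        # Prediction phase: recall the most similar remembered scene
        # (1-nearest-neighbour in feature space) and reuse its outcome.
        f = self._features(robot, spots, occluded)
        _, target = min(
            self.memory,
            key=lambda m: sum((a - b) ** 2 for a, b in zip(m[0], f)))
        return target

obs = Observer()
obs.watch((0.0, 0.0), [(1.0, 0.0), (5.0, 0.0)], {0}, observed_target=1)
obs.watch((0.0, 0.0), [(1.0, 0.0), (5.0, 0.0)], set(), observed_target=0)
obs.predict((0.0, 0.0), [(1.1, 0.0), (5.0, 0.0)], {0})  # returns 1
```

Note that nothing in `Observer` encodes the actor’s “go to the nearest visible spot” rule; like the paper’s network, it only generalizes from watched examples, which is what makes the prediction implicit rather than symbolic.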
"Our findings begin to demonstrate how robots can see the world from another robot's perspective. The ability of the observer to put itself in its partner's shoes, so to speak, and understand, without being guided, whether its partner could or could not see the green circle from its vantage point, is perhaps a primitive form of empathy," explained study author Boyuan Chen in a statement.
This image-based processing ability is obviously more primitive than language-based processing or other forms of symbolic reasoning, but the study authors speculate that it may have acted as an evolutionary stepping stone towards ToM in humans and other primates.
“We conjecture that perhaps, our ancestor primates also learned to process a form of behavior prediction in a purely visual form, long before they learned to articulate internal mental visions into language,” they explain in their write-up.