A team of researchers has used artificial intelligence (AI) to reconstruct videos from continuous functional magnetic resonance imaging (fMRI) scans of participants' brains.
Publishing their findings, which are yet to be peer-reviewed, on the pre-print server arXiv, the researchers used data from volunteers who had watched videos with varied content – including animals, humans, and natural scenery – while undergoing brain scans.
"The task of recreating human vision from brain recordings, especially using non-invasive tools like functional Magnetic Resonance Imaging (fMRI), is an exciting but difficult task," the team, from the National University of Singapore and The Chinese University of Hong Kong, wrote in their study. "Non-invasive methods, while less intrusive, capture limited information, susceptible to various interferences like noise."
One challenge in recreating video input – that is, the moving imagery someone watched while having their brain scanned – is that fMRI machines capture snapshots of brain activity only every few seconds. Worse still, as the team explains:
"Each fMRI scan essentially represents an 'average' of brain activity during the snapshot. In contrast, a typical video has about 30 frames per second (FPS). If an fMRI frame takes 2 seconds, during that time, 60 video frames - potentially containing various objects, motions, and scene changes - are presented as visual stimuli. Thus, decoding fMRI and recovering videos at an FPS much higher than the fMRI’s temporal resolution is a complex task."
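The mismatch the researchers describe can be sketched in a few lines. This is a minimal illustration of the arithmetic, not the team's code; the function name and parameters are hypothetical, using the figures from their example (30 FPS video, 2-second scans):

```python
def video_frames_for_scan(scan_index, fps=30, scan_seconds=2.0):
    """Return the range of video frame indices that fall within one fMRI
    snapshot, i.e. the frames whose brain responses get 'averaged' together."""
    start = int(scan_index * scan_seconds * fps)
    end = int((scan_index + 1) * scan_seconds * fps)
    return range(start, end)

# The first 2-second scan covers 60 video frames (frames 0 through 59),
# so the decoder must recover many frames from a single averaged signal.
frames = video_frames_for_scan(0)
print(len(frames), frames[0], frames[-1])
```

Decoding at a higher effective frame rate than the scanner's temporal resolution means inferring all of those frames from one blended measurement, which is what makes the task complex.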
They trained the AI – which they call MinD-Video – to decode the fMRI data, and tweaked the image-generating AI model Stable Diffusion to recreate the input as video. The videos were then assessed in terms of semantics (whether the AI understood that the input showed, say, a cat or a running person) and scene dynamics – how closely the visual reconstruction matched the original at the pixel level.
The team report that their system was 85 percent accurate in terms of semantics, outperforming the previous best-performing AI model by 45 percent.
"Basic objects, animals, persons, and scene types can be well recovered [from brain scan data]," the team added. "More importantly, the motions, such as running, dancing, and singing, and the scene dynamics, such as the close-up of a person, the fast-motion scenes, and the long-shot scene of a city view, can also be reconstructed correctly".
The researchers, who published more examples on their MinD-Video project website, believe the work holds promise for developing brain-computer interfaces, though they stress that regulation is necessary to protect people's biological data "and avoid any malicious usage of this technology".