Researchers at Carnegie Mellon University have devised a way to transform the "style" of one video to another.
This technique lets you transfer the speech and facial expressions of comedian John Oliver, for example, onto another bespectacled suited late-night television host (in this case, Stephen Colbert) – or onto a cartoon frog. Alternatively, you can change the bloom of a daffodil to the bloom of a hibiscus or slow the passing of clouds so that the weather appears much calmer than it was in reality.
The method was presented yesterday at the European Conference on Computer Vision (ECCV 2018) in Munich and is available to view on arXiv.
How does it work? Like most Deepfakes, it uses an AI technology called generative adversarial networks (GANS). These are, essentially, a pair of algorithms that work in opposition to create mind-bendingly convincing fake videos. One algorithm (the discriminator) learns how to spot inconsistencies in video style, while the second (the generator) learns how to make up videos that fit a certain style. This means the generator is constantly trying to trick the discriminator. The result: over time and through practice, the videos become more and more lifelike.
This in itself is not particularly new. It has been used to transpose Nicholas Cage's face onto other actors and (more sinisterly) celebrity faces into adult films.
What is new is a technique called Recycle-GAN. This is one better than cycle-GAN, which checks the quality of the fake images by converting it back into the style of the original image – a process the researchers describe as translating English to Spanish and then back to English again. So, for example, footage of Colbert may be "translated" to match the style of Oliver. This fake footage would then be "translated" back to Oliver so that you can see how good the "translation" was in the first place.
Carnegie Mellon University/YouTube
Even with this "check" there are inconsistencies that regularly crop up – and this is where Recycle-GAN comes in. Like cycle-GAN it analyzes the spatial characteristics of the images but unlike cycle-GAN, it incorporates a temporal element, which allows it to scrutinize changes over time.
As well as facial expressions, it can transfer the movement and cadence of the footage. This means it can change the style and movements of one person to fit another but it also means it can manipulate the movement of objects or weather patterns. (See the daffodil and clouds in the video below as an example).
As the video above explains, "This method could help filmmakers work quicker and cheaper or help autonomous cars learn how to drive at night." It could also be used to color black and white movies and even help teach self-driving cars how to manoeuver in the dark.
Of course, it could very well be used for more nefarious purposes.
As the researchers themselves point out, the technique could be used to make Deepfakes, ie any footage that makes it appear someone has said or done something they have not. While there are certain giveaways that can help people in the know catch a Deepfake, they are becoming more and more convincing, and experts have warned they may be used connivingly to influence future global politics. For example, in the video, it compares facial expressions of Obama and Martin Luther King (who share similar ideologies) to Obama and Trump (who do not).
Now, to lighten the mood, here is a compilation of Nick Cage Deepfakes.