Machine-learning experts have built a neural network that can manipulate facial movements in videos to create fake footage – in which people appear to say something they never actually said.
It could be used to create convincing yet faked announcements and confessions seemingly uttered by the rich and powerful as well as the average and mediocre, producing a new class of fake news and further separating us all from reality... if it works well enough, naturally.
It’s not quite like Deepfakes, which perversely superimposed the faces of famous actresses and models onto the bodies of raunchy X-rated movie stars.
Instead of mapping faces onto different bodies, though, this latest AI technology controls the target's face, and manipulates it into copying the head movements and facial expressions of a source. In one of the examples, Barack Obama acts as the source and Vladimir Putin as the target. So it looks as though a speech given by Obama was instead given by Putin.
Obama's facial expressions are mapped onto Putin's face using this latest AI technique ... Image credit: Hyeongwoo Kim et al
A paper describing the technique, which popped up online at the end of last month, claims to produce realistic results. The method was developed by Hyeongwoo Kim, Pablo Garrido, Ayush Tewari, Weipeng Xu, Justus Thies, Matthias Nießner, Patrick Pérez, Christian Richardt, Michael Zollhöfer, and Christian Theobalt.
The Deepfakes Reddit forum, which has since been shut down, was flooded with people posting tragically bad computer-generated videos of celebs' blurry and twitchy faces pasted onto porno babes using machine-learning software, with mismatched eyebrows and skittish movements. You could, after a few seconds, tell they were bogus, basically.
A previous, similar project used lip-synching and an audio clip to create a video of someone appearing to say something they never said. Again, the researchers used Barack Obama as an example. But the results weren't completely convincing, since the lip movements didn't always line up properly.
That’s less of a problem with this new approach, however. According to the paper, it's supposedly the first model that can transfer the full three-dimensional head position, head rotation, facial expression, eye gaze and blinking from a source onto a portrait video of a target.
Controlling the target head
It uses a set of facial landmarks to reconstruct the face, tracking head and facial movements in every frame of both the input source video and the target video to capture their facial expressions. A facial representation method then computes the face parameters for both videos.
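The article doesn't describe the fitting step in detail, and the real system reconstructs a full parametric 3D face rather than a flat one. As a heavily simplified, hypothetical illustration of the general idea only, the sketch below fits the coefficients of a made-up linear 2D face model to tracked landmark positions using ordinary least squares. The model, its basis vectors and the functions `solve` and `fit_coefficients` are all assumptions, not the authors' code.

```python
# Hypothetical sketch: fit coefficients of a toy linear face model to tracked
# 2D landmarks by least squares. The real system fits a richer 3D model.
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def solve(a: Matrix, b: Vector) -> Vector:
    """Solve a small square system a @ x = b with Gaussian elimination."""
    n = len(b)
    # Work on an augmented copy so the caller's data is not modified.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Partial pivoting: swap in the row with the largest entry in this column.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_coefficients(landmarks: Vector, mean: Vector, basis: Matrix) -> Vector:
    """Coefficients c minimising ||mean + basis @ c - landmarks||^2.

    landmarks and mean are flattened (x0, y0, x1, y1, ...) coordinates;
    basis has one row per coordinate and one column per model coefficient.
    """
    k = len(basis[0])
    residual = [p - q for p, q in zip(landmarks, mean)]
    # Normal equations: (B^T B) c = B^T r
    btb = [[sum(row[i] * row[j] for row in basis) for j in range(k)]
           for i in range(k)]
    btr = [sum(row[i] * r for row, r in zip(basis, residual)) for i in range(k)]
    return solve(btb, btr)
```

Running a fit like this on every frame of both videos would give one parameter vector per frame, which is the kind of compact, per-frame face description that can then be compared or copied between people.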
Next, these parameters are slightly modified and copied from the source to the target face for a realistic mapping. Synthetic images of the target’s face are rendered using an Nvidia GeForce GTX Titan X GPU.
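Exactly which parameters are copied, and how they are "slightly modified", isn't spelled out here. One plausible reading, assumed for this sketch, is that the target keeps its own identity and lighting while taking the source's expression and eye gaze, and that head pose is transferred as the source's motion relative to a reference frame rather than as absolute values. The `FaceParams` layout, the `transfer` function and the reference-frame rule are all hypothetical; blinking is left out for brevity.

```python
# Hypothetical sketch of source-to-target parameter transfer. The parameter
# layout and the relative head-motion rule are assumptions for illustration.
from dataclasses import dataclass, replace
from typing import Tuple

Vec3 = Tuple[float, float, float]


@dataclass(frozen=True)
class FaceParams:
    identity: Tuple[float, ...]      # face shape of the person: kept from target
    expression: Tuple[float, ...]    # expression coefficients: taken from source
    rotation: Vec3                   # head rotation angles in radians
    translation: Vec3                # 3D head position
    gaze: Tuple[float, float]        # eye gaze direction: taken from source
    illumination: Tuple[float, ...]  # scene lighting: kept from target


def _relative(target_ref: Vec3, source_ref: Vec3, source: Vec3) -> Vec3:
    """Apply the source's motion since its reference frame to the target.

    Adding angle offsets component-wise is a small-angle simplification.
    """
    return tuple(t + (s - r) for t, r, s in zip(target_ref, source_ref, source))


def transfer(source: FaceParams, source_ref: FaceParams,
             target_ref: FaceParams) -> FaceParams:
    """Build the target's parameters for one frame, driven by the source."""
    return replace(
        target_ref,
        expression=source.expression,
        gaze=source.gaze,
        rotation=_relative(target_ref.rotation, source_ref.rotation,
                           source.rotation),
        translation=_relative(target_ref.translation, source_ref.translation,
                              source.translation),
    )
```

Keeping `identity` and `illumination` from the target is what would make the rendered face still look like the target person, while the copied expression and gaze carry over what the source is doing.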
The rendering part is where the generative adversarial network (GAN) comes in. The training data comes from the tracked frames of the target video sequence. The goal is to generate fake images good enough to pass for the real target video frames and trick a discriminator network.
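The adversarial tug-of-war can be illustrated with the standard binary cross-entropy GAN losses, computed from the discriminator's outputs (its estimated probability that a frame is real). This is a generic sketch of the textbook objective, not the paper's full training loss, and the function names are hypothetical.

```python
# Generic sketch of standard GAN losses computed from discriminator outputs.
# Not the paper's exact objective; names are hypothetical.
import math
from typing import Sequence


def bce(p: float, label: float, eps: float = 1e-12) -> float:
    """Binary cross-entropy of predicted probability p against a 0/1 label."""
    p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(label * math.log(p) + (1.0 - label) * math.log(1.0 - p))


def discriminator_loss(d_real: Sequence[float], d_fake: Sequence[float]) -> float:
    """The discriminator wants real frames scored 1 and generated frames 0."""
    real_term = sum(bce(p, 1.0) for p in d_real) / len(d_real)
    fake_term = sum(bce(p, 0.0) for p in d_fake) / len(d_fake)
    return real_term + fake_term


def generator_loss(d_fake: Sequence[float]) -> float:
    """Non-saturating generator loss: it wants its fakes scored as real (1)."""
    return sum(bce(p, 1.0) for p in d_fake) / len(d_fake)
```

Training alternates between lowering `discriminator_loss` and lowering `generator_loss`, so the generator's loss only falls when its fake frames start fooling the discriminator.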
Only about two thousand frames – roughly a minute of footage – are enough to train the network. At the moment, only the facial expressions can be modified realistically: the technique doesn't copy the upper body, and it can't cope with backgrounds that change too much.
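How much footage two thousand frames represents depends on the frame rate, which isn't stated. Assuming common video rates of 30 or 25 frames per second (an assumption, not a figure from the paper), the arithmetic works out as:

```latex
t = \frac{N_{\text{frames}}}{f}
  = \frac{2000}{30\,\text{s}^{-1}} \approx 66.7\,\text{s},
\qquad
\frac{2000}{25\,\text{s}^{-1}} = 80\,\text{s}.
```

Either way, that is in the region of a minute or so of video.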
Experts have raised ethical issues surrounding this technology before. The Malicious AI report highlighted how fake videos could be used to make people believe false information, potentially jeopardizing political security.
The paper doesn’t dwell on these concerns, though it does say that pushing the limits of this technology and democratising it “calls for additional care in ensuring verifiable video authenticity, e.g., through invisible watermarking.”
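The paper names invisible watermarking only as one example. As a toy illustration of the general idea, and not a scheme the authors propose, least-significant-bit (LSB) embedding hides a bit string in pixel values while changing each value by at most one level. The function names are hypothetical, and marks like this are easily destroyed by lossy re-encoding, so practical authenticity schemes need far more robust techniques.

```python
# Toy least-significant-bit watermark on 8-bit pixel values.
# Illustrative only: LSB marks do not survive lossy compression.
from typing import List, Sequence


def embed_bits(pixels: Sequence[int], bits: Sequence[int]) -> List[int]:
    """Overwrite the lowest bit of the first len(bits) pixels with the mark."""
    if len(bits) > len(pixels):
        raise ValueError("not enough pixels to hold the watermark")
    out = list(pixels)
    for i, bit in enumerate(bits):
        # Clear the lowest bit, then set it to the watermark bit.
        out[i] = (out[i] & ~1) | (bit & 1)
    return out


def extract_bits(pixels: Sequence[int], n: int) -> List[int]:
    """Read back the lowest bit of the first n pixels."""
    return [p & 1 for p in pixels[:n]]
```

For example, `embed_bits([10, 11, 200, 255], [1, 0, 1, 0])` changes each pixel by at most one level, and `extract_bits` on the result recovers `[1, 0, 1, 0]`.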
Justus Thies, a coauthor of the paper and a postdoctoral researcher at the Technical University of Munich, in Germany, told The Register that he recognized the potential dangers of using AI to manipulate videos.
“I’m aware of the ethical implications of those reenactment projects," he said.
"That is also a reason why we published our results. I think it is important that the people get to know the possibilities of manipulation techniques.
“Facial reenactment has many useful applications. A prominent example is movie dubbing and postproduction in general. The underlying principles of most manipulation methods are rather old. The film industry uses ‘special effects’ for decades, but no one is aware that similar techniques could be used for forgeries.”
It's a problem that the US military research arm, DARPA, is trying to tackle. It is funding a Media Forensics (MediFor) platform that "will automatically detect manipulations, provide detailed information about how these manipulations were performed, and reason about the overall integrity of visual media to facilitate decisions regarding the use of any questionable image or video."
In fact, Thies is involved in FaceForensics, a project that has received funding from DARPA's MediFor efforts.
"The recently presented systems demonstrate the need for sophisticated fraud detection and watermarking algorithms. We believe that the field of digital forensics will receive a lot of attention in the future," Michael Zollhöfer, a coauthor of the paper and a visiting assistant professor at Stanford University, in the US, told El Reg.
"We believe that funding for research projects that aim at forgery detection is a first good step. In my personal opinion, most important is that the general public has to be aware of the capabilities of modern technology for video generation and editing. This will enable them to think more critically about the video content they consume every day, especially if there is no proof of origin."
The research has been submitted to the annual computer graphics conference SIGGRAPH, to be held in Vancouver in August.
With society already wading through an epidemic of fake news, and struggling to separate truth from convincing fiction, don't expect the situation to get any better in the near future.