The result of one week of deepfake experimentation

From Zero to Deepfake

How easy is it to create a deepfake?

Intro

I am beyond fascinated with deepfakes. From the videos of celebrity mashups like Jim Carrey’s face transferred onto Alison Brie’s body…

Background

I recently had the opportunity to explore deepfake research as a side project and, of course, I jumped on it. I had zero experience with deepfakes, but I'd been following the news about them like everyone else. I do have machine learning experience, but for unrelated applications; none of it applied to these experiments beyond helping me troubleshoot general software roadblocks. I started from zero.

Getting started: DeepFaceLab vs Faceswap

Two projects dominate open source deepfake generation: DeepFaceLab (DFL) and Faceswap. On the surface, Faceswap looked like the safer bet. It has multiple contributors, greater popularity, and more documentation, but I couldn't get far with it. I had a lot of difficulty training, and even when I got far enough to push forward, the conversion process would never finish. I have since solved both problems (the first by ditching the GUI and using Ubuntu, the second by tweaking video FPS settings), but at the time I had to commit to DFL because it just worked and the clock was ticking.

Using DeepFaceLab

You can download pre-built Windows builds from Google Drive or Mega. These bundle the dependencies and are the fastest way to get started. The manual is translated from Russian via Google Translate and, even so, it does the job and gets you through each step with minimal headache.

Step 1: Extraction

The first step in deepfake creation is to extract the frames, and then the faces, from the source and destination videos. You're aiming for several hundred to a couple thousand high-quality face images at the end. Each of DFL's steps comes as an executable batch file that presents options and runs to completion or until you abort it. Advanced usage means invoking the Python scripts directly, but you can get far using the bundled executables step by step.

Face extracted by DFL
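As a rough illustration of what this step boils down to, here is a minimal sketch that pulls frames out of a video, detects a face in each, and saves the crops. It uses OpenCV's bundled Haar cascade as a stand-in for DFL's own detector, and the file names are placeholders, not DFL's actual layout.

```python
# Illustrative sketch of frame-and-face extraction, not DFL's code.
# Uses OpenCV's Haar cascade as a stand-in for DFL's detector.
import os
import cv2

def extract_faces(video_path, out_dir, every_nth=5):
    os.makedirs(out_dir, exist_ok=True)
    detector = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    cap = cv2.VideoCapture(video_path)
    frame_idx = saved = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        # Sampling every nth frame keeps the dataset from being
        # thousands of near-duplicate images.
        if frame_idx % every_nth == 0:
            gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
            for (x, y, w, h) in detector.detectMultiScale(gray, 1.3, 5):
                crop = frame[y:y + h, x:x + w]
                cv2.imwrite(os.path.join(out_dir, f"face_{saved:05d}.jpg"), crop)
                saved += 1
        frame_idx += 1
    cap.release()
    return saved

# e.g. extract_faces("source.mp4", "data_src/aligned")
```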

Step 2: Training

The next step is choosing the model you want to train. DFL comes with six models of differing sophistication, each suited to different purposes: H64 is for less capable graphics cards, Avatar is for manipulating the facial expressions of a source video, and SAE is a combination of the other models. The manual and online tutorials recommend SAE, but I went with the only model that would run on my initial graphics card, H64. There is a slew of settings you can tweak here, but it's hard to know their effect until you see them in action. Unfortunately, that takes hours or days, so you need to be patient and see what you get.
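For context on what these models actually are: H64 and its siblings follow the classic autoencoder faceswap design, a shared encoder with one decoder per identity. The Keras sketch below is my own illustration of that idea with assumed layer sizes, not DFL's actual network.

```python
# Conceptual sketch of the shared-encoder / per-identity-decoder design
# behind autoencoder faceswap models like H64. Layer sizes and losses
# here are illustrative assumptions, not DFL's real architecture.
import tensorflow as tf
from tensorflow.keras import layers, Model

IMG = 64  # H64 works on 64x64 face crops

def build_encoder():
    inp = layers.Input((IMG, IMG, 3))
    x = inp
    for filters in (128, 256, 512):
        x = layers.Conv2D(filters, 5, strides=2, padding="same",
                          activation="relu")(x)
    x = layers.Flatten()(x)
    x = layers.Dense(512, activation="relu")(x)
    x = layers.Dense(8 * 8 * 512, activation="relu")(x)
    out = layers.Reshape((8, 8, 512))(x)
    return Model(inp, out, name="shared_encoder")

def build_decoder(name):
    inp = layers.Input((8, 8, 512))
    x = inp
    for filters in (256, 128, 64):
        x = layers.Conv2DTranspose(filters, 5, strides=2, padding="same",
                                   activation="relu")(x)
    out = layers.Conv2D(3, 5, padding="same", activation="sigmoid")(x)
    return Model(inp, out, name=f"decoder_{name}")

encoder = build_encoder()
decoder_a = build_decoder("a")  # reconstructs the source identity
decoder_b = build_decoder("b")  # reconstructs the destination identity

# Each autoencoder learns to reconstruct its own identity. Because the
# encoder is shared, feeding destination faces through decoder_a at
# convert time is what produces the swap.
inp = layers.Input((IMG, IMG, 3))
auto_a = Model(inp, decoder_a(encoder(inp)))
auto_b = Model(inp, decoder_b(encoder(inp)))
auto_a.compile(optimizer="adam", loss="mae")
auto_b.compile(optimizer="adam", loss="mae")
```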

Step 3: Convert

After you have trained the model to a point you're happy with, it's time to convert a destination video and produce your first deepfake. The conversion process uses your model to generate an image to overlay on each destination frame. You can tweak settings that contract or blur the mask, change the coloring of the output, rescale the overlaid image, and more. Conversion is cheaper than training, though it still took about 30 minutes per run to convert the frames for a 46-second video.
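To make the mask options concrete, here is a bare-bones sketch of the compositing that conversion performs, with erode and Gaussian blur standing in for the contract and feather settings. It assumes the generated face has already been aligned to the destination frame, which DFL handles for you.

```python
# Bare-bones version of the convert step's compositing, similar in
# spirit to DFL's mask options but not its implementation.
import cv2
import numpy as np

def composite(dest_frame, generated_face, mask, erode_px=5, blur_px=15):
    """dest_frame, generated_face: HxWx3 uint8; mask: HxW uint8 (255 = face)."""
    if erode_px > 0:
        # Contracting the mask hides the generated face's ragged edges.
        kernel = np.ones((erode_px, erode_px), np.uint8)
        mask = cv2.erode(mask, kernel)
    if blur_px > 0:
        # Blurring feathers the seam between face and frame.
        k = blur_px | 1  # GaussianBlur kernel size must be odd
        mask = cv2.GaussianBlur(mask, (k, k), 0)
    alpha = (mask.astype(np.float32) / 255.0)[..., None]
    blended = (generated_face.astype(np.float32) * alpha
               + dest_frame.astype(np.float32) * (1.0 - alpha))
    return blended.astype(np.uint8)
```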

Results

I used the first experiments with H64 to decide whether DFL was worth investing time in.

What I would have done differently

Local vs Cloud

I optimized for a local environment because I thought it would let me get started faster. While that probably held true, slight tweaks to the model can take hours or days to amount to anything, and I wish I had bitten the bullet and figured out a cloud solution first. That would (presumably) have let me run experiments in parallel and iterate more rapidly. I could be wrong, but I intend to double back and rerun some early experiments on cloud solutions to compare quality and performance.

Source dataset

I started with the impression that tens of thousands of good frames from a handful of source videos would be enough. After running more tests, it turns out a couple thousand is all that's necessary, and it is better to have a wider range of images from many different angles and, importantly, lighting conditions.

DFL vs Faceswap

DFL treated me very well and I have no regrets, but I am starting to see where Faceswap is more advanced. DFL does not support multiple GPUs, while Faceswap does. Faceswap has also invested more in its GUI, which will make it more accessible in the future. I have now started working with Faceswap, and it churns along on both GPUs, though I am still clumsily working through its documentation.

Conclusion

It was not trivial to get to where I ended up, but the path to getting better is clear: better input data, more manual tweaking, and longer training. There is no doubt that this technology will get better and easier to use; this is only the tip of the iceberg. From a technical expertise perspective, I had to apply very little of my ~20 years of software experience to get started with DFL. The Windows build is self-contained and dependency-free. I'm still running into problems with Faceswap, but they are all user-experience and dependency problems. They are important to address, but problems like those are everyday software development chores, not deep machine learning problems solvable only by geniuses and scientists.
