[font=Tahoma, sans-serif]FaceSwap Tutorial: Extraction and Training[/font]
[font=Tahoma, sans-serif]If you have not read Part One of this guide, please go back and do so now.[/font]
[font=Tahoma, sans-serif]This guide is based upon my personal understanding and workflow. There may be better or more efficient ways to accomplish the same tasks. This guide will cover only the essential basics of Faceswap usage. After you have used Faceswap a few times, you can experiment with more advanced settings and usage on your own. The purpose of this guide is simply to get you started.[/font]
[font=Tahoma, sans-serif]Step 1: EXTRACTION[/font]
[font=Tahoma, sans-serif]In order to create a DeepFake, we need two sets of data, both in the form of face images. The first set is the face we want to replace; we will call this Data A. The second set is the face we want to put into our video, the face that does the replacing; we will call this Data B.[/font]
[font=Tahoma, sans-serif]Before we begin the extraction process, we need to create a basic folder structure. This keeps our project organized and makes us less likely to accidentally delete data or make mistakes. Open This PC (or My Computer if you are using an older version of Windows) and select the drive on which you want to store your Faceswap data and images. Let's assume we have two drives: the C drive (the primary drive that holds our OS) and an F drive (a secondary drive for additional storage). Note: you can do all of this on a single hard drive if you don't have multiple drives; the basic process is the same. The process is as follows:[/font]
- [font=Tahoma, sans-serif]Open the drive where you want to store your Faceswap data and images. Your drive may be different from the one in the image below.[/font]
- [font=Tahoma, sans-serif]Create a new folder.[/font]
- [font=Tahoma, sans-serif]Name this folder “Faceswap_project”, or something similar; any name is fine. Pro-tip: use underscores instead of spaces in all of your folder names, as spaces can sometimes stop Faceswap from finding the folders later.[/font]
- [font=Tahoma, sans-serif]Open your newly created folder.[/font]
- [font=Tahoma, sans-serif]Inside it, create the following folders: Video_A, Video_B, Img_A, Img_B, Align_for_Conversion_A, Align_for_Training_A, Align_B, Align_for_Training_B, Model, Sort_A, Sort_B, and Backups. The suffixes A and B tell us whether a folder contains Data A or Data B.[/font]
- [font=Tahoma, sans-serif]Don't worry too much about what all this means. You will begin to understand the reasons as you go through the process of DeepFake creation. After you have created your folder structure, it should be similar to the image below:[/font]
[font=Tahoma, sans-serif]The Backups folder is for saving any files or folders that you might need later. It is a good idea to keep backups of all your folders when using Faceswap, and it is especially important to back up your Model folder after you have trained a model. These backups are a fail-safe in case something goes wrong or we make a mistake.[/font]
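[font=Tahoma, sans-serif]If you prefer to script the folder setup, here is a minimal Python sketch (it is not part of Faceswap) that creates the folder structure listed above and copies the Model folder into Backups under a timestamped name. The helper names create_project and backup_model are my own, hypothetical ones; on Windows you would pass a root path such as F:/Faceswap_project.[/font]

```python
import shutil
import tempfile
from datetime import datetime
from pathlib import Path

# Folder names from this guide; the A/B suffix marks which data set a folder holds.
SUBFOLDERS = [
    "Video_A", "Video_B", "Img_A", "Img_B",
    "Align_for_Conversion_A", "Align_for_Training_A",
    "Align_B", "Align_for_Training_B",
    "Model", "Sort_A", "Sort_B", "Backups",
]


def create_project(root: Path) -> list[Path]:
    """Create root and every subfolder; folders that already exist are left untouched."""
    paths = []
    for name in SUBFOLDERS:
        path = root / name
        path.mkdir(parents=True, exist_ok=True)
        paths.append(path)
    return paths


def backup_model(root: Path) -> Path:
    """Copy root/Model into root/Backups/Model_<timestamp> and return the copy's path."""
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    dest = root / "Backups" / f"Model_{stamp}"
    shutil.copytree(root / "Model", dest)  # raises if dest already exists
    return dest


if __name__ == "__main__":
    # Demo in a throwaway temporary folder so nothing real is touched.
    with tempfile.TemporaryDirectory() as tmp:
        project = Path(tmp) / "Faceswap_project"
        create_project(project)
        print(sorted(p.name for p in project.iterdir()))
```

[font=Tahoma, sans-serif]Because the backup name is only accurate to the second, two backups made within the same second would collide; in practice you would back up once after each training session.[/font]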
[font=Tahoma, sans-serif]Now we need some videos. We need a video for Data A, and a video for Data B. Remember that the Data A video will be the video we want to swap our face into. The Data B video will contain the face that will be doing the replacing. There are a number of ways to get your videos. These methods will not be covered in this guide.[/font]
[font=Tahoma, sans-serif]For ease of use, we need to break our extraction into two distinct categories. The first will be the face extractions that we will use for training the model. The second will be the face extractions that we will use during our conversion.[/font]
[font=Tahoma, sans-serif]We will begin by getting the face extractions for training, and save this data for use later in the guide. To get faces for our data, we first need to break our videos into single still images, or “frames”. Open the Faceswap GUI as described in Part One of this guide. Near the top left corner of the GUI, we see several tabs. Open the Effmpeg tab by clicking on it. Remember that you can hover your cursor over the different actions and selections to get tooltips with more information about each setting. Let's get our frames for training:[/font]
- [font=Tahoma, sans-serif]Set the action setting to extract.[/font]
- [font=Tahoma, sans-serif]Set the input to the video you want your frames from. In this case, Video_A. Pro-tip: If you don't see your video available for selection, please ensure that “All Files” is selected in the bottom right of File Explorer.[/font]
- [font=Tahoma, sans-serif]Set the output to the folder in which you will store your frames. In this case, Img_A.[/font]
- [font=Tahoma, sans-serif]You can specify a reference video if you need to. If your input is a video, this is not necessary.[/font]
- [font=Tahoma, sans-serif]Leave everything else on default, and click the Effmpeg button towards the bottom left of the GUI.[/font]
- [font=Tahoma, sans-serif]Wait for the process to finish. This may take some time, and is dependent on the length of your video.[/font]
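[font=Tahoma, sans-serif]Effmpeg is Faceswap's wrapper around the ffmpeg program. As a rough illustration only, the sketch below builds an ffmpeg command that writes every frame of a video into a folder as numbered PNG images. It assumes ffmpeg is installed and on your PATH; the exact options the Effmpeg tab passes to ffmpeg may well differ, and frames_command and extract_frames are hypothetical helper names.[/font]

```python
import shutil
import subprocess
from pathlib import Path


def frames_command(video: str, out_dir: str, ext: str = "png") -> list[str]:
    """Build an ffmpeg command that writes every frame of `video` into `out_dir`.

    ffmpeg numbers the files from 1, e.g. clip_000001.png, clip_000002.png, ...
    """
    stem = Path(video).stem
    pattern = str(Path(out_dir) / f"{stem}_%06d.{ext}")
    return ["ffmpeg", "-i", video, pattern]


def extract_frames(video: str, out_dir: str) -> None:
    """Run the command; requires ffmpeg on PATH and an existing out_dir."""
    if shutil.which("ffmpeg") is None:
        raise FileNotFoundError("ffmpeg was not found on PATH")
    subprocess.run(frames_command(video, out_dir), check=True)
```

[font=Tahoma, sans-serif]For example, extract_frames("F:/Faceswap_project/Video_A/clip.mp4", "F:/Faceswap_project/Img_A") would fill Img_A with clip_000001.png and onward (the file name clip.mp4 is only an example).[/font]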
[font=Tahoma, sans-serif]We are done with getting our frames, and now we actually need to get the faces from them. Click the Extract tab near the top left of the GUI:[/font]
- [font=Tahoma, sans-serif]Select your input folder. In this case, Img_A. Pro-tip: There are two folder options for input. The first one to the left will extract faces directly from a video. The next one to the right will extract faces from frames. Choose the one best for your needs.[/font]
- [font=Tahoma, sans-serif]Select your output. In this case, Align_for_Training_A.[/font]
- [font=Tahoma, sans-serif]There are different options for the detector. Mtcnn is selected by default. We will use the default. [size=small]Update: The newer S3FD detector is now the preferred selection.[/size][/font]
- [font=Tahoma, sans-serif]There are different options for the aligner. Fan is selected by default. We will use the default.[/font]
- [font=Tahoma, sans-serif]You can set Rotate Images, if you need to.[/font]
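- [font=Tahoma, sans-serif]Use the slider to set Extract Every N to 10, so that only every tenth frame is processed. We don't want to extract a face from every single frame: consecutive frames are nearly identical, so this would flood our training data with near-duplicate faces.[/font]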
- [font=Tahoma, sans-serif]Check the Multiprocess checkbox under options, if you want. There is also a checkbox for an Align Eyes option. Success with Align Eyes varies.[/font]
- [font=Tahoma, sans-serif]Leave everything else on default.[/font]
- [font=Tahoma, sans-serif]Click the Extract button towards the bottom left of the GUI. [/font]
- [font=Tahoma, sans-serif]Wait for the process to finish. This may take some time.[/font]
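[font=Tahoma, sans-serif]To see what Extract Every N = 10 means in numbers: a five-minute video at 30 frames per second has 5 × 60 × 30 = 9,000 frames, so processing every tenth frame leaves 900. The sketch below shows the idea as simple list slicing over zero-padded frame file names; it assumes counting starts from the first frame, which may not match exactly how Faceswap counts, and every_nth is a hypothetical helper.[/font]

```python
def every_nth(frames: list[str], n: int) -> list[str]:
    """Keep frames 1, n+1, 2n+1, ... of a zero-padded, name-sortable frame list."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return sorted(frames)[::n]


if __name__ == "__main__":
    # 5 minutes at 30 fps = 9,000 frames.
    names = [f"clip_{i:06d}.png" for i in range(1, 9001)]
    print(len(every_nth(names, 10)))  # 900
```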
[font=Tahoma, sans-serif]Before we begin training, we need to “clean” our data for both A and B. We need to discard any images in our Align folders that are not faces, are too blurry or too dark, show obscured faces, and so on. We want only clear, unobstructed faces, mostly frontal views wherever possible. A basic rule of thumb is to keep faces in which both eyes and eyebrows are visible. You can use some profile faces, but this is generally not recommended and they should be kept to a minimum.[/font]
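[font=Tahoma, sans-serif]We judge these rules by eye, but "too dark" and "too blurry" can be roughly measured. The sketch below is a hypothetical pre-filter, not a Faceswap feature: it flags a grayscale image (given as rows of 0-255 values) when its average brightness is low, or when the variance of its Laplacian is low, a common sharpness heuristic because blurry images have few strong edges. Both thresholds are made-up starting values that you would need to tune on your own faces.[/font]

```python
from statistics import mean, pvariance

Gray = list[list[int]]  # rows of 0-255 grayscale pixel values


def mean_brightness(img: Gray) -> float:
    """Average pixel value; low values mean a dark image."""
    return mean(value for row in img for value in row)


def laplacian_variance(img: Gray) -> float:
    """Variance of the 4-neighbour Laplacian over interior pixels (low = blurry)."""
    if len(img) < 3 or len(img[0]) < 3:
        raise ValueError("image must be at least 3x3")
    responses = []
    for y in range(1, len(img) - 1):
        for x in range(1, len(img[0]) - 1):
            responses.append(
                img[y - 1][x] + img[y + 1][x] + img[y][x - 1] + img[y][x + 1]
                - 4 * img[y][x]
            )
    return pvariance(responses)


def looks_bad(img: Gray, dark_below: float = 40.0, blur_below: float = 100.0) -> bool:
    """Flag an image that is too dark or too blurry (hypothetical thresholds)."""
    return mean_brightness(img) < dark_below or laplacian_variance(img) < blur_below
```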
[font=Tahoma, sans-serif]We will have to delete our bad faces manually, and this can be quite tedious and time-consuming. Fortunately, Faceswap has a Sort tab which will help us through this part of the process:[/font]
- [font=Tahoma, sans-serif]Select the input to be Align_for_Training_A.[/font]
- [font=Tahoma, sans-serif]Select the output to be Sort_A.[/font]
- [font=Tahoma, sans-serif]Select Final Process to be “Folders”.[/font]
- [font=Tahoma, sans-serif]Leave Sort By and Group By at “hist”.[/font]
- [font=Tahoma, sans-serif]Click the Sort button near the bottom left of the GUI.[/font]
- [font=Tahoma, sans-serif]Wait for the process to finish.[/font]
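[font=Tahoma, sans-serif]Sorting by "hist" groups faces whose image histograms look alike, so that runs of similar images, and runs of junk, end up next to each other. The sketch below illustrates the general idea on small grayscale images: it orders images greedily by L1 distance between normalized 16-bin histograms and starts a new group whenever the jump to the next image exceeds a threshold. This is my own simplified illustration; Faceswap's actual histogram comparison and grouping logic may differ, and the threshold value is arbitrary.[/font]

```python
Gray = list[list[int]]  # rows of 0-255 grayscale pixel values


def histogram(img: Gray, bins: int = 16) -> list[float]:
    """Normalized histogram: the fraction of pixels in each brightness bin."""
    counts = [0] * bins
    for row in img:
        for value in row:
            counts[value * bins // 256] += 1
    total = sum(counts)
    return [c / total for c in counts]


def distance(h1: list[float], h2: list[float]) -> float:
    """L1 distance: 0.0 for identical histograms, 2.0 for non-overlapping ones."""
    return sum(abs(a - b) for a, b in zip(h1, h2))


def sort_by_hist(hists: dict[str, list[float]]) -> list[str]:
    """Greedy nearest-neighbour order, starting from the alphabetically first name."""
    if not hists:
        return []
    remaining = sorted(hists)
    order = [remaining.pop(0)]
    while remaining:
        last = hists[order[-1]]
        nearest = min(remaining, key=lambda name: distance(last, hists[name]))
        remaining.remove(nearest)
        order.append(nearest)
    return order


def group_by_hist(images: dict[str, Gray], threshold: float = 0.5) -> list[list[str]]:
    """Split the sorted order into groups wherever consecutive images differ a lot."""
    hists = {name: histogram(img) for name, img in images.items()}
    groups: list[list[str]] = []
    for name in sort_by_hist(hists):
        if groups and distance(hists[groups[-1][-1]], hists[name]) < threshold:
            groups[-1].append(name)
        else:
            groups.append([name])
    return groups
```

[font=Tahoma, sans-serif]Each inner list corresponds to one group folder; with real faces you would then skim each group and keep or discard it as a whole.[/font]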
[font=Tahoma, sans-serif]Open the Sort_A folder you created earlier. Move the faces you want to keep back into the Align_for_Training_A folder. When finished, delete the images left over in Sort_A. Now repeat this process for Data B, using Align_for_Training_B as the input and Sort_B as the output. Once this is complete, we can begin training our model.[/font]
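[font=Tahoma, sans-serif]If you decide which groups to keep by whole subfolder, moving them back can be scripted. The sketch below assumes that the Folders final process leaves each group in its own subfolder of Sort_A (the subfolder names "0" and "1" here are hypothetical) and moves every file from the groups you list back into Align_for_Training_A. It refuses to overwrite a file that already exists there, and it leaves the rejected groups in place for you to inspect and delete. restore_kept is a hypothetical helper.[/font]

```python
import shutil
import tempfile
from pathlib import Path


def restore_kept(sort_dir: Path, align_dir: Path, keep: list[str]) -> int:
    """Move files from the kept group subfolders of sort_dir into align_dir.

    Returns the number of files moved; groups not listed in keep are untouched.
    """
    moved = 0
    for group in keep:
        for src in sorted((sort_dir / group).iterdir()):
            if not src.is_file():
                continue
            dest = align_dir / src.name
            if dest.exists():
                raise FileExistsError(f"{dest} already exists")
            shutil.move(str(src), str(dest))
            moved += 1
    return moved


if __name__ == "__main__":
    # Demo with throwaway folders: keep group "0", reject group "1".
    with tempfile.TemporaryDirectory() as tmp:
        sort_dir, align_dir = Path(tmp) / "Sort_A", Path(tmp) / "Align_for_Training_A"
        for group, name in [("0", "face_001.png"), ("1", "blurry_002.png")]:
            (sort_dir / group).mkdir(parents=True)
            (sort_dir / group / name).write_text("")
        align_dir.mkdir()
        print(restore_kept(sort_dir, align_dir, keep=["0"]))  # 1
```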
[font=Tahoma, sans-serif]Step 2: TRAINING[/font]
[font=Tahoma, sans-serif]Start by configuring training plugins. Go to the Edit tab in the top left of the GUI, and select configure train plugins:[/font]
[font=Tahoma, sans-serif]The train plugin settings are split into two tabs, Global and Model. Global settings apply to all models. You can hover your cursor over each option for a tooltip with more information. We will select only Dssim Mask Loss in this case. Be aware that different configurations can have different effects on your training, and some will affect the amount of VRAM required to train the model you choose.[/font]
[font=Tahoma, sans-serif]Addendum March 11 2019: A seventh model has been added: the "Lightweight" model. It is intended for GPUs with less than 4 GB of VRAM, and it can train with only 1.6 GB of VRAM (allowing for Windows 10's overhead) at a Batch Size of 8. Results will be less than spectacular, and your mileage may vary.[/font]
[font=Tahoma, sans-serif]We will also look at the Model tab:[/font]
[font=Tahoma, sans-serif]As you can see by the tabs, there are currently six different models to choose from: Dfaker, Dfl H128, Iae, Original, Unbalanced, and Villain. Some models have a Lowmem checkbox, but not all do. If your card has less than 4 GB of VRAM, you should tick the Lowmem box where it is available. The coverage slider for each model adjusts how much of the Data B face will appear. Most of the time, a coverage of 70 to 75 will yield good results, but if this is your first time trying Faceswap, you may wish to leave everything on default for now. NOTE: Different models have their own requirements, which may exceed the basic requirements for Faceswap. For example, the Unbalanced model needs a card with 6 GB of VRAM or more, and the Villain model needs at least 8 GB of VRAM, and that is without using the model fully; the full Villain model needs over 9 GB of VRAM.[/font]
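[font=Tahoma, sans-serif]The VRAM figures quoted in this guide can be collected into a quick lookup. The sketch below encodes only the numbers given here (the Lightweight figure from the addendum, Unbalanced, Villain, and the 4 GB Lowmem guideline); for models the guide gives no figure for, it returns None rather than a guess. check_model is a hypothetical helper, and real requirements vary with settings and batch size.[/font]

```python
from typing import Optional, Tuple

# Minimum VRAM in GB as quoted in this guide (rough guides, not guarantees).
MIN_VRAM_GB = {
    "Lightweight": 1.6,  # at Batch Size 8
    "Unbalanced": 6.0,
    "Villain": 8.0,      # the full Villain model needs over 9 GB
}
LOWMEM_BELOW_GB = 4.0    # tick Lowmem (where offered) on cards below this


def check_model(model: str, vram_gb: float) -> Tuple[Optional[bool], bool]:
    """Return (fits, use_lowmem); fits is None when the guide quotes no figure."""
    minimum = MIN_VRAM_GB.get(model)
    fits = None if minimum is None else vram_gb >= minimum
    return fits, vram_gb < LOWMEM_BELOW_GB
```

[font=Tahoma, sans-serif]For example, check_model("Villain", 6) returns (False, False): a 6 GB card is below the quoted Villain minimum but does not need Lowmem.[/font]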
[font=Tahoma, sans-serif]Select the Original model, check the box for Dssim Loss, set Mask Type to None, and Coverage to 70. Press the OK button in the bottom right.[/font]
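[font=Tahoma, sans-serif]One way to picture the Coverage setting is as the percentage of the extracted face image's width that is kept, as a square centered on the face, before training. The sketch below computes that crop box under this interpretation; it assumes a 256 × 256 pixel face image, rounds the crop side down to an even number, and is an illustration rather than Faceswap's own code. Under these assumptions, Coverage 70 keeps the central 178 × 178 pixels and trims 39 pixels from each edge. coverage_box is a hypothetical helper.[/font]

```python
def coverage_box(image_size: int, coverage_pct: float) -> tuple[int, int, int, int]:
    """Return (left, top, right, bottom) of a centered square crop.

    The side is coverage_pct percent of image_size, rounded down to an even
    number so that both margins are equal when image_size is even.
    """
    if not 0 < coverage_pct <= 100:
        raise ValueError("coverage_pct must be in (0, 100]")
    side = int(image_size * coverage_pct / 100) // 2 * 2
    margin = (image_size - side) // 2
    return margin, margin, margin + side, margin + side


if __name__ == "__main__":
    print(coverage_box(256, 70))  # (39, 39, 217, 217)
```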
[font=Tahoma, sans-serif]Once back at the main screen of the GUI, select the Train tab:[/font]
- [font=Tahoma, sans-serif]Set Input A to Align_for_Training_A.[/font]
- [font=Tahoma, sans-serif]Set Input B to Align_for_Training_B.[/font]
- [font=Tahoma, sans-serif]Set Model Dir to the Model folder you created earlier.[/font]
- [font=Tahoma, sans-serif]Set Trainer to Original.[/font]
- [font=Tahoma, sans-serif]Batch Size can be increased for cards with more VRAM and decreased for cards with less, though there are limits to how much you gain by raising it. Batch sizes are conventionally powers of 2. We will use a Batch Size of 64. If you have a 4 GB card, you may want to use a Batch Size of 32 or 16.[/font]
- [font=Tahoma, sans-serif]Set Preview Scale to 50.[/font]
- [font=Tahoma, sans-serif]Check the Preview checkbox towards the bottom of the GUI.[/font]
- [font=Tahoma, sans-serif]Click the Train button towards the bottom left of the GUI.[/font]
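[font=Tahoma, sans-serif]When you are not sure what Batch Size your card can handle, a common approach is to start high and halve the value until training no longer runs out of memory, which keeps you on powers of 2. The sketch below shows that loop. try_batch is a hypothetical callback that would run a short training step; the sketch assumes it raises Python's MemoryError on an out-of-memory failure, whereas a real framework raises its own error type (TensorFlow, for example, raises ResourceExhaustedError), so you would catch that instead.[/font]

```python
from typing import Callable


def find_batch_size(try_batch: Callable[[int], None], start: int = 64, minimum: int = 8) -> int:
    """Return the first of start, start/2, start/4, ... (down to minimum) that fits."""
    size = start
    while size >= minimum:
        try:
            try_batch(size)
            return size
        except MemoryError:
            size //= 2  # halve and try again
    raise RuntimeError(f"even a batch size of {minimum} ran out of memory")


if __name__ == "__main__":
    # Simulated card that can only handle batches of 16 or fewer.
    def simulated_card(batch_size: int) -> None:
        if batch_size > 16:
            raise MemoryError

    print(find_batch_size(simulated_card))  # 16
```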
[font=Tahoma, sans-serif]Once you are satisfied with your trained model, and you have saved and finished the process, you are ready to begin swapping the faces.[/font]
[font=Tahoma, sans-serif]Part Three: Conversion and Frequently Asked Questions[/font]