MrDeepFakes Forums - Guides and Tutorials
[SFW] [GUIDE] - DeepFaceLab 2.0 GUIDE (RECOMMENDED, UP TO DATE)
Posted by tutsmybarreh (Moderator/Deepfake creator)

DeepFaceLab 2.0 Guide

[Image: You are not allowed to view links. Register or Login to view.]

This thread is just a guide - don't ask for support here; only bug reports/bug fixing and guide update requests are allowed.

For more help, discussion of workflows, techniques, tips or alternative explanations please post in this thread; it also acts as a place for general discussion about DFL 2.0 where you can share your experiences: You are not allowed to view links. Register or Login to view.

Before you go on to ask a new question make sure to also read the FAQ, the detailed XSeg guide with its FAQ, and other threads in the guides section:
You are not allowed to view links. Register or Login to view.

DISCLAIMER: If you're planning on making your own guide based on mine, please credit it by including a link to it. Donations are also welcome to support the work of maintaining and creating future guides for DFL and other machine learning related software.
Bitcoin (BTC): bc1q82g70nhhahym84dv4udl7f899rxjvzfjek30gd


GitHub page (newest version, updates, issues): You are not allowed to view links. Register or Login to view.
Stable releases: You are not allowed to view links. Register or Login to view.

If you don't have an NVIDIA GPU, your CPU doesn't let you train in a reasonable time, or you don't want to use DFL 1.0 with your AMD GPU, use Google Colab:
You are not allowed to view links. Register or Login to view.

DFDNet - simple upscaling tool for fixing low resolution SRC datasets.
You are not allowed to view links. Register or Login to view.

SAEHD DFL 2.0 Spreadsheet for sharing model settings: You are not allowed to view links. Register or Login to view.
DFL 2.0  trained and pretrained models sharing thread: You are not allowed to view links. Register or Login to view.

Official DFL paper: You are not allowed to view links. Register or Login to view.

For more useful links as well as changelog and currently known issues/bugs, scroll down to the 2nd post in this thread.
FAQ is located in 3rd post.

What's the difference between 1.0 and 2.0? What's new in DFL 2.0?

At its core DFL 2.0 is very similar to 1.0, but it was rewritten and optimized to run much faster and offer better quality.
AMD cards are no longer supported and new models (based on SAEHD and Quick96) are incompatible with previous versions.
However, datasets that have been extracted with later versions of DFL 1.0 can still be used in 2.0.

Main features and changes in 2.0:
  • 2 models: SAEHD and Quick 96
  • Support for multi-GPU setups
  • Increased performance during dataset extraction, training and merging thanks to better optimization compared to DFL 1.0
  • Faceset enhancer for enhancing detail of source dataset and upscaling merger output
  • GAN training for enhancing fine details of the trained faces
  • TrueFace - (for DF architectures) - makes results more SRC like
  • Ability to choose a device to use for each step
  • Merging process now also outputs mask images for post process work in external video editing software
  • Face landmarks embedded into dataset samples (faces)
  • Training preview
  • Interactive merger
  • Debug (landmarks preview) option for datasets
  • Dataset extraction using S3FD and manual mode
  • Training at any resolution in increments of 16 or 32 pixels
  • Multiple architectures (DF, LIAE, -U, -D and -UD variants)
  • XSeg masking model with dataset labels editor
DeepFaceLab 2.0 is compatible with NVIDIA GPUs and most CPUs. If you want to train on AMD GPUs, DFL 1.0 can do it but it's no longer supported.
DFL 2.0 requires an NVIDIA GPU supporting CUDA Compute Capability 3.0 or higher.

Explanation of all DFL functions:

DeepFaceLab 2.0 consists of several .bat files used to perform various tasks/steps of creating a deepfake, they are located in the main folder along with two subfolders:
  • _internal - internal files
  • workspace - this is where your models, videos, datasets and final video outputs are
Here is some terminology (folder names are written in "quotation marks"):

Dataset (faceset) - is a set of images that have been extracted (or aligned) from frames (extracted from video) or photos.

There are two datasets being used in DFL 2.0 and they are data_dst and data_src:

- "data_dst" is a folder that holds frames extracted from data_dst.mp4 file - that's the target video onto which we swap faces. It also contains 2 folders that are created after running face extraction from extracted frames:
"aligned" containing images of faces (with embedded facial landmarks data)
"aligned_debug" which contains original frames with landmarks overlaid on faces which is used to identify correctly/incorrectly aligned faces (and it doesn't take a part in training or merging process).
After cleaning up dataset (of false positives, incorrectly aligned faces and fixing them) it can be deleted to save space.

- "data_src" is a folder that holds frames extracted from data_src.mp4 file (that can be interview, movie, trailer, etc) or where you can place images of your source - basically the person whose face you want to swap on target video. As with data_dst extraction, after extracting faces from frames/pictures 2 folders are created:
"aligned" containing images of faces (with embedded facial landmarks data)
"aligned_debug" this folder by default is empty and doesn't contain any preview frames with landmarks like during extraction of data_dst, if you want these - you need to select yes (y) when starting extraction to confirm you want these generated to check if all faces are correctly extracted and aligned.

Before you get to extract faces however you must have something to extract them from:
- for data_dst you should prepare the target (destination) video and name it data_dst.mp4
- for data_src you should either prepare the source video (as in examples above) and name it data_src.mp4 or prepare images in jpg or png format.
The process of extracting frames from video is also called extraction, so for the rest of the guide I'll be referring to the two processes as "face extraction" and "frame extraction" respectively.

As mentioned at the beginning all of that data is stored in the "workspace" folder, that's where both data_src/dst.mp4 files, both "data_src/dst" folders are (with extracted frames and "aligned"/"aligned_debug" folders for extracted/aligned faces) and the "model" folder where model files are stored.
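Since everything lives under "workspace", a quick way to sanity-check the layout described above is a short script like the following (a hypothetical helper, not part of DFL; folder names follow the description above, and "aligned_debug" is left out because it's optional):

```python
from pathlib import Path

# Expected DFL 2.0 workspace layout as described above.
# "aligned_debug" folders are optional and therefore not checked.
EXPECTED = [
    "data_dst.mp4",
    "data_src.mp4",
    "data_dst/aligned",
    "data_src/aligned",
    "model",
]

def check_workspace(workspace):
    """Return a list of expected items missing from the workspace folder."""
    root = Path(workspace)
    return [item for item in EXPECTED if not (root / item).exists()]
```

Running `check_workspace("workspace")` before training gives a quick list of anything you forgot to prepare.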

1. Workspace cleanup/deletion:

1) Clear Workspace - deletes all data from the "workspace" folder; feel free to delete this .bat file to prevent accidental removal of the contents of your workspace.

2. Frames extraction from source video (data_src.mp4):

2) Extract images from video data_src - extracts frames from data_src.mp4 video and puts them into automatically created "data_src" folder, available options:
- FPS - skip for the video's default frame rate, or enter a numerical value for another frame rate (for example entering 5 will render the video as if it were 5 frames per second, meaning fewer frames will be extracted)
- JPG/PNG - choose the format of extracted frames; jpgs are smaller and generally have good enough quality so they are recommended, pngs are larger and don't offer significantly higher quality but they are an option.
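For reference, frame extraction is essentially an ffmpeg call; a rough, hypothetical Python equivalent (the flag choices and helper names here are my own sketch, not DFL's actual code, and assume ffmpeg is available - DFL ships its own copy in _internal) could look like:

```python
import subprocess
from pathlib import Path

def build_ffmpeg_cmd(video, out_dir, fps=None, fmt="jpg"):
    """Build the ffmpeg command line; fps=None keeps the native frame
    rate, fmt is 'jpg' or 'png' as in the extractor's prompt."""
    cmd = ["ffmpeg", "-i", video]
    if fps:  # e.g. fps=5 samples the video as if it ran at 5 fps
        cmd += ["-vf", f"fps={fps}"]
    if fmt == "jpg":
        cmd += ["-q:v", "2"]  # high-quality JPEG output
    cmd.append(str(Path(out_dir) / f"%05d.{fmt}"))  # numbered frames
    return cmd

def extract_frames(video="data_src.mp4", out_dir="data_src",
                   fps=None, fmt="jpg"):
    """Extract frames into out_dir, mimicking '2) extract images from video data_src'."""
    Path(out_dir).mkdir(exist_ok=True)
    subprocess.run(build_ffmpeg_cmd(video, out_dir, fps, fmt), check=True)
```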

3. Video cutting (optional):

3) cut video (drop video on me) - allows you to quickly cut any video to a desired length by dropping it onto that .bat file. Useful if you don't have video editing software and want to quickly trim the video, options:
From time - start of the video
End time - end of the video
Audio track - leave at default
Bitrate - lets you change the bitrate (quality) of the video - best left at default

3. Frames extraction from destination video (data_dst.mp4):

3) extract images from video data_dst FULL FPS - extracts frames from data_dst.mp4 video file and puts them into automatically created "data_dst" folder, available options:
- JPG/PNG - same as in 2)
4. Data_src faces extraction/alignment:

First stage of preparing source dataset is to align the landmarks and produce 512x512 face images from the extracted frames located inside "data_src" folder.

There are 2 options:
4) data_src faceset extract MANUAL - manual extractor, see 5.1 for usage.
4) data_src faceset extract - automated extraction using the S3FD algorithm

Available options for S3FD and MANUAL extractor are:
- choosing coverage area of extraction depending on face type of the model you want to train:
a) full face (for half, mid-half and full face)
b) whole face (for whole face but also works with others)
c) head (for head type of model)
- choosing which gpu (or cpu) to use for faces extraction/alignment process
- choosing whether to generate "aligned_debug" folder or not
4. Data_src cleanup:

After that is finished, the next step is to clean the source faceset/dataset of false positives/incorrectly aligned faces; for detailed info check this thread: You are not allowed to view links. Register or Login to view.

4.1) data_src view aligned result - opens up an external app that allows you to quickly go through the contents of the "data_src/aligned" folder to spot false positives, incorrectly aligned source faces and faces of other people, so you can delete them.

4.2) data_src sort - contains various sorting algorithms to help you find unwanted faces, these are the available options:

[0] blur
[1] face yaw direction
[2] face pitch direction
[3] face rect size in source image
[4] histogram similarity
[5] histogram dissimilarity
[6] brightness
[7] hue
[8] amount of black pixels
[9] original filename
[10] one face in image
[11] absolute pixel difference
[12] best faces
[13] best faces faster
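For reference, the blur sort ([0]) is commonly implemented as the variance of a Laplacian filter response - sharp faces produce high variance, blurry faces low variance. A minimal NumPy sketch of that idea (my own illustration, not DFL's actual code; it operates on grayscale image arrays):

```python
import numpy as np

# Standard 4-neighbour Laplacian kernel (edge detector).
LAPLACIAN = np.array([[0, 1, 0],
                      [1, -4, 1],
                      [0, 1, 0]], dtype=np.float64)

def blur_score(gray):
    """Variance of the Laplacian response of a 2D grayscale array:
    low values indicate a blurry face."""
    h, w = gray.shape
    out = np.zeros((h - 2, w - 2))
    for dy in range(3):          # manual 3x3 convolution over the interior
        for dx in range(3):
            out += LAPLACIAN[dy, dx] * gray[dy:dy + h - 2, dx:dx + w - 2]
    return out.var()

def sort_by_blur(faces):
    """faces: {filename: grayscale array}; returns names sharpest-first."""
    return sorted(faces, key=lambda name: blur_score(faces[name]), reverse=True)
```

Sorting sharpest-first like this groups the blurriest faces at the end of the list, which is exactly what makes them easy to review and delete in bulk.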

4.2) data_src util add landmarks debug images - lets you generate the "aligned_debug" folder after extracting faces (if you wanted to have it but forgot or didn't select the right option in the first place).

4.2) data_src util faceset enhance - uses a special machine learning algorithm to upscale/enhance the look of faces in your dataset, useful if your dataset is a bit blurry or you want to give a sharp one even more detail/texture.

Optionally for enhancing SRC sets (not recommended for DST) you can use DFDNet - Colab link here:
You are not allowed to view links. Register or Login to view.

4.2) data_src util faceset metadata restore and 4.2) data_src util faceset metadata save - lets you save and restore the embedded alignment data from your source faceset/dataset so you can edit some face images after you've extracted them (for example sharpen them, edit out glasses, skin blemishes, color correct) without losing the alignment data.

4.2) data_src util faceset pack and 4.2) data_src util faceset unpack - packs/unpacks all faces from "aligned" folder into/from one file. Used mainly for preparing custom pretraining dataset or for easier sharing as one file.

4.2.other) data_src util recover original filename - reverts names of face images back to original order/filename (after sorting). Optional, training and merging will run correctly regardless of the SRC faces file names.
5. Data_dst preparation:

Here the steps are pretty much the same as with the source dataset, with a few exceptions; let's start with the face extraction/alignment process.
We still have the Manual and S3FD extraction methods, but there is also one that combines both, as well as a special manual extraction mode; the "aligned_debug" folder is always generated.

5) data_dst faceset extract MANUAL RE-EXTRACT DELETED ALIGNED_DEBUG - manual re-extraction from frames deleted from "aligned_debug" folder. More on that in 5. Data_dst cleanup. Usage below in step 5.1.
5) data_dst faceset extract MANUAL - manual extractor, see 5.1 for usage.
5) data_dst faceset extract + manual fix - automated + manual extractor for frames where algorithm couldn't properly detect faces.
5) data_dst faceset extract - automated extraction using S3FD algorithm.

Available options for all extractor modes are:

- choosing coverage area of extraction depending on face type of the model you want to train:
a) full face (for half, mid-half and full face)
b) whole face (for whole face but also works with others)
c) head (for head type of model)
- choosing which GPU (or CPU) to use for faces extraction/alignment process.

5.1 Manual extractor usage:

Upon starting the manual extractor or re-extractor a window will open up where you can manually locate faces you want to extract/re-extract:
- use your mouse to locate face
- use mouse wheel to change size of the search area
- make sure all or at least most landmarks land on important spots like eyes, mouth, nose and eyebrows and follow the face shape correctly (in some cases, depending on the angle, lighting or present obstructions, it might not be possible to precisely align all landmarks, so just try to find a spot that covers all the visible bits the most and isn't too misaligned); an up arrow shows you where the "up" or "top" of the face is
- use key A to change the precision mode; landmarks won't "stick" so much to detected faces but you might be able to position them more precisely
- use the < and > keys (or , and .) to move back and forward; to confirm a detection either left mouse click and move to the next one or hit enter
- right mouse button to mark undetectable forward-facing or non-human faces (requires applying XSeg for correct masking)
- q to skip remaining faces and quit the extractor (it will also close when you reach the last face and confirm it)

5.2 Data_dst cleanup:

After we've aligned the data_dst faces we have to clean them up. As with the source faceset/dataset we have a selection of sorting methods, which I'm not going to explain again as they work exactly the same as the ones for src.
However, cleaning up the destination dataset is different from the source because we want to have faces aligned for all the frames where they are present - including obstructed ones, which we can mark in the XSeg editor and then train our XSeg model to mask them out - effectively making obstructions clearly visible over the learned faces (more on that in the XSeg stage below). There are a couple of tools at our disposal for that:

5.1) data_dst view aligned results - lets you view the contents of the "aligned" folder using an external app (built into DFL) which offers quicker thumbnail generation than the default Windows Explorer
5.1) data_dst view aligned_debug results - lets you quickly browse the contents of the "aligned_debug" folder to locate and delete any frames where our target person's face has incorrectly aligned landmarks or where landmarks weren't placed at all (which means the face wasn't detected). In general you use this to check whether all your faces are properly extracted and aligned; if landmarks on some frames aren't lining up with the shape of the face or the eyes/nose/mouth/eyebrows, or are missing entirely, those frames should be deleted so we can later manually re-extract/align them.
5.2) data_dst sort - same as with the source faceset/dataset, this tool lets you sort all aligned faces within the "data_dst/aligned" folder so that it's easier to locate incorrectly aligned faces, false positives and faces of other people we don't want to train our model on/swap faces onto.
5.2) data_dst util faceset pack and 5.2) data_dst util faceset unpack - same as with source, lets you quickly pack the entire dataset into one file.
5.2) data_dst util recover original filename - same as with source, restores the original names/order of all aligned faces after sorting.

Now that you know your tools, here is an example of my technique for cleaning up the data_dst dataset that guarantees extraction of 100% of the faces.

1. Start by sorting data_dst using 5.2) data_dst sort and use sorting by histogram; this will sort faces by their similarity in color/structure, so it's likely to group similar ones together and separate any images that may contain false positives, faces of other people and incorrectly aligned/extracted faces.
2. Delete all unwanted faces leaving only the good ones.
3. Revert filenames/order of faces using 5.2) data_dst util recover original filename.
4. Go into the folder "data_dst/aligned" and use the following powershell command to remove the _0 suffixes from the filenames of aligned faces.
Quote:- hold shift while right clicking, open powershell and use this command:

get-childitem *.jpg | foreach {rename-item $_ $_.Name.Replace("_0","")}

- wait for the folder address to be displayed again, indicating completion of the process and close the window
5. If your scene has crossfade transitions or mirrors, search for _1 files that may contain additional faces but also duplicates; move them to a separate folder, run the script again with ("_1","") instead, copy them back to the main folder and make sure to keep all files but no duplicates (so select just one to keep if it's the same face, or both if they are different faces of the same person from the same frame).
6. Create a copy of the "aligned_debug" folder.
7. Once done, select all files from "aligned" and copy them (don't move them) to the "aligned_debug - copy" folder, replace, wait for it to finish and while all replaced files are still highlighted delete them.
8. Go through remaining frames and remove all that don't contain any faces you want to manually extract.
9. Copy the rest back to the original "aligned_debug" folder, replace, wait for it to finish and while all replaced files are still highlighted delete them.
10. Now your "aligned_debug" folder contains only frames from which faces were correctly extracted, and all frames from which the extractor failed to extract faces correctly (or didn't extract at all) are gone, which means you can run 5) data_dst faceset extract MANUAL RE-EXTRACT DELETED ALIGNED_DEBUG to manually extract them. Before you do that you might want to run 5.1) data_dst view aligned_debug results to quickly scroll through the remaining good ones and see if the landmarks look correct on all of them; if you spot some less-than-ideal looking landmarks, delete those frames so you can extract them manually too.
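Steps 6 to 10 above are effectively a set difference: keep only the frames in "aligned_debug" whose face was not extracted into "aligned". A hypothetical Python sketch of the same comparison (my own helper, not a DFL script; it assumes the _0 suffixes were already stripped in step 4, so face filenames match frame filenames):

```python
from pathlib import Path

def frames_missing_faces(debug_dir="data_dst/aligned_debug",
                         aligned_dir="data_dst/aligned"):
    """Return debug frame names that have no matching extracted face,
    i.e. the frames that still need manual re-extraction."""
    faces = {f.name for f in Path(aligned_dir).glob("*.jpg")}
    return sorted(f.name for f in Path(debug_dir).glob("*.jpg")
                  if f.name not in faces)
```

Deleting every other frame from "aligned_debug" (those not returned by this function) leaves exactly the frames the manual re-extractor will offer you.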

Now you are done cleaning up your data_dst and all faces are extracted correctly.
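The _0 suffix removal from step 4 can also be done without PowerShell; here is a short cross-platform Python sketch (my own helper, not a DFL script) performing the same rename:

```python
from pathlib import Path

def strip_suffix(folder="data_dst/aligned", suffix="_0"):
    """Remove a '_0' (or '_1') suffix from aligned face filenames,
    e.g. '00001_0.jpg' -> '00001.jpg'. Returns the new names."""
    renamed = []
    for f in Path(folder).glob(f"*{suffix}.jpg"):
        # Note: replaces every occurrence of the suffix in the name,
        # same as the PowerShell one-liner above.
        target = f.with_name(f.name.replace(suffix, ""))
        f.rename(target)
        renamed.append(target.name)
    return renamed
```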

More tips, workflows, bug fixes and frequent issues are explained in the FAQ.
And in this thread there are some details on how to create source datasets, what to keep, what to delete and also in general how to clean source dataset (pretty much the same as destination dataset) and how/where to share them with other users: You are not allowed to view links. Register or Login to view.

5.3: XSeg model training and faceset marking.

XSeg is a replacement for the defunct FANSeg model and is used to automatically mask your result face during merging. It is also used to make obstructions visible over the swapped face and is completely customizable by means of manual dataset marking and model training. There is no pretrained XSeg model (unlike FANSeg), which means you need to create your own XSeg model or use a shared one.

Such models can also be reused just like SAEHD and Quick96 models, so when you start working on a new video you don't need to train a new model from scratch but can instead reuse an existing one by feeding it newly marked faces. Both SRC and DST datasets can be marked, giving you the option of using the XSeg-prd and XSeg-dst mask modes (which respect SRC and DST face shapes respectively) or combining them in other ways (more on that in the merging part of the guide).

XSeg works with all face types, so you have full control over which parts of the face get swapped with the new face and which parts (obstructions) do not.

New available .bat files/scripts are:

5.XSeg) data_dst mask for XSeg trainer - edit - label tool to mark destination faces with XSeg polygons.
5.XSeg) data_dst mask for XSeg trainer - fetch - copies faces containing XSeg polygons to folder "aligned_xseg". Can be used to collect labeled faces so they can be reused in next XSeg model training.
5.XSeg) data_dst mask for XSeg trainer - remove - removes labeled/marked XSeg polygons from the extracted frames.

5.XSeg) data_src mask for XSeg trainer - edit - same as above but for SRC dataset.
5.XSeg) data_src mask for XSeg trainer - fetch - same as above but for SRC dataset.
5.XSeg) data_src mask for XSeg trainer - remove - same as above but for SRC dataset.

XSeg) train.bat - runs the training of the XSeg model.

5.XSeg.optional) trained mask for data_dst - apply - replaces the default DST masks derived from landmarks created during extraction with ones generated by the trained XSeg model. This is required for proper whole face and head face type model training, and also if you plan on using style power with those 2 face types.
5.XSeg.optional) trained mask for data_dst - remove - removes XSeg masks and restores default DST masks.

5.XSeg.optional) trained mask for data_src - apply - same as above but for SRC dataset.
5.XSeg.optional) trained mask for data_src - remove - same as above but for SRC dataset.

Before you start it's important to know the difference between face marking/labeling and masking. Marks/labels are polygons you create manually in the editor, which the model uses to learn how to mask faces; masks are what gets applied by the apply .bat and also what the merger will generate using your trained XSeg model during merging (XSeg model files are located in the model folder and must be present there during merging).


1. Mark your datasets

Start by marking both SRC* and DST faces using 5.XSeg) data_src mask for XSeg trainer - edit and 5.XSeg) data_dst mask for XSeg trainer - edit

Mark 50 to 100 different faces for both SRC and DST. You don't need to mark all faces, only those where the face looks significantly different - for example when the facial expression changes, when the direction/angle of the face changes or when the lighting conditions/direction change.

The more faces you mark, the better the masks the XSeg model will generate for you. Bigger and more varied datasets require more faces to be marked.
Use the same logic of marking for all faces.

Marking obstructions:

While marking faces you will also want to exclude obstructions so that they are visible in the final video. To do so you can either not include them in the main polygon that defines the face area you want swapped, or use exclude poly mode to draw an additional polygon around the obstruction or the part you don't want swapped.

When marking obstructions you need to make sure you mark them on a few faces, according to the same rules as when marking faces with no obstructions; mark the obstruction (even if it doesn't change appearance/shape/position) when the face/head:
- changes angle
- changes facial expression
- or when lighting conditions change

If the obstruction is changing shape and/or moving across the face you need to mark it a few times. Not every obstruction on every face needs to be marked, but the more varied the obstructions and the conditions they occur in, the more faces you will have to mark.

Hands, hair, etc. should be marked by following their edge, though if you want you can create a slight offset by moving the polygon lines away from the obstruction. This is especially important if you're going to be excluding tongues or the mouth cavity, as DFL can struggle with those, so it's usually better to exclude them. While doing so make sure you keep plenty of distance from the teeth; if you place the polygon lines on the edges of the teeth, then when you later blur the mask in the merger it will show both SRC and DST teeth, which will look bad. Preferably you should only mark out the mouth cavity when the DST mouth is wide open, and the tongue only when it is sticking out of the mouth. Marking those on SRC faces is optional.

The way you mark your faces however is entirely up to you.

Once you finish marking DST faces scroll to the end of the list and hit Esc to save them, then you can move on to training your model.

*If you're training a full face model you may skip marking SRC faces; however, if you want to use XSeg-prd or use Style Power during training it is a must, and for WF or HEAD you must mark and apply both.

2. Train your XSeg model.

When starting training for the first time you will see an option to select the face_type of the XSeg model, which should be set to the same face_type as your face swapping model, although you can also use a higher option.

You will also be able to choose device to train on as well as batch size.

You can switch preview modes using space (there are 3 modes: DST training, SRC training and SRC+DST (distorted)), to update the preview window press P, and to save and stop training use the Esc key.

During training check the previews often. If some faces have bad masks with holes: save and stop training, run the editor again, find faces that have wrong masks by enabling the XSeg overlay view, mark them, scroll to the end of the list, hit Esc to save and exit, and resume XSeg model training. When starting up an already trained model you will get a prompt asking if you want to restart training - select no (n), as selecting yes (y) will restart model training from 0. However, if your masks are not improving and are still full of holes despite having marked many more faces and being well above 100k-150k iterations, it might be necessary to mark even more faces, or to restart training and possibly use a different batch size value (higher or lower depending on how many marked faces you're feeding to the model).

3. Apply XSeg masks to your datasets.

After you're done training, or after you've already applied XSeg once and then fixed faces that had bad masks, it's time for the final application of XSeg masks to your datasets. As I've already explained, it is not necessary to apply masks to your datasets if you're doing a full face swap; you can simply use XSeg during merging by selecting the new masking modes.

XSeg editor:

[Image: You are not allowed to view links. Register or Login to view.]

Training preview:

[Image: You are not allowed to view links. Register or Login to view.]

For a more detailed XSeg guide with FAQ, shared XSeg models and marked faces, check this thread:

You are not allowed to view links. Register or Login to view.

6. Training:

There are currently 2 models to choose from for training:

SAEHD (6GB+): High Definition Styled Auto Encoder - for high end GPUs with at least 6GB of VRAM.

Features/settings available:
- runs at any resolution in increments of 16 (32 for -UD and -D variants) up to 640x640 pixels
- half face, mid-half face, full face, whole face and head face type
- 8 architectures: DF, LIAE, each in 4 variants - regular, -U, -D and -UD
- Adjustable Batch Size
- Adjustable Model Auto Encoder, Encoder, Decoder and Mask Decoder Dimensions
- Auto Backup feature
- Preview History
- Adjustable Target Iteration
- Random Flip (yaw)
- Uniform Yaw
- Eye Priority
- Masked Training
- GPU Optimizer
- Learning Dropout
- Random Warp
- GAN Training Power
- True Face Training Power
- Face and Background Style Power
- Color Transfer modes
- Gradient Clipping
- Pretrain Mode

Quick96 (2-4GB): Simple model derived from SAE model - dedicated for low end GPUs with 2-4GB of VRAM.

- 96x96 Pixels resolution
- Full Face
- Batch size 4
- DF-UD architecture

Both models can generate good deepfakes but obviously SAEHD is the preferred and more powerful one. Quick96 is recommended for low end cards or dataset testing.
If you want to see what other people can achieve with various graphics cards, check out this spreadsheet where users can share their model settings:
You are not allowed to view links. Register or Login to view.
After you've checked other people's settings and decided on the model you want to use, start it up using one of these:

6) train SAEHD
6) train Quick96

Since Quick96 is not adjustable you will see the command window pop up and ask only 1 question - CPU or GPU (if you have more than one GPU it will let you choose either of them or train with both).
SAEHD however will present you with more options to adjust.

In both cases a command line window will appear first, where you input your model settings. On first start you will have access to all the settings explained below; on startup of training with a model already trained and present in the "model" folder, you will also receive a prompt where you can choose which model to train (if you have more than one set of model files present in your "model" folder).
You will also always get a prompt to select which GPU or CPU you want to run the trainer on.

The second thing you will see once you start up is the preview window, which looks like this:

[Image: You are not allowed to view links. Register or Login to view.]

Here is a more detailed explanation of all functions in the order they are presented to the user upon starting training of a new model:

Note that some of these get locked and can't be changed once you start training, due to the way these models work; examples of things that can't be changed later are:
- model resolution
- model architecture
- model dimensions (dims settings)
- face type

Autobackup every N hour ( 0..24 ?:help ) : self explanatory - lets you enable automatic backups of your model every N hours. Leaving it at 0 disables auto backups. Default value is 0 (disabled).

Target iteration : will stop training after a certain number of iterations is reached; for example if you want to train your model to only 100,000 iterations you should enter a value of 100000. Leaving it at 0 will make it run until you stop it manually. Default value is 0 (disabled).

Flip faces randomly ( y/n ?:help ) : A useful option in cases where you don't have all the necessary angles of the person's face in the source dataset. For example, if you have a target/destination video with the person looking straight and to the right, and your source only has faces looking straight and to the left, you should enable this feature. Bear in mind that because no face is symmetrical, results may look less like SRC, and features of the source face (like beauty marks, scars, moles, etc.) will be mirrored. Default value is n (disabled).

Batch_size ( ?:help ) : Batch size affects how many faces are compared to each other in each iteration. The lowest value is 2 and you can go as high as your GPU will allow, which is limited by VRAM. The higher your model's resolution and dimensions, and the more features you enable, the more VRAM is needed, so a lower batch size might be required. It's recommended not to use a value below 4. A higher batch size will provide better quality at the cost of slower training (higher iteration time). For the initial stage it can be set to a lower value to speed up initial training and then raised. Optimal values are between 6-12. How do you guess what batch size to use? You can either use trial and error or take a look at what other people can achieve on their GPUs by checking out the DFL 2.0 spreadsheet: You are not allowed to view links. Register or Login to view.

Resolution ( 64-640 ?:help ) : here you set your model's resolution; bear in mind this option cannot be changed during training. It affects the resolution of swapped faces: the higher the model resolution, the more detailed the learned face will be, but training will also be much heavier and longer. Resolution can be increased from 64x64 to 640x640 in increments of:
16 (for regular and -U architectures variants)
32 (for -D and -UD architectures variants)
Higher resolutions might require increasing of the model dimensions (dims).
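The increment rule above can be expressed as a small helper (a hypothetical function for illustration; names and rounding behavior are assumptions):

```python
def valid_resolution(requested, architecture):
    """Round a requested resolution to the nearest supported step:
    32 for -D/-UD architecture variants, 16 otherwise, clamped to
    the 64-640 range. Hypothetical helper, not part of DFL."""
    step = 32 if architecture.lower().endswith(("-d", "-ud")) else 16
    res = int(round(requested / step)) * step
    return max(64, min(640, res))

print(valid_resolution(150, "liae-ud"))  # 160 (nearest multiple of 32)
print(valid_resolution(150, "df"))       # 144 (nearest multiple of 16)
```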

Face type ( h/mf/f/wf/head ?:help ) : this option lets you set the area of the face you want to train. There are 5 options - half face, mid-half face, full face, whole face and head:
a) Half face - only trains from mouth to eyebrows, but can in some cases cut off the top or bottom of the face (eyebrows, chin, bit of mouth).
b) Mid-half face - aims to fix this issue by covering a 30% larger portion of the face compared to half face, which should prevent most of the undesirable cut-offs from occurring, though they can still happen.
c) Full face - covers most of the face area, excluding the forehead; can sometimes cut off a little bit of chin, but this happens very rarely. Most recommended when SRC and/or DST have hair covering the forehead.
d) Whole face - expands that area even more to cover pretty much the whole face, including the forehead and even a little bit of hair. This mode should be used when you want to swap the entire face, excluding hair. An additional option for this face type is masked_training, which lets you prioritize learning the full face area first and then (after disabling it) lets the model learn the rest of the face, like the forehead.
e) Head - is used to swap the entire head. Not suitable for subjects with long hair; works best if the source faceset/dataset comes from a single source and both SRC and DST have short hair, or hair that doesn't change shape depending on the angle. Minimum recommended resolution for this face type is 224.
[Image: You are not allowed to view links. Register or Login to view.]

Example of whole face type face swap:

[Image: You are not allowed to view links. Register or Login to view.]

Example of head type face swap:

You are not allowed to view links. Register or Login to view.
AE architecture (df/liae/df-u/liae-u/df-d/liae-d/df-ud/liae-ud ?:help ) : This option lets you choose between the 2 main learning architectures, DF and LIAE, as well as their -U, -D and -UD variants.

DF and LIAE architectures are the base ones, both offering good quality with decent performance.
DF-U, DF-UD, LIAE-U and LIAE-UD are additional architecture variants.

DF: This architecture provides a more direct face swap; it doesn't morph faces, but requires that the source and target/destination face/head have a similar shape.
It works best on frontal shots and requires that your source dataset has all the required angles; it can produce worse results on side profiles.

LIAE: This architecture isn't as strict when it comes to face/head shape similarity between source and target/destination, but it does morph the faces, so it's recommended that the actual face features (eyes, nose, mouth, overall face structure) be similar between source and target/destination. This model offers worse resemblance to source on frontal shots, but handles side profiles much better and is more forgiving when it comes to the source faceset/dataset, often producing more refined face swaps with better color/lighting match.

-U: this variant aims to improve the similarity/likeness of the trained result face to the SRC dataset.
-D: this variant aims to improve performance; it lets you train your model at twice the resolution with no extra compute cost (VRAM usage) and similar performance - for example, train a 256 resolution model at the same VRAM usage and speed (iteration time) as a 128 resolution model. However, it requires longer training, the model must be pretrained first for optimal results, and resolution must be changed in increments of 32, as opposed to 16 in the other variants.

-UD: combines both variants for maximum likeness and increased resolution/performance. It also requires longer training and the model to be pretrained.

The next 4 options control the model's neural network dimensions, which affect its ability to learn. Modifying these can have a big impact on performance and the quality of the learned faces, so they should be left at their defaults.

AutoEncoder dimensions ( 32-1024 ?:help ) : Affects the overall ability of the model to learn faces.
Encoder dimensions ( 16-256 ?:help ) : Affects the ability of the model to learn the general structure of faces.
Decoder dimensions ( 16-256 ?:help ) : Affects the ability of the model to learn fine detail.
Decoder mask dimensions ( 16-256 ?:help ) : Affects the quality of the learned masks. May or may not affect some other aspects of training.
Since the learned mask is now always enabled by default and can't be changed, one might consider lowering this setting to get better performance, but detailed tests would have to be done to determine the effects on mask quality, learned faces and performance, and whether changing it from the default value is worthwhile.

Changing each setting can have varying effects on performance and quality, and it's not possible to measure the effect of each one without extensive training. Each is set at a default value that should offer optimal results and a good compromise between training speed and quality.

Also, when changing one parameter the others should be changed as well to keep the relations between them similar (for example, if you drop Encoder and Decoder dimensions from 64 to 48 you could also decrease AutoEncoder dimensions from 256 to 192-240). Feel free to experiment with various settings.
If you want optimal results, keep them at default or increase them slightly for higher resolution models.
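That proportional-scaling rule of thumb can be sketched as a small helper (hypothetical function and names; the 16 floor matches the minimum of the Encoder/Decoder settings above):

```python
def scale_dims(dims, factor):
    """Scale all dimension settings together to keep their ratios
    roughly intact (a rule-of-thumb helper, not part of DFL)."""
    return {name: max(16, int(round(value * factor)))
            for name, value in dims.items()}

# Dropping Encoder/Decoder dims from 64 to 48 scales everything by 48/64
print(scale_dims({"ae_dims": 256, "e_dims": 64, "d_dims": 64}, 48 / 64))
# {'ae_dims': 192, 'e_dims': 48, 'd_dims': 48}
```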
Eyes priority ( y/n ?:help ) : Attempts to fix problems with eye training by forcing the neural network to train eyes with higher priority.
Bear in mind that it does not guarantee the right eye direction, it only affects the details of the eyes and area around them. Example (before and after):
[Image: You are not allowed to view links. Register or Login to view.]

Place models and optimizer on GPU ( y/n ?:help ) : Enabling the GPU optimizer puts all the load on your GPU, which greatly improves performance (iteration time) but leads to higher VRAM usage. Disabling this feature offloads some of the optimizer's work to the CPU, decreasing load on the GPU and VRAM usage, thus letting you achieve a higher batch size or run more demanding models at the cost of longer iteration times. If you get an OOM (out of memory) error and you don't want to lower your batch size or disable some feature, disable this option: some work will be offloaded to your CPU and some data from the GPU's VRAM to system RAM, and you will be able to run your model without OOM errors at the cost of lower speed. Default value is y (enabled).

Use learning rate dropout ( y/n/cpu ?:help ) : LRD is designed to aid training, accelerating the learning of faces (lower loss) and reducing sub-pixel shake. LRD must be enabled before running GAN. This option affects VRAM usage, so if you run into OOM errors you can run it on the CPU, which decreases VRAM usage and lets you train at the same batch size, but iteration time will slow down by about 20%. For a more detailed explanation of LRD and the order of enabling main features during training, please refer to FAQ Question 8 below this guide:
"When should I enable or disable random warp, GAN, True Face, Style Power, Color Transfer and Learning Rate Dropout?".

Enable random warp of samples ( y/n ?:help ) : Random warp is used to generalize the model so that it correctly learns all the basic shapes, face features, structure of the face, expressions and so on, but as long as it's enabled the model may have trouble learning fine detail. Because of this, it's recommended to keep the feature enabled as long as your faces are still improving (judging by decreasing loss values and the preview window). Once the faces are fully trained and you want more detail, disable it; within a few thousand iterations you should start to see more detail, and you then carry on training with the feature disabled. Default value is y (enabled).

Uniform_yaw ( y/n ?:help ) : Helps with training of profile faces by forcing the model to train evenly on all faces depending on their yaw, prioritizing profile faces; it may cause frontal faces to train slower. It's enabled by default during pretraining and can be used similarly to random warp (at the beginning of the training process), or enabled after RW is disabled, when faces are more or less trained and you want profile faces to look better and less blurry. Useful when your source dataset doesn't have many profile shots. Can help lower loss values. Default value is n (disabled).

GAN power ( 0.0 .. 10.0 ?:help ) : GAN stands for Generative Adversarial Network, and in DFL 2.0 it is implemented as an additional way of training to get more detailed/sharp faces. This option is adjustable on a scale from 0.0 to 10.0 and should only be enabled once the model is more or less done training (after you've disabled random warp of samples and enabled LRD). It's recommended to start at a low value of 0.1, which is also the recommended value in most cases. Once it's enabled you should not disable it; make backups of your model in case you don't like the results.
Default value is 0.0 (disabled).

Before/after example of a face trained with GAN at value of 0.1 for 40k iterations:

[Image: You are not allowed to view links. Register or Login to view.]

'True face' power. ( 0.0000 .. 1.0 ?:help ) : True face training with a variable power setting lets you set the model discriminator to a higher or lower value; it tries to make the final face look more like SRC. As with GAN, this feature should only be enabled once random warp is disabled and the model is fairly well trained. Consider making a backup before enabling this feature. Never use high values: the typical value is 0.01, but you can use even lower ones like 0.001. The higher the setting, the more the result face will look like the faces in the source dataset, which may cause issues with color match and also cause artifacts to show up, so it's important not to use high values. It has a small performance impact, which may cause an OOM error to occur. Default value is 0.0 (disabled).

[Image: You are not allowed to view links. Register or Login to view.]

Face style power ( 0.0..100.0 ?:help ) and Background style power ( 0.0..100.0 ?:help ) : This setting controls style transfer of either the face or background part of the image. It is used to transfer the style of your target/destination faces (data_dst) over to the final learned face, which can improve the quality and look of the final result after merging, but high values can cause the learned face to look more like data_dst than data_src. It will transfer some color/lighting information from DST to the result face. It's recommended not to use values higher than 10. Start with small values like 0.001-0.01. This feature has a big performance impact: using it will increase iteration time and may require you to lower your batch size, disable the GPU optimizer or run LRD on the CPU. Consider making a backup before enabling this feature. Default value is 0.0 (disabled).

Color transfer for src faceset ( none/rct/lct/mkl/idt/sot ?:help ) : this feature is used to match the colors of your data_src to the data_dst so that the final result has a similar skin color/tone to the data_dst, and the final result doesn't change colors as the face moves around (which may happen if various face angles were taken from various sources that had different lighting conditions or were color graded differently). There are several options to choose from:

- rct (reinhard color transfer): based on: You are not allowed to view links. Register or Login to view.
- lct (linear color transfer): Matches the color distribution of the target image to that of the source image using a linear transform.
- mkl (Monge-Kantorovitch linear): based on: You are not allowed to view links. Register or Login to view.
- idt (Iterative Distribution Transfer): based on: You are not allowed to view links. Register or Login to view.
- sot (sliced optimal transfer): based on: You are not allowed to view links. Register or Login to view.
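To give a feel for what these methods do, here is a simplified sketch of the statistics-matching idea behind rct and lct: shift the source face's per-channel distribution toward the destination's. This is an illustration only (the real Reinhard method operates in LAB color space, and DFL's implementations differ in detail):

```python
import numpy as np

def mean_std_transfer(src, dst):
    """Per-channel mean/std matching on HxWx3 uint8 images - a
    simplified sketch of rct/lct-style color transfer."""
    src = src.astype(np.float32)
    dst = dst.astype(np.float32)
    # normalize src channels, then re-scale to dst's statistics
    out = (src - src.mean(axis=(0, 1))) / (src.std(axis=(0, 1)) + 1e-6)
    out = out * dst.std(axis=(0, 1)) + dst.mean(axis=(0, 1))
    return np.clip(out, 0, 255).astype(np.uint8)
```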

Enable gradient clipping ( y/n ?:help ) : This feature is implemented to prevent so-called model collapse/corruption, which may occur when using various features of DFL 2.0. It has a small performance impact, but if you really don't want to use it you must enable auto backups, as a collapsed model cannot recover and must be scrapped, with training started all over. Default value is n (disabled), but since the performance impact is so low and it can save you a lot of time by preventing model collapse, it's best to leave it enabled. Model collapse is most likely to happen when using Style Powers, so if you're using them it's highly advised to enable gradient clipping or backups (you can also do them manually).
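For the curious, gradient clipping itself is a simple idea: if the gradients' overall norm exceeds a threshold, scale them all down before the weight update, so one bad batch can't blow up the model. A generic sketch (not DFL's actual code):

```python
import numpy as np

def clip_gradients(grads, max_norm=1.0):
    """Scale all gradients down when their global norm exceeds
    max_norm, preventing the huge update steps behind model collapse."""
    total_norm = float(np.sqrt(sum((g ** 2).sum() for g in grads)))
    if total_norm > max_norm:
        grads = [g * (max_norm / total_norm) for g in grads]
    return grads
```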

Enable pretraining mode ( y/n ?:help ) : Enables the pretraining process, which uses a dataset of random people's faces to initially train your model. After training it to around 200k-400k iterations, such a model can then be used when starting training with the actual data_src and data_dst you want to train. It saves time because you don't have to start training from 0 every time (the model will "know" what faces should look like, speeding up the initial training stage). The pretrain option can be enabled at any time, but it's recommended to pretrain a model only once, at the start. You can also pretrain with your own custom faceset: all you need to do is create one (can be either data_src or data_dst), use the 4.2) data_src (or dst) util faceset pack .bat file to pack it into one file, then rename it to faceset.pak and replace the file inside the "...\_internal\pretrain_CelebA" folder (back up the old one). Default value is n (disabled). However, if you want to save some time you can use one of the shared pretrained models in this thread.

Shared models: You are not allowed to view links. Register or Login to view.

To use a shared pretrained model, simply download it, put all the files directly into your model folder, start training, and press any key within 2 seconds after selecting the model for training (if you have more than one in the model folder) and the device to train with (GPU/CPU) to override the model settings. Make sure the pretrain option is disabled so that you start proper training; if you leave it enabled, the model will carry on with pretraining. Note that the model will revert the iteration count to 0; that's normal behavior for a pretrained model.

Optionally, instead of pretraining, you can just train a model on random faces placed in your data_src and data_dst, but this method can cause morphing toward the previously trained data when you start regular training, which might make the result face look less SRC-like. This is the same behavior that can occur when you reuse an already trained model on new SRC and DST datasets.

7. Merging:

After you're done training your model, it's time to merge the learned face over the original frames to form the final video (convert).

For that we have 2 converters corresponding to 2 available models:

7) merge SAEHD
7) merge Quick96

Upon selecting any of those a command line window will appear with several prompts.

The 1st one will ask if you want to use the interactive converter. The default value is y (enabled), and it's recommended over the regular one because it has all the features and an interactive preview where you see the effects of all the changes you make when adjusting options and enabling/disabling features:
Use interactive merger? ( y/n ) :

2nd one will ask you which model you want to use:
Choose one of saved models, or enter a name to create a new model.
[r] : rename
[d] : delete
[0] : df192 - latest

3rd one will ask you which GPU/GPUs or CPU you want to use for the merging (conversion) process:
Choose one or several GPU idxs (separated by comma).
[0] : GeForce GTX 1070 8GB
[0] Which GPU indexes to choose? :

Pressing enter will use default value (0).

After that's done you will see a command line window with current settings as well as preview window which shows all the controls needed to operate the interactive converter/merger.

Here is a quick look at both the command line window and converter preview window:
[Image: You are not allowed to view links. Register or Login to view.]

The converter features many options you can use to change the mask type, its size and feathering/blur; you can also add additional color transfer and sharpen/enhance the final trained face even further.

Here is the list of all merger/converter features explained:

1. Main overlay modes:
- original: displays original frame without swapped face
- overlay: simply overlays the learned face over the frame
- hist-match: overlays the learned face and tries to match it based on histogram (has 2 modes, normal and masked, that can be switched with Z)
- seamless: uses opencv poisson seamless clone function to blend new learned face over the head in the original frame
- seamless hist match: combines both hist-match and seamless.
- raw-rgb: overlays raw learned face without any masking

NOTE: Seamless modes can cause flickering.

2. Hist match threshold: controls strength of the histogram matching in hist-match and seamless hist-match overlay mode.
Q - increases value
A - decreases value

3. Erode mask: controls the size of a mask.
W - increases mask erosion (smaller mask)
S - decreases mask erosion (bigger mask)

4. Blur mask: blurs/feathers the edge of the mask for smoother transition
E - increases blur
D - decreases blur

5. Motion blur: after entering the initial parameters (converter mode, model, GPU/CPU), the merger loads all frames and data_dst aligned data, and while doing so it calculates the motion vectors used to create the motion blur effect that this setting controls. It lets you add blur in places where the face moves around, but high values may blur the face even with small movement. The option only works if one set of faces is present in the "data_dst/aligned" folder - if during cleanup you had some faces with _1 prefixes (even if only faces of one person are present) the effect won't work, and the same goes if there is a mirror that reflects the target person's face; in such cases you cannot use motion blur, and the only way to add it is to train each set of faces separately.
R - increases motion blur
F - decreases motion blur

6. Super resolution: uses an algorithm similar to the data_src dataset/faceset enhancer; it can add more definition to areas such as teeth and eyes and enhance the detail/texture of the learned face.
T - increases the enhancement effect
G - decreases the enhancement effect

7. Blur/sharpen: blurs or sharpens the learned face using box or gaussian method.
Y - sharpens the face
H - blurs the face
N - box/gaussian mode switch

8. Face scale: scales learned face to be larger or smaller.
U - scales learned face down
J - scales learned face up

9. Mask modes: the following masking modes are available:

dst: uses masks derived from the shape of the landmarks generated during data_dst faceset/dataset extraction.
learned-prd: uses masks learned during training; keeps the shape of SRC faces.
learned-dst: uses masks learned during training; keeps the shape of DST faces.
learned-prd*dst: combines both masks, smaller size of both.
learned-prd+dst: combines both masks, bigger size of both.
XSeg-prd: uses XSeg model to mask using data from source faces.
XSeg-dst: uses XSeg model to mask using data from destination faces.
XSeg-prd*dst: combines both masks, smaller size of both.
learned-prd*dst*XSeg-dst*prd: combines all 4 mask modes, smaller size of all.
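The `*` and `+` combinations above are just per-pixel minimum and maximum of the masks. An illustrative sketch (hypothetical helper, values in 0..1):

```python
import numpy as np

def combine_masks(prd, dst, mode):
    """Combine two masks the way the merger's naming suggests:
    '*' keeps the intersection (smaller mask), '+' the union (bigger)."""
    if mode == "prd*dst":
        return np.minimum(prd, dst)  # smaller size of both
    if mode == "prd+dst":
        return np.maximum(prd, dst)  # bigger size of both
    raise ValueError(f"unknown mode: {mode}")
```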

10. Color transfer modes: similar to color transfer during training, you can use this feature to better match the skin color of the learned face to the original frame for a more seamless and realistic face swap. There are 8 different modes.

11. Image degrade modes: there are 3 settings that you can use to affect the look of the original frame (without affecting the swapped face):
Denoise - denoises image making it slightly blurry (I - increases effect, K - decrease effect)
Bicubic - blurs the image using bicubic method (O - increases effect, L - decrease effect)
Color - decreases color bit depth (P - increases effect, ; - decrease effect)

Additional controls:

TAB button - switch between main preview window and help screen.
Bear in mind you can only change parameters in the main preview window, pressing any other buttons on the help screen won't change them.
-/_ and =/+ buttons are used to scale the preview window.
Use caps lock to change the increment from 1 to 10 (affects all numerical values).

To save/override settings for all next frames from current one press shift + / key.
To save/override settings for all previous frames from current one press shift + M key.
To start merging of all frames press shift + > key.
To go back to the 1st frame press shift + < key.
To only convert next frame press > key.
To go back 1 frame press < key.

8. Conversion of frames back into video:

After you've merged/converted all the faces, you will have a folder named "merged" inside your "data_dst" folder containing all the frames that make up the video.
The last step is to convert them back into a video and combine it with the original audio track from the data_dst.mp4 file.

To do so you will use one of the 4 provided .bat files, which use FFmpeg to combine all the frames into a video in one of the following formats - avi, mp4, lossless mp4 or lossless mov:
- 8) merged to avi
- 8) merged to mov lossless
- 8) merged to mp4 lossless
- 8) merged to mp4
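Under the hood, these .bat files assemble an FFmpeg command that turns the frame sequence into a video and muxes in the audio from data_dst.mp4. A sketch of how such a command could be built (the paths, frame pattern, codec and CRF values here are illustrative assumptions, not the exact flags DFL uses):

```python
def ffmpeg_args(fps=30, lossless=False):
    """Build an FFmpeg command roughly like what a 'merged to mp4'
    .bat file runs. All paths and flag values are assumptions."""
    quality = ["-crf", "0"] if lossless else ["-crf", "16"]
    return (["ffmpeg", "-y",
             "-r", str(fps), "-i", "data_dst/merged/%5d.png",  # frame sequence
             "-i", "data_dst.mp4",                             # audio source
             "-map", "0:v", "-map", "1:a?",                    # video from frames, audio from dst
             "-c:v", "libx264"] + quality + ["result.mp4"])

print(" ".join(ffmpeg_args(fps=25, lossless=True)))
```

You would pass this list to `subprocess.run` (or just run the printed command) in a folder laid out like a DFL workspace.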

And that's it! After you've completed all these steps you should have a video file (avi/mp4/mov) which is your deepfake.
