DeepFaceLab 2.0 Guide/Tutorial

[Image: eWsS3rBh.jpg]

NOTE: This thread is meant to be just a guide.

The only types of posts/questions allowed here are ones about the guide itself - suggestions on what to add to it and what's not clear.
If you have a question about techniques/workflows, want to complain or just talk in general and share something about the process of using DFL 2.0 please post in this thread:
You are not allowed to view links. Register or Login to view.

What is DeepFaceLab 2.0?

DeepFaceLab 2.0 is a tool/app utilizing machine learning to swap faces in videos.

What's the difference between 1.0 and 2.0? What's new in DFL 2.0?

At the core DFL 2.0 is very similar to 1.0 but it was rewritten and optimized to run much faster and offer better quality at the cost of compatibility.
Because of this, AMD cards are no longer supported and new models (based on SAE/SAEHD and Quick96) are incompatible with previous versions. However, any datasets that were extracted with later versions of DFL 1.0 can still be used in 2.0.

SAEHD DFL 2.0 Spreadsheet with users model settings: You are not allowed to view links. Register or Login to view.
DFL 2.0  trained and pretrained models sharing thread: You are not allowed to view links. Register or Login to view.

"Official" DFL paper: You are not allowed to view links. Register or Login to view.

Here is a list of the main features and changes in 2.0:
  • Includes 2 models: SAEHD and Quick 96.
  • Support for multi-GPU configurations.
  • Increased performance during faceset (dataset) extraction, training and merging thanks to better optimization (compared to DFL 1.0)
  • Faceset enhancer - for upscaling/enhancing detail of source faceset (dataset).
  • GAN training - uses Generative Adversarial Network to enhance fine details of the face.
  • TrueFace - (for DF architectures) - makes the result face look more like the source (src).
  • Ability to choose which GPU to use for each step (extraction, training, merging).
  • Ability to quickly rename, delete and create new models within the command line window.
  • Merging process now also outputs mask images for post-processing work in external video editing software.
  • Face landmark/position data embedded within dataset/faceset image files with option to extract embedded info for dataset modifications.
  • Training preview window.
  • Interactive converter.
  • Debug (face landmark preview) option for source and destination (data_src/dst) datasets.
  • Faceset (dataset) extraction with S3FD and/or manual extraction.
  • Training at any resolution in increments of 16. Possibility of training models at resolutions up to 640x640.
  • Multiple architectures (DF, LIAE, -U, -D and -UD variants)
DeepFaceLab 2.0 is compatible with NVIDIA GPUs and CPUs; there is no AMD support anymore - if you want to train on AMD GPUs, DFL 1.0 can do it but it's no longer supported/updated.
DFL 2.0 requires an NVIDIA GPU that supports at least CUDA Compute Capability 3.0.
CUDA Compute Capability list: You are not allowed to view links. Register or Login to view.

DOWNLOAD:

GitHub page (contains newest version as well as all current updates): You are not allowed to view links. Register or Login to view.
Stable releases can be found here: You are not allowed to view links. Register or Login to view.

If you don't have an NVIDIA GPU and your CPU doesn't let you train in any reasonable time, or you don't want to use DFL 1.0 with your AMD GPU, consider using Google Colab: You are not allowed to view links. Register or Login to view.

Explanation of all DFL functions:

DeepFaceLab 2.0 consists of a selection of .bat files used to extract, train and merge (previously called convert), which are the 3 main steps of creating a deepfake. They are located in the main folder along with two subfolders:
  • _internal (that's where all the files necessary for DFL to work are)
  • workspace (this is where your models, videos, facesets (datasets) and final video outputs are)
Before we go into the main guide part, here is some terminology (folder names are written in "quotations"):

Faceset (dataset) - a set of face images that have been extracted and aligned (with facial landmarks) from frames (extracted from video) or photos.

There are two datasets being used in DFL 2.0 and they are data_dst and data_src:

- data_dst is a folder that holds frames extracted from the data_dst.mp4 file - that's the target video onto which we swap faces. It also contains 2 folders that are created after running face extraction on the extracted frames:
"aligned" containing images of faces, 512x512 in size (with embedded facial landmarks data)
"aligned_debug" which contains the original frames with landmarks overlaid on faces, used to identify correctly/incorrectly aligned faces (it doesn't take part in the training or merging process).
After cleaning up the dataset (removing false positives and fixing incorrectly aligned faces) it can be deleted to save space.

- data_src is a folder that holds frames extracted from the data_src.mp4 file (which can be an interview, movie, trailer, etc.) or where you can place images of your source - basically the person whose face you want to swap onto the target video. As with data_dst extraction, after extracting faces from frames/pictures 2 folders are created:
"aligned" containing images of faces, 512x512 in size (with embedded facial landmarks data)
"aligned_debug" - this folder is empty by default and doesn't contain any preview frames with landmarks like during extraction of data_dst; if you want these, you need to select yes (y) when starting extraction to confirm you want them generated so you can check that all faces are correctly extracted and aligned.

Before you get to extract faces however you must have something to extract them from:
- for data_dst you should prepare the target (destination) video and name it data_dst.mp4
- for data_src you should either prepare the source video (as in examples above) and name it data_src.mp4 or prepare images in jpg or png format.
The process of extracting frames from video is also called extraction, so for the rest of the guide/tutorial I'll refer to the two processes explicitly as "frame extraction" and "face extraction".

As mentioned at the beginning all of that data is stored in the "workspace" folder, that's where both data_src/dst.mp4 files, both "data_src/dst" folders are (with extracted frames and "aligned"/"aligned_debug" folders for extracted/aligned faces) and the "model" folder where model files are stored.

The .bat files are grouped based on the function they perform.

1. Workspace cleanup/deletion:

1) Clear Workspace - self-explanatory: it deletes all data from the "workspace" folder. Feel free to delete this .bat file to prevent accidental removal of important files you will be storing in the "workspace" folder.

2. Frames extraction from source video (data_src.mp4):

2) Extract images from video data_src - extracts frames from the data_src.mp4 video file and puts them into the automatically created "data_src" folder, available options:
- FPS - skip for the video's default frame rate, enter a numerical value for another frame rate (for example entering 5 will render the video as if it were 5 frames per second, meaning fewer frames will be extracted)
- JPG/PNG - choose the format of extracted frames; jpgs are smaller and generally have good enough quality so they are recommended, pngs are larger and don't offer significantly higher quality but they are an option.
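
Under the hood the frame extraction .bat files call the bundled ffmpeg. As a rough illustration only (the flags, paths and naming pattern here are assumptions, not DFL's exact command), the equivalent operation looks something like this:

Code:
import subprocess

# Hypothetical paths/pattern; DFL's own .bat does roughly the same with its bundled ffmpeg.
def extract_frames(video="workspace/data_src.mp4", out_dir="workspace/data_src",
                   fps=None, fmt="jpg"):
    cmd = ["ffmpeg", "-i", video]
    if fps:                               # omit to keep the video's native frame rate
        cmd += ["-vf", f"fps={fps}"]
    if fmt == "jpg":
        cmd += ["-q:v", "2"]              # high-quality jpg output
    cmd += [f"{out_dir}/%05d.{fmt}"]
    subprocess.run(cmd, check=True)

extract_frames(fps=5)                     # e.g. extract only 5 frames per second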

3. Video cutting (optional):

3) cut video (drop video on me) - lets you quickly cut any video to the desired length by dropping it onto that .bat file. Useful if you don't have video editing software and want to quickly trim the video, options:
From time - start of the video
End time - end of the video
Audio track - leave at default
Bitrate - lets you change the bitrate (quality) of the video - also best to leave at default

3. Frames extraction from destination video (data_dst.mp4):

3) extract images from video data_dst FULL FPS - extracts frames from data_dst.mp4 video file and puts them into automatically created "data_dst" folder, available options:
- JPG/PNG - same as in 2).

4. Data_src faces extraction/alignment:

The first stage of preparing the source dataset is to detect faces, align their landmarks and produce 512x512 face images from the extracted frames located inside the "data_src" folder.

There are 2 options:
4) data_src faceset extract MANUAL - manual extraction
4) data_src faceset extract - automated extraction using the S3FD algorithm

Available options for both S3FD and MANUAL extractor are:
- choosing coverage area of extraction depending on face type of the model you want to train:
a) full face (for half, mid-half and full face)
b) whole face (for whole face but also works with others)
c) head (for head type of model)

- choosing which GPU (or CPU) to use for the face extraction/alignment process.
- choosing whether to generate the "aligned_debug" folder or not.

4. Data_src cleanup:

After that is finished, the next step is to clean the source faceset/dataset of false positives and incorrectly aligned faces; for detailed info check this thread: You are not allowed to view links. Register or Login to view.

4.1) data_src view aligned result - opens up an external app that lets you quickly go through the contents of the "data_src/aligned" folder to find false positives, incorrectly aligned source faces and faces of other people so you can delete them.

4.2) data_src sort - contains various sorting algorithms to help you find unwanted faces; these are the available options:

[0] blur
[1] face yaw direction
[2] face pitch direction
[3] face rect size in source image
[4] histogram similarity
[5] histogram dissimilarity
[6] brightness
[7] hue
[8] amount of black pixels
[9] original filename
[10] one face in image
[11] absolute pixel difference
[12] best faces
[13] best faces faster

4.2) data_src util add landmarks debug images - lets you generate the "aligned_debug" folder after extracting faces (if you wanted to have it but forgot or didn't select the right option in the first place).

4.2) data_src util faceset enhance - uses a special machine learning algorithm to upscale/enhance the look of faces in your dataset, useful if your dataset is a bit blurry or you want to give a sharp one even more detail/texture.

4.2) data_src util faceset metadata save and 4.2) data_src util faceset metadata restore - let you save and later restore the embedded alignment data of your source faceset/dataset so you can edit face images after extracting them (for example sharpen them, edit out glasses, skin blemishes, color correct) without losing the alignment data and without having to re-extract them.

EDITING ANY IMAGES FROM "ALIGNED" FOLDER WITHOUT THIS STEP WILL REMOVE THAT ALIGNMENT DATA AND THOSE PICTURES WON'T BE USABLE IN TRAINING, WHEN EDITING KEEP THE NAMES THE SAME, NO FLIPPING/ROTATION IS ALLOWED, ONLY SIMPLE EDITS LIKE COLOR CORRECTION, ETC.
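
To make the save/restore workflow concrete, here is a minimal sketch (the paths and the brightness tweak are just examples, not part of DFL) of a batch edit that keeps filenames identical and only touches color levels - run the metadata save .bat before it and the metadata restore .bat after it:

Code:
from pathlib import Path
from PIL import Image, ImageEnhance

# Example only: brighten every aligned face slightly while keeping the exact same
# filenames. Re-saving the jpgs strips the embedded alignment data, which is why the
# "faceset metadata save"/"restore" utilities must bracket this step.
aligned = Path("workspace/data_src/aligned")
for jpg in aligned.glob("*.jpg"):
    img = Image.open(jpg)
    img = ImageEnhance.Brightness(img).enhance(1.1)  # simple color-level tweak only
    img.save(jpg, quality=95)                        # no resizing, flipping or rotation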

4.2) data_src util faceset pack and 4.2) data_src util faceset unpack - packs/unpacks all faces from "aligned" folder into/from one file.

4.2.other) data_src util recover original filename - reverts names of face images back to original order/filename (after sorting).

5. Data_dst preparation:

Here the steps are pretty much the same as with the source dataset, with a few exceptions. Let's start with the face extraction/alignment process.
We still have the Manual and S3FD extraction methods, but there is also one that combines both, plus a special manual re-extraction mode; the "aligned_debug" folder is always generated.

5) data_dst faceset extract MANUAL RE-EXTRACT DELETED ALIGNED_DEBUG - manual re-extraction of faces from the frames that were deleted from the "aligned_debug" folder. More on that in the next step - Data_dst cleanup.
5) data_dst faceset extract MANUAL - manual extraction.
5) data_dst faceset extract + manual fix - automated extraction + manual one for frames where algorithm couldn't properly detect faces.
5) data_dst faceset extract - automated extraction using S3FD algorithm.

Available options for all extractor modes are:

- choosing coverage area of extraction depending on face type of the model you want to train:
a) full face (for half, mid-half and full face)
b) whole face (for whole face but also works with others)
c) head (for head type of model)

- choosing which GPU (or CPU) to use for faces extraction/alignment process.

5. Data_dst cleanup:

After we've aligned the data_dst faces we have to clean them up. Similar to the source faceset/dataset we have a selection of sorting methods, which I'm not going to explain as they work exactly the same as the ones for src.
However, cleaning up the destination dataset is different from the source because we want to have aligned faces for all the frames where they are present - including obstructed ones, which we can mark in the XSeg editor and then train our XSeg model to mask them out - effectively making obstructions clearly visible over the learned faces (more on that in the XSeg stage below). There are a couple of tools at our disposal for that:

5.1) data_dst view aligned results - lets you view the contents of the "aligned" folder using an external app (built into DFL) which offers quicker thumbnail generation than the default Windows Explorer.
5.1) data_dst view aligned_debug results - lets you quickly browse the contents of the "aligned_debug" folder to locate and delete any frames where our target person's face has incorrectly aligned landmarks or where landmarks weren't placed at all (which means the face wasn't detected). In general you use this to check if all your faces are properly extracted and aligned - if the landmarks on some frames aren't lining up with the shape of the face or the eyes/nose/mouth/eyebrows, or are missing, those frames should be deleted so we can later manually re-extract/align them.
5.2) data_dst sort - same as with the source faceset/dataset, this tool lets you sort all aligned faces within the "data_dst/aligned" folder so that it's easier to locate incorrectly aligned faces, false positives and faces of other people we don't want to train our model on/swap faces onto.
5.2) data_dst util faceset pack and 5.2) data_dst util faceset unpack - same as with source, lets you quickly pack the entire dataset into one file (and unpack it).
5.2) data_dst util recover original filename - same as with source, restores original names/order of all aligned faces after sorting.

Now that you know your tools, here is an example of my technique/workflow for cleaning up the data_dst dataset that guarantees extraction of 100% of the faces.

1. Start by sorting data_dst using 5.2) data_dst sort and select sorting by histogram, this will generally sort faces by their similarity in color/structure so it's likely to group similar ones together and separate any images that may contain rotated/zoomed in/out faces, as well as false positives and faces of other people.
2. Delete all false positives, incorrectly aligned (rotated) and unwanted faces.
3. Revert the filenames/order to original one using 5.2) data_dst util recover original filename.
4. Go into the "data_dst/aligned" folder and use the following PowerShell command to remove the _0 suffixes from the filenames of the aligned data_dst faces.

Quote: hold Shift while right-clicking, open PowerShell and use this command:

get-childitem *.jpg | foreach {rename-item $_ $_.name.replace("_0","")}

- wait for the folder address to be displayed again, indicating completion of the process and close the window
5. If your scene has crossfade transitions or mirrors, search for files with the _1 suffix - they may contain additional faces but also duplicates. Move them to a separate folder, run the same command again with ("_1","") instead of ("_0",""), copy them back to the main folder and keep all the files you need but no duplicates (if it's the same face from the same frame keep just one, if they are different faces of the same person keep both). A Python sketch of this renaming is shown after this list.
6. Create a copy of the "aligned_debug" folder.
7. Once done, select all files from "aligned" and copy them (don't move them) to the "aligned_debug - copy" folder, hit replace, wait for it to finish and while all replaced files are still highlighted/selected delete them.
8. Go through remaining frames and remove all that don't contain any faces you want to manually extract.
9. Copy the rest back to the original "aligned_debug" folder, hit replace, wait for it to finish and, while all replaced files are still highlighted/selected, delete them.
10. Now your "aligned_debug" folder contains only frames from which faces were correctly extracted, and all frames from which the extractor failed to correctly extract faces (or did not extract them at all) are gone, which means you can run 5) data_dst faceset extract MANUAL RE-EXTRACT DELETED ALIGNED_DEBUG to manually extract them. Before you do that you might want to run 5.1) data_dst view aligned_debug results to quickly scroll through the remaining "good" ones and see if the landmarks look correct on all of them; if you spot some less-than-ideal looking landmarks, feel free to delete those frames so you can extract them manually too.
11. Now you are done cleaning up your data_dst and all faces are extracted correctly.
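
For reference, here is a hedged Python equivalent of the PowerShell rename from step 4 (the folder path is an assumption; adjust it to your workspace):

Code:
from pathlib import Path

# Strip the "_0" suffix the extractor appends to aligned face filenames,
# e.g. "00001_0.jpg" -> "00001.jpg" (same effect as the PowerShell one-liner).
aligned = Path("workspace/data_dst/aligned")
for jpg in sorted(aligned.glob("*_0.jpg")):
    jpg.rename(jpg.with_name(jpg.name.replace("_0", "")))

# Files with a "_1" suffix (a second face found in the same frame) are renamed the
# same way, but move them to a separate folder first and check for duplicates as
# described in step 5.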

More tips, workflows, bug fixes and frequent issues are explained in the FAQ:
You are not allowed to view links. Register or Login to view.

And in this thread there are some details on how to create source datasets, what to keep, what to delete, how to clean a source dataset in general (pretty much the same way as the target/destination dataset) and how/where to share them with other users: You are not allowed to view links. Register or Login to view.

5.1: XSeg model training and faceset marking.

With the XSeg model you can train your own mask segmentation model for dst (and src) faces; it will be used during merging to mask result faces and mask out obstructions, as well as to specify the face and background areas for style powers to work with whole face and head face type models.
There is no pretrained XSeg model (unlike the old FANseg model), so you need to create your own that will control how the result face is masked over your DST/target video during merging. XSeg is designed to let you create masks that best suit your datasets.
Such models can also be reused, so when you start working on a new video you don't need to train a model from scratch but can instead reuse an existing one and feed it new marked faces. Both src and dst facesets/datasets can be marked, giving you the option of using XSeg-prd (src) and XSeg-dst, combining them both, and even combining both with the learned mask, just like FANSeg (FAN-dst, FAN-prd, FAN-dst+prd, FAN-dst+prd+learned).
XSeg works with all face types such as full_face, whole_face and even head so you have full control of the face coverage and can also include/exclude hair and obstructions.
The workflow is fairly straightforward; it's the best solution for obtaining good quality masks without having to create masks manually by rotoscoping in software like Mocha Pro or the built-in tools of various video editors.

New available .bat files/scripts are:

5.XSeg) data_dst mask for XSeg trainer - edit - label tool to define masked areas of the destination faces with XSeg polygons.
5.XSeg) data_dst mask for XSeg trainer - fetch - copies faces containing XSeg polygons to folder "aligned_xseg". Can be used to collect labeled faces so they can be reused in future XSeg model training.
5.XSeg) data_dst mask for XSeg trainer - remove - removes labeled XSeg polygons from the extracted frames.
5.XSeg) data_src mask for XSeg trainer - edit - label tool to define masked areas of the source faces with XSeg polygons.
5.XSeg) data_src mask for XSeg trainer - fetch - copies faces containing XSeg polygons to folder "aligned_xseg". Can be used to collect labeled faces so they can be reused in future XSeg model training.
5.XSeg) data_src mask for XSeg trainer - remove - removes labeled XSeg polygons from the extracted frames.
XSeg) train.bat - runs the training of the XSeg model.
5.XSeg.optional) trained mask for data_dst - apply - replaces default full_face mask derived from landmarks created during extraction with one from trained XSeg model, only needed if you plan on using face/background style power with obstructed faces.
5.XSeg.optional) trained mask for data_dst - remove - removes it
5.XSeg.optional) trained mask for data_src - apply - replaces default full_face mask derived from landmarks created during extraction with one from trained XSeg model, only needed if you plan on using face/background style power with obstructed faces.
5.XSeg.optional) trained mask for data_src - remove - removes it

Usage:

1. Train your model - self-explanatory. Nothing changes about this stage; just run your model with any face_type you want to use. The exception is using style powers with whole face or head face type models: if you want to be able to use this feature during training of the face swapping model (Quick96, SAEHD) you need to train your XSeg model first and then apply the trained masks (SRC and DST) to both datasets, otherwise this feature will not work. The reason is that in order for it to work it must know where the face and background are; the default masks used during training are derived from the landmarks created during face extraction and they only cover the full face area - hence they can't be used with whole face and head face type models.

2. 5.XSeg) data_dst mask for XSeg trainer - edit - label/mask DST faces.

Read the tooltips/descriptions on the buttons to learn their function (en/ru/zh languages are supported) and mask faces using the include and/or exclude polygon mode.

Mask 100 to 200 different DST faces; you don't need to mask faces for every frame of dst, only those where the face looks significantly different, for example:

- has closed eyes/different facial expression
- face/head has changed direction/angle
- lighting condition/direction is different

The more varied the faces you mask, the better the masks the XSeg model will generate for you.
Start masking from the upper left area, follow a clockwise direction and keep the same masking logic for all frames, for example:

- the same approximated jaw line of the side faces, where the jaw is not visible
- the same hair line

Once you finish marking (masking) the DST faces hit Esc to save them; then you can move on to training your model if you are only going to use XSeg-DST.

If you also want to use XSeg-PRD in the merger, you need to mark/mask SRC faces using the SRC labeling tool: 5.XSeg) data_src mask for XSeg trainer - edit

2.1 Masking obstructions

While masking faces you will probably also want to mask out/exclude obstructions so that they are visible in the final video over the result face. To do so you can either:
- not include obstructions in the main mask that defines the face area and other bits you want to be swapped, or
- use exclude poly mode to draw an additional mask around the obstruction.

When masking obstructions you need to make sure you mark them in several frames, following the same rules as when masking faces with no obstructions: mark the obstruction (even if it doesn't change appearance/shape/position) when the face/head:
- changes angle
- facial expression changes
- lighting conditions change

If the obstruction is also changing shape and/or moving across the face you need to mark it a few times. Not every obstruction in every frame needs to be marked, but the more different obstructions occur in various conditions, the more frames you will have to mark.

3. Run XSeg) train.bat - Train the XSeg model.

- When starting training for the first time you will see an option to select the face type of the model it will be used with (such as full face, whole face, head); select one that matches your trained model and start training.
- You can switch preview modes using space (there are 3 modes: DST training, SRC training and SRC+DST (distorted)).
- To update preview progress press P.
- Esc to save and stop training.


During training check the preview often; if some faces have wrong or glitchy/broken masks, run the edit tool again, find the faces that cause issues and mask them, then resume training or start it again from scratch.

Restarting the training of the XSeg model is only possible by deleting all 'model\XSeg_*' files. You can also move them to another folder.

New mask modes available in merger for whole_face:
XSeg-prd  - XSeg mask of predicted face - faces from src faceset should be labeled/masked.
XSeg-dst  - XSeg mask of dst face - faces from dst faceset should be labeled/masked.
XSeg-prd*XSeg-dst - the smallest area of both XSeg-prd and XSeg-dst masks - both src and dst faceset must be labeled/masked.

If the workspace\model folder contains a trained XSeg model, the merger will use it; otherwise you will get a transparent mask when using the XSeg-* mask modes.

Screenshots:


- editor

[Image: 7Bk4RRV.jpg]

- trainer

[Image: NM1Kn3s.jpg]

- merger

[Image: glUzFQ8.jpg]

6. Training:

There are currently 2 models to choose from for training:

SAEHD (6GB+): High Definition Styled AutoEncoder model - high end model for high end GPUs with at least 6GB of VRAM.

Features/settings available:
- runs at any resolution in increments of 16 (32 for -UD and -D variants) up to 640x640 pixels
- features half face, mid-half face, full face, whole face and head face type
- 8 architectures: DF, LIAE, each in 4 variants - regular, -U, -D and -UD
- Adjustable Batch Size
- Adjustable Model Auto Encoder, Encoder, Decoder and Mask Decoder Dimensions
- Adjustable Auto Backup
- Togglable Preview History
- Adjustable Target Iteration
- Togglable Random Flip (yaw)
- Togglable Uniform Yaw
- Togglable Eye Priority
- Togglable Masked Training
- Adjustable GPU Optimizer
- Adjustable Learning Rate Dropout
- Togglable Random Warp
- Adjustable GAN Training Power
- Adjustable True Face Training Power
- Adjustable Face and Background Style Power
- Adjustable Color Transfer
- Togglable Gradient Clipping
- Togglable Pretrain Mode

Quick96 (2-4GB): Simple model derived from SAE model - dedicated for low end GPUs with 2-4GB of VRAM.

Features:
- runs at 96x96 pixels resolutions
- full face mode
- batch size 4

Both models can generate good deepfakes but obviously SAEHD is the preferred and more powerful one.
Quick96 isn't a bad choice if you want to test out your ideas, but of course you can also run SAEHD at similar settings or go even lower.
If you want to see what other people can achieve with various graphics cards, check this spreadsheet out where users can share their model settings:
You are not allowed to view links. Register or Login to view.
After you've checked other people's settings and decided on a model you want to use, you start it up using one of these:

6) train SAEHD
6) train Quick96

Since Quick96 is not adjustable you will see the command window pop up and ask only 1 question - CPU or GPU (if you have more than one GPU it will let you choose either one of them or train with both).
SAEHD however will present you with more options to adjust.

In both cases a command line window will appear first, where you input your model settings. On first start you will have access to all the settings explained below; when starting training with a model already trained and present in the "model" folder you will also get a prompt to choose which model to train (if you have more than one set of model files present in your "model" folder).
You will also always get a prompt to select which GPU or CPU you want to run the trainer on.

The second thing you will see once you start up is the preview window, which looks like this:

[Image: aNEoiAN.jpg]

Here is a more detailed explanation of all functions in order they are presented to the user upon starting training of a new model:

Note that some of these get locked and can't be changed once you start training due to the way these models work; examples of things that can't be changed later are:
- model resolution
- model architecture
- model dimensions (dims settings)
- face type


Autobackup every N hour ( 0..24 ?:help ) : self-explanatory - lets you enable automatic backups of your model every N hours. Leaving it at 0 disables auto backups. Default value is 0 (disabled).

Target iteration : stops training after a certain number of iterations is reached; for example, if you want to train your model to only 100,000 iterations you should enter a value of 100000. Leaving it at 0 will make it run until you stop it manually. Default value is 0 (disabled).

Flip faces randomly ( y/n ?:help ) : Useful option in cases where you don't have all the necessary angles of the person's face in the source dataset. For example, if your target/destination video has the person looking straight and to the right and your source only has faces looking straight and to the left, you should enable this feature, but bear in mind that because no face is symmetrical the results may look less like src, and features of the source face (like beauty marks, scars, moles, etc.) will be mirrored. Default value is n (disabled).

Batch_size ( ?:help ) : Batch size affects how many faces are compared to each other in each iteration. The lowest value is 2 and you can go as high as your GPU will allow, which is limited by VRAM. The higher your model's resolution and dimensions, and the more features you enable, the more VRAM is needed and thus the lower the possible batch size. It's recommended not to use a value below 4. A higher batch size provides better quality at the cost of slower training (higher iteration time). For the initial stage it can be set lower to speed up generalization of the face and then increased once the face is sufficiently trained.
How do you guess what batch size to use? You can either use trial and error or take a look at what other people achieve on their GPUs by checking out the DFL 2.0 spreadsheet: https://mrdeepfakes.com/forums/thread-dfl-2-0-user-model-settings-spreadsheet

Resolution ( 64-640 ?:help ) : here you set your model's resolution; bear in mind this option cannot be changed during training. It affects the resolution of swapped faces - the higher the model resolution, the more detailed the learned face will be, but training will also be much heavier and longer. Resolution can be increased from 64x64 to 640x640 in increments of:
16 (for regular and -U architecture variants)
32 (for -D and -UD architecture variants)
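
As a quick sanity check of the increment rule, a small sketch (my own helper, not part of DFL; the rounding direction is an assumption) could look like this:

Code:
def valid_resolution(requested, archi="df-ud"):
    # -D and -UD variants need multiples of 32, the other variants multiples of 16.
    step = 32 if archi.endswith(("-d", "-ud")) else 16
    res = max(64, min(640, requested))
    return (res // step) * step          # round down to the nearest valid value

print(valid_resolution(210, "df"))       # 208 (multiple of 16)
print(valid_resolution(210, "liae-ud"))  # 192 (multiple of 32)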

Face type ( h/mf/f/wf/head ?:help ) : this option lets you set the area of the face you want to train; there are 5 options - half face, mid-half face, full face, whole face and head:
a) Half face - only trains the area from mouth to eyebrows but can in some cases cut off the top or bottom of the face (eyebrows, chin, a bit of mouth).
b) Mid-half face - aims to fix this issue by covering a 30% larger portion of the face compared to half face, which should prevent most of the undesirable cut-offs from occurring, but they can still happen.
c) Full face - covers most of the face area, excluding the forehead; can sometimes cut off a little bit of chin but this happens very rarely - most recommended when SRC and/or DST have hair covering the forehead.
d) Whole face - expands the area even more to cover pretty much the whole face, including the forehead and even a little bit of hair; this mode should be used when we want to swap the entire face excluding hair. An additional option for this face type is masked_training, which lets you prioritize learning the full face area first and then (after disabling it) lets the model learn the rest of the face, like the forehead.
e) Head - is used to swap the entire head; not suitable for subjects with long hair. Works best if the source faceset/dataset comes from a single source and both SRC and DST have short hair, or hair that doesn't change shape depending on the angle. The minimum recommended resolution for this face type is 224.

[Image: UNDadcN.jpg]

Example of whole face type face swap:

[Image: ldlVgZH.png]

Example of head type face swap:

You are not allowed to view links. Register or Login to view.
AE architecture (df/liae/df-u/liae-u/df-d/liae-d/df-ud/liae-ud ?:help ) : This option lets you choose between the 2 main learning architectures, DF and LIAE, as well as their -U, -D and -UD variants.

DF and LIAE architectures are the base ones, both offering good quality with decent performance.

DF-U, DF-UD, LIAE-U and LIAE-UD are additional architecture variants.

DF: This model architecture provides a more direct face swap; it doesn't morph faces but requires that the source and target/destination face/head have a similar shape.
It works best on frontal shots, requires that your source dataset has all the required angles, and can produce worse results on side profiles.

LIAE: This model architecture isn't as strict when it comes to face/head shape similarity between source and target/destination but this model does morph the faces so it's recommended to have actual face features (eyes, nose, mouth, overall face structure) similar between source and target/destination. This model offers worse resemblance to source on frontal shots but can handle side profiles much better and is more forgiving when it comes to source faceset/dataset, often producing more refined face swaps with better color/lighting match.

-U: this variant aims to improve similarity/likeness of the trained result face to the SRC dataset.
-D: this variant aims to improve performance; it lets you train your model at twice the resolution with no extra computational cost (VRAM usage) and similar performance - for example, train a 256 resolution model at the same VRAM usage and speed (iteration time) as a 128 resolution model. However it requires longer training, the model should be pretrained first for optimal results, and the resolution must be changed in increments of 32 as opposed to 16 in the other variants.

-UD: combines both variants for maximum likeness and increased resolution/performance. It also requires longer training and the model to be pretrained.

The next 4 options control the model's neural network dimensions, which affect the model's ability to learn. Modifying these can have a big impact on performance and the quality of the learned faces, so they should generally be left at default.

AutoEncoder dimensions ( 32-1024 ?:help ) : Auto encoder dimensions settings, affects overall ability of the model to learn faces.
Encoder dimensions ( 16-256 ?:help ) : Encoder dimensions settings, affects ability of the model to learn general structure of the faces.
Decoder dimensions ( 16-256 ?:help ) : Decoder dimensions settings, affects ability of the model to learn fine detail.
Decoder mask dimensions ( 16-256 ?:help ) : Mask decoder dimensions settings, affects quality of the learned masks. May or may not affect some other aspects of training.
Since the learned mask is now always enabled by default and can't be changed, one might consider dropping this setting to a lower value for better performance, but detailed tests would have to be done to determine the effect on mask quality, learned faces and performance before changing it from the default value.

Changing each setting can have varying effects on performance, and it's not possible to measure the effect of each one on performance and quality without extensive testing. Each one is set at a default value that should offer optimal results and a good compromise between training speed and quality.

Also when changing one parameter the other ones should be changed as well to keep the relations between them similar (for example if you drop Encoder and Decoder dimensions from 64 to 48 you could also decrease AutoEncoder dimension from 256 to 192-240). Feel free to experiment with various settings.
If you want optimal results, keep them at default or increase them slightly for higher resolution models.
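
To illustrate the rule of thumb above (keeping the dims roughly in proportion), here is a tiny sketch - the function and the 0.75 factor are my own illustration, not a DFL setting:

Code:
def scale_dims(e_dims=64, d_dims=64, ae_dims=256, factor=0.75):
    # Scale all dims by the same factor to keep their relations similar.
    return {
        "e_dims":  int(e_dims * factor),   # 64  -> 48
        "d_dims":  int(d_dims * factor),   # 64  -> 48
        "ae_dims": int(ae_dims * factor),  # 256 -> 192
    }

print(scale_dims())  # {'e_dims': 48, 'd_dims': 48, 'ae_dims': 192}
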
Eyes priority ( y/n ?:help ) : Attempts to fix problems with eye training especially on HD architecture variants like DFHD and LIAEHD by forcing the neural network to train eyes with higher priority.
Bear in mind that it does not guarantee the right eye direction, it only affects the details of the eyes and area around them. Example (before and after):
[Image: YQHOuSR.jpg]

Place models and optimizer on GPU ( y/n ?:help ) : Enabling the GPU optimizer puts all the load on your GPU, which greatly improves performance (iteration time) but forces a lower batch size; disabling this feature (n) will offload some work (the optimizer) to the CPU, which decreases the load on the GPU (and VRAM usage), letting you achieve a slightly higher batch size or run more taxing models (higher resolution or model dimensions) at the cost of training speed (longer iteration time).
Basically, if you get OOM (out of memory) errors you should disable this feature: some work will be offloaded to your CPU and some data from GPU VRAM to system RAM, so you will be able to run your model without OOM errors and/or at a higher batch size, but at the cost of lower performance. Default value is y (enabled).

Use learning rate dropout ( y/n/cpu ?:help ) : This setting helps to increase training speed, make faces sharper and reduce subpixel shake/jittering of faces. It should be enabled before disabling random warp and before enabling GAN. To read more about how learning rate dropout works, google it - it's a general machine learning concept, so it requires some knowledge of the field to fully understand. It can be used with other options enabled, like TrueFace, Style Power, Color Transfer, etc.
This option affects VRAM usage slightly, so if you run into OOM errors you can run it on the CPU, which decreases VRAM usage and lets you train at the same batch size, but iteration time will slow down by about 20%.

Enable random warp of samples ( y/n ?:help ) : Random warp of samples is a feature that used to be enabled all the time in the old SAE models of DFL 1.0 but is now optional. It's used to generalize the model so that it properly learns all the basic shapes, face features, structure of the face, expressions and so on, but as long as it's enabled the model may have trouble learning fine detail. Because of this it's recommended to keep the feature enabled as long as your faces are still improving (judging by the decreasing loss values and the preview window); once the faces are fully trained and you want more detail you should disable it, and within a few hundred-thousand iterations you should start to see more detail - with the feature disabled you then carry on with training. Default value is y (enabled).

Uniform_yaw ( y/n ?:help ) : Helps with training of profile faces; forces the model to train evenly on all faces depending on their yaw and prioritizes profile faces, which may cause frontal faces to train a bit slower. Enabled by default during pretraining. It can be used similarly to random warp (at the beginning of the training process) or enabled later when the faces are more or less trained and you want profile faces to look better and less blurry. Useful when your source dataset doesn't have many profile shots.

GAN power ( 0.0 .. 10.0 ?:help ) : GAN stands for Generative Adversarial Network and in the case of DFL 2.0 it is implemented as an additional way of training on your datasets to get more detailed/sharper faces. This option is adjustable on a scale from 0.0 to 10.0 and it should only be enabled once the model is more or less done training (after you've disabled random warp of samples). It's recommended to start at a low value of 0.1, which is also the recommended value in most cases; once it's enabled you should not disable it, and make sure to make backups of your model in case you don't like the results.
Default value is 0.0 (disabled).

Before/after example of a face trained with GAN at value of 0.1 for 40k iterations:

[Image: Nbh3mw1.png]

'True face' power. ( 0.0000 .. 1.0 ?:help ) : True face training with a variable power setting lets you set the model discriminator to a higher or lower value; what this does is try to make the final face look more like src. As with GAN, this feature should only be enabled once random warp is disabled and the model is fairly well trained. Consider making a backup before enabling this feature. Default value is 0.0 (disabled). Never use a value of 1.0; the typical value is 0.01 but you can use an even lower one of 0.001-0.004. The higher the setting, the more the result face will look like the faces in the source dataset, which may cause issues with color match and also cause artifacts to show up, so it's important not to use values above 0.1-0.2. It has a small performance impact which may cause OOM errors to occur.

[Image: czScS9q.png]

Face style power ( 0.0..100.0 ?:help ) and Background style power ( 0.0..100.0 ?:help ) : These variable settings control style transfer of either the face or the background part of the image; they are used to transfer the style of your target/destination faces (data_dst) over to the final learned face, which can improve the quality and look of the final result after merging, but high values can cause the learned face to look more like data_dst than data_src. It basically transfers some color/lighting information from DST to the result face.
It's recommended not to use values higher than 10. Start with small values like 0.001-0.01.
This feature has a big performance impact; using it will increase iteration time and may require you to lower your batch size or disable the GPU optimizer (Place models and optimizer on GPU). Consider making a backup before enabling this feature.

Color transfer for src faceset ( none/rct/lct/mkl/idt/sot ?:help ) : this feature is used to match the colors of your data_src to the data_dst so that the final result has a similar skin color/tone to the data_dst and doesn't change colors when the face moves around (which may happen if various face angles were taken from sources with different lighting conditions or color grading). There are several options to choose from:

- rct (reinhard color transfer): based on: You are not allowed to view links. Register or Login to view.
- lct (linear color transfer): Matches the color distribution of the target image to that of the source image using a linear transform.
- mkl (Monge-Kantorovitch linear): based on: You are not allowed to view links. Register or Login to view.
- idt (Iterative Distribution Transfer): based on: You are not allowed to view links. Register or Login to view.
- sot (sliced optimal transfer): based on: You are not allowed to view links. Register or Login to view.
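
For intuition, rct (Reinhard color transfer) boils down to matching the per-channel mean and standard deviation of the source face to the destination face in LAB color space. A minimal sketch of that general idea (not DFL's actual implementation) with OpenCV/NumPy:

Code:
import cv2
import numpy as np

def reinhard_transfer(src_bgr, dst_bgr):
    # Match per-channel mean/std of the src face to the dst face in LAB space.
    src = cv2.cvtColor(src_bgr, cv2.COLOR_BGR2LAB).astype(np.float32)
    dst = cv2.cvtColor(dst_bgr, cv2.COLOR_BGR2LAB).astype(np.float32)
    s_mean, s_std = src.mean(axis=(0, 1)), src.std(axis=(0, 1)) + 1e-6
    d_mean, d_std = dst.mean(axis=(0, 1)), dst.std(axis=(0, 1))
    out = (src - s_mean) / s_std * d_std + d_mean
    return cv2.cvtColor(np.clip(out, 0, 255).astype(np.uint8), cv2.COLOR_LAB2BGR)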

Enable gradient clipping ( y/n ?:help ) : This feature is implemented to prevent so-called model collapse/corruption, which may occur when using various features of DFL 2.0. It has a small performance impact; if you really don't want to use it you must at least enable auto backups, as a collapsed model cannot recover and must be scrapped, with training started all over. Default value is n (disabled), but since the performance impact is so low and it can save you a lot of time by preventing model collapse, I recommend always enabling it on all models.

Enable pretraining mode ( y/n ?:help ) : Enables the pretraining process, which uses a dataset of random people's faces to initially train your model. After training it to around 200k-400k iterations such a model can then be reused when starting training with the proper data_src and data_dst you want to train. It saves time because you don't have to start training from 0 every time (the model will "know" what faces should look like and thus speed up the initial training stage). It's recommended to either pretrain a model using this feature or grab a pretrained model from our forum:
You are not allowed to view links. Register or Login to view.
Default value is n (disabled).
NOTE: The pretrain option can be enabled at any time, but it's recommended to pretrain a model only once at the start (to around 200-400k iterations).
NOTE 2: You can also pretrain with your own custom faceset; all you need to do is create one (can be either data_src or data_dst), use the 4.2) data_src (or dst) util faceset pack .bat file to pack it into one file, then rename it to faceset.pak and replace the file inside the "...\_internal\pretrain_CelebA" folder (back up the old one).

You can also just train a model on random faces in both data_src and data_dst, but this method can cause morphing: once you use a model prepared this way for a long time and then start training on proper datasets, the trained faces may morph towards the old learned data and as a result look less like SRC and more like the faces it was trained on before, or like DST.

To use a shared pretrained model simply download it and put all the files directly into your "model" folder, start training, press any key within 2 seconds after selecting the model for training (if you have more than one in the folder) and the device to train with (GPU/CPU) to override model settings, and make sure the pretrain option is disabled so that you start proper training; if you leave pretrain enabled the model will carry on with pretraining. Note that the model will reset the iteration count to 0 - that's normal behavior for a pretrained model, unlike a model that was just trained on random faces without the use of the pretrain function.

Shared models: You are not allowed to view links. Register or Login to view.

7. Merging:

After you're done training your model it's time to merge the learned face over the original frames to form the final video (conversion).

For that we have 2 converters corresponding to 2 available models:

7) merge SAEHD
7) merge Quick96

Upon selecting any of those a command line window will appear with several prompts.

The 1st one will ask you if you want to use the interactive converter; the default value is y (enabled) and it's recommended to use it over the regular one because it has all the features and also an interactive preview where you see the effects of all the changes you make when adjusting options and enabling/disabling features:
Use interactive merger? ( y/n ) :

2nd one will ask you which model you want to use:
Choose one of saved models, or enter a name to create a new model.
[r] : rename
[d] : delete
[0] : df160 - latest
:

3rd one will ask you which GPU/GPUs or CPU you want to use for the merging (conversion) process:
Choose one or several GPU idxs (separated by comma).
[CPU] : CPU
[0] : GeForce GTX 1060 6GB
[0] Which GPU indexes to choose? :

Pressing enter will use default value (0).

After that's done you will see a command line window with current settings as well as preview window which shows all the controls needed to operate the interactive converter/merger.

Here is a quick look at both the command line window and converter preview window:
[Image: BT6vAzW.png]

The converter features many options that you can use to change the mask type, its size and feathering/blur, add additional color transfer and sharpen/enhance the final trained face even further.

Here is the list of all merger/converter features explained:

1. Main overlay modes:
- original: displays the original frame without the swapped face
- overlay: simply overlays the learned face over the frame
- hist-match: overlays the learned face and tries to match it based on histogram (has 2 modes: normal and masked hist match, togglable with the Z button) - see the sketch after this list
- seamless: uses the OpenCV Poisson seamless clone function to blend the new learned face over the head in the original frame
- seamless hist match: combines both hist-match and seamless.
- raw-rgb: overlays the raw learned face without any masking

NOTE: Seamless modes can cause flickering, it's recommended to use overlay.
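
Conceptually, hist-match remaps the pixel value distribution of the learned face to that of the original frame, channel by channel. A rough NumPy sketch of histogram/CDF matching (an illustration of the general technique, not DFL's exact code):

Code:
import numpy as np

def match_histograms(src, ref):
    # Map each channel of src so its value distribution follows ref (CDF matching).
    out = np.empty(src.shape, dtype=np.float64)
    for c in range(src.shape[-1]):
        s = src[..., c].ravel().astype(np.float64)
        r = ref[..., c].ravel().astype(np.float64)
        ranks = np.argsort(np.argsort(s))           # rank of every source pixel
        quantiles = ranks / max(len(s) - 1, 1)      # rank -> quantile in [0, 1]
        matched = np.interp(quantiles, np.linspace(0, 1, len(r)), np.sort(r))
        out[..., c] = matched.reshape(src.shape[:-1])
    return np.clip(out, 0, 255).astype(np.uint8)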

2. Hist match threshold: controls strength of the histogram matching in hist-match and seamless hist-match overlay mode.
Q - increases value
A - decreases value


3. Erode mask: controls the size of a mask.
W - increases mask erosion (smaller mask)
S - decreases mask erosion (bigger mask)


4. Blur mask: blurs/feathers the edge of the mask for smoother transition
E - increases blur
D - decreases blur
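
Items 3 and 4 correspond to standard mask erosion and Gaussian feathering. As a conceptual sketch (the parameter names and scaling are my own, not the merger's internal values):

Code:
import cv2
import numpy as np

def erode_and_blur(mask, erode_px=10, blur_px=20):
    # Positive erode shrinks the mask; a negative value would grow (dilate) it.
    if erode_px > 0:
        mask = cv2.erode(mask, np.ones((erode_px, erode_px), np.uint8))
    elif erode_px < 0:
        mask = cv2.dilate(mask, np.ones((-erode_px, -erode_px), np.uint8))
    if blur_px > 0:
        k = blur_px * 2 + 1                       # Gaussian kernel size must be odd
        mask = cv2.GaussianBlur(mask, (k, k), 0)  # feather the edge for a smooth transition
    return mask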


5. Motion blur: upon entering the initial parameters (interactive converter, model, GPU/CPU) the merger/converter loads all frames and the data_dst aligned data, and while doing so it calculates motion vectors that are used to create the motion blur effect this setting controls; it lets you add motion blur in places where the face moves around, but high values may blur the face even with small movement. The option only works if one set of faces is present in the "data_dst/aligned" folder - if during cleanup you had some faces with _1 suffixes (even if only faces of one person are present) the effect won't work; the same goes if there is a mirror that reflects the target person's face. In such cases you cannot use motion blur and the only way to add it is to train each set of faces separately.
R - increases motion blur
F - decreases motion blur


6. Super resolution: uses an algorithm similar to the data_src dataset/faceset enhancer; it can add more definition to areas such as teeth and eyes and enhance the detail/texture of the learned face.
T - increases the enhancement effect
G - decreases the enhancement effect


7. Blur/sharpen: blurs or sharpens the learned face using box or gaussian method.
Y - sharpens the face
H - blurs the face
N - box/gaussian mode switch


8. Face scale: scales learned face to be larger or smaller.
U - scales learned face down
J - scales learned face up

9. Mask modes: there are 9 masking modes:

dst: uses masks derived from the shape of the landmarks generated during data_dst faceset/dataset extraction.
learned-prd: uses masks learned during training. Keeps the shape of the predicted (source) faces.
learned-dst: uses masks learned during training. Keeps the shape of the destination faces.
learned-prd*dst: combines both masks, smaller size of both.
learned-prd+dst: combines both masks, bigger size of both.
XSeg-prd: uses the trained XSeg model to mask using data from source faces.
XSeg-dst: uses the trained XSeg model to mask using data from destination faces.
XSeg-prd*dst: combines both masks, smaller size of both.
learned-prd*dst*XSeg-dst*prd: combines all 4 mask modes, smaller size of all.
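
The combined modes are easy to picture: "*" keeps the smaller (intersection-like) area of the masks, "+" keeps the bigger (union-like) area. A minimal NumPy illustration (masks as arrays in the 0-1 range):

Code:
import numpy as np

def combine_mul(mask_a, mask_b):
    return np.minimum(mask_a, mask_b)  # e.g. learned-prd*dst, XSeg-prd*dst (smaller area)

def combine_add(mask_a, mask_b):
    return np.maximum(mask_a, mask_b)  # e.g. learned-prd+dst (bigger area)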

You can also mask it manually in post:
You are not allowed to view links. Register or Login to view.

10. Color transfer modes: similar to color transfer during training, you can use this feature to better match skin color of the learned face to the original frame for more seamless and realistic face swap. There are 8 different modes:
RCT
LCT
MKL
MKL-M
IDT
IDT-M
SOT-M
MIX-M


11. Image degrade modes: there are 3 settings that you can use to affect the look of the original frame (without affecting the swapped face):
Denoise - denoises the image, making it slightly blurry (I - increases effect, K - decreases effect)
Bicubic - blurs the image using the bicubic method (O - increases effect, L - decreases effect)
Color - decreases color bit depth (P - increases effect, ; - decreases effect)

Additional controls:

TAB button - switch between main preview window and help screen.
Bear in mind you can only change parameters in the main preview window, pressing any other buttons on the help screen won't change them.
-/_ and =/+ buttons are used to scale the preview window.
Use caps lock to change the increment from 1 to 10 (affects all numerical values).

To save/override settings for all next frames from current one press shift + / key.
To save/override settings for all previous frames from current one press shift + M key.
To start merging of all frames press shift + > key.
To go back to the 1st frame press shift + < key.
To only convert next frame press > key.
To go back 1 frame press < key.

8. Conversion of frames back into video:

After you merge/convert all the faces you will have a folder named "merged" inside your "data_dst" folder containing all the frames that make up the video.
The last step is to convert them back into a video and combine it with the original audio track from the data_dst.mp4 file.

To do so you will use one of the 4 provided .bat files, which use FFMPEG to combine all the frames into a video in one of the following formats - avi, mp4, lossless mp4 or lossless mov:
- 8) merged to avi
- 8) merged to mov lossless
- 8) merged to mp4 lossless
- 8) merged to mp4
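
These .bat files are thin FFMPEG wrappers. Roughly (the frame pattern, frame rate and codec settings here are assumptions, not the exact flags DFL uses), the mp4 variant does something like:

Code:
import subprocess

# Encode the merged frames and mux in the audio from the original data_dst.mp4.
subprocess.run([
    "ffmpeg",
    "-r", "30",                                  # match the original video's frame rate
    "-i", "workspace/data_dst/merged/%05d.png",  # frames written by the merger (pattern assumed)
    "-i", "workspace/data_dst.mp4",              # take the audio from the original clip
    "-map", "0:v", "-map", "1:a?",               # video from the frames, audio if present
    "-c:v", "libx264", "-crf", "18", "-pix_fmt", "yuv420p",
    "-c:a", "copy",
    "workspace/result.mp4",
], check=True)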

And that's it! After you've done all these steps you should have a file called result.xxx (avi/mp4/mov) which is your deepfake video.
CHANGELOG:

Code:
Official repository: https://github.com/iperov/DeepFaceLab
 
Please consider a donation.
 
============ CHANGELOG ============
 
== 02.08.2020 ==
 
SAEHD: now random_warp is disabled for pretraining mode by default
Merger: fix load time of xseg if it has no model files
 
== 18.07.2020 ==
 
Fixes
 
SAEHD: write_preview_history now works faster
The frequency at which the preview is saved now depends on the resolution.
For example 64x64 – every 10 iters. 448x448 – every 70 iters.
 
Merger: added option “Number of workers?”
Specify the number of threads to process. 
A low value may affect performance. 
A high value may result in memory error. 
The value may not be greater than CPU cores.
 
== 17.07.2020 ==
 
SAEHD:
 
Pretrain dataset is replaced with high quality FFHQ dataset.
 
Changed help for “Learning rate dropout” option:
When the face is trained enough, you can enable this option to get extra sharpness and reduce subpixel shake for less amount of iterations. 
Enabled it before “disable random warp” and before GAN. n disabled. y enabled
cpu enabled on CPU. This allows not to use extra VRAM, sacrificing 20% time of iteration.
 
Changed help for GAN option:
Train the network in Generative Adversarial manner. 
Forces the neural network to learn small details of the face. 
Enable it only when the face is trained enough and don't disable. 
Typical value is 0.1
 
improved GAN. Now it produces better skin detail, less patterned aggressive artifacts, works faster.
[img=896x548]https://i.imgur.com/Nbh3mw1.png[/img]
 
== 04.07.2020 ==
 
Fix bugs.
Renamed some 5.XSeg) scripts.
Changed help for GAN_power.
 
== 27.06.2020 ==
 
Extractor:
       Extraction now can be continued, but you must specify the same options again.
 
       added ‘Max number of faces from image’ option.
If you extract a src faceset that has frames with a large number of faces, 
it is advisable to set max faces to 3 to speed up extraction.
0 - unlimited
 
added ‘Image size’ option.
The higher image size, the worse face-enhancer works.
Use higher than 512 value only if the source image is sharp enough and the face does not need to be enhanced.
 
added ‘Jpeg quality’ option in range 1-100. The higher jpeg quality the larger the output file size

== 22.06.2020 ==
 
XSegEditor:
changed hotkey for xseg overlay mask
“overlay xseg mask” now works in polygon mode

 
== 21.06.2020 ==
 
SAEHD:
Resolution for –d archi is now automatically adjusted to be divisible by 32.
‘uniform_yaw’ now always enabled in pretrain mode.
 
Subprocessor now writes an error if it does not start.
 
XSegEditor: fixed incorrect count of labeled images.
 
XNViewMP: dark theme is enabled by default

 
== 19.06.2020 ==
 
SAEHD:
 
Maximum resolution is increased to 640.
 
‘hd’ archi is removed. ‘hd’ was experimental archi created to remove subpixel shake, but ‘lr_dropout’ and ‘disable random warping’ do that better.
 
‘uhd’ is renamed to ‘-u’
dfuhd and liaeuhd will be automatically renamed to df-u and liae-u in existing models.
 
Added new experimental archi (key -d) which doubles the resolution using the same computation cost.
It is mean same configs will be x2 faster, or for example you can set 448 resolution and it will train as 224.
Strongly recommended not to train from scratch and use pretrained models.
 
New archi naming:
'df' keeps more identity-preserved face.
'liae' can fix overly different face shapes.
'-u' increased likeness of the face.
'-d' (experimental) doubling the resolution using the same computation cost
Opts can be mixed (-ud)
Examples: df, liae, df-d, df-ud, liae-ud, ...
 
Not the best example of 448 df-ud trained on 11GB:

 
Improved GAN training (GAN_power option).  It was used for dst model, but actually we don’t need it for dst.
Instead, a second src GAN model with x2 smaller patch size was added, so the overall quality for hi-res models should be higher.
 
Added option ‘Uniform yaw distribution of samples (y/n)’:
       Helps to fix blurry side faces due to small amount of them in the faceset.
 
Quick96:
       Now based on df-ud archi and 20% faster.
 
XSeg trainer:
       Improved sample generator.
Now it randomly adds the background from other samples.
Result is reduced chance of random mask noise on the area outside the face.
Now you can specify ‘batch_size’ in range 2-16.
 
Reduced size of samples with applied XSeg mask. Thus size of packed samples with applied xseg mask is also reduced.
 
 
== 11.06.2020 ==
 
Trainer: fixed "Choose image for the preview history". Now you can switch between subpreviews using 'space' key.
Fixed "Write preview history". Now it writes all subpreviews in separated folders
 

also the last preview saved as _last.jpg before the first file

thus you can easily check the changes with the first file in photo viewer
 
 
XSegEditor: added text label of total labeled images
Changed frame line design
Changed loading frame design
 

 
== 08.06.2020 ==
 
SAEHD: resolution >= 256 now has a second dssim loss function
 
SAEHD: lr_dropout can now be ‘n’, ‘y’, ‘cpu’. ‘n’ and ’y’ are the same as before.
‘cpu’ means enabled on CPU. This avoids using extra VRAM at the cost of roughly 20% longer iteration time.
fixed errors
 
reduced chance of the error "The paging file is too small for this operation to complete."
 
updated XNViewMP to 0.96.2
 
== 04.06.2020 ==
 
Manual extractor: now you can specify the face rectangle manually using the right mouse button.
It is useful for small, blurry, undetectable faces and animal faces.
 
Warning:
Landmarks cannot be placed on the face precisely; they are actually only used for positioning the red frame.
Therefore, such frames must be used only with the XSeg workflow!
Try to keep the red frame consistent with the adjacent frames.
 
added script
10.misc) make CPU only.bat
This script will convert your DeepFaceLab folder to work on CPU without any problems. An internet connection is required.
It is useful to train on Colab and merge interactively on your comp without GPU.
 
== 31.05.2020 ==
 
XSegEditor: added button "view XSeg mask overlay face"
 
== 06.05.2020 ==
 
Some fixes
 
SAEHD: changed UHD archis. You have to retrain uhd models from scratch.
 
== 20.04.2020 ==
 
XSegEditor: fix bug
 
Merger: fix bug
 
== 15.04.2020 ==
 
XSegEditor: added view lock at the center by holding shift in drawing mode.
 
Merger: color transfer “sot-m”: speed optimized by 5-10%
 
Fix minor bug in sample loader
 
== 14.04.2020 ==
 
Merger: optimizations
 
        color transfer ‘sot-m’ : reduced color flickering, but it consumes 5x more processing time
 
        added mask mode ‘learned-prd + learned-dst’ – produces the largest area of both the dst and predicted masks
XSegEditor : polygon is now transparent while editing
 
New example data_dst.mp4 video
 
New official mini tutorial https://www.youtube.com/watch?v=1smpMsfC3ls
 
== 06.04.2020 ==
 
Fixes for 16+ cpu cores and large facesets.
 
added 5.XSeg) data_dst/data_src mask for XSeg trainer - remove.bat
       removes labeled xseg polygons from the extracted frames
      
 
== 05.04.2020 ==
 
Decreased amount of RAM used by Sample Generator.
 
Fixed bug with input dialog in Windows 10
 
Fixed running XSegEditor when directory path contains spaces
 
SAEHD: ‘Face style power’ and ‘Background style power’  are now available for whole_face
 New help messages for these options.
 
XSegEditor: added button ‘view trained XSeg mask’, so you can see which frames should be masked to improve mask quality.
 
Merger:
added ‘raw-predict’ mode. Outputs raw predicted square image from the neural network.
 
mask-mode ‘learned’ replaced with 3 new modes:
       ‘learned-prd’ – smooth learned mask of the predicted face
       ‘learned-dst’ – smooth learned mask of DST face
       ‘learned-prd*learned-dst’ – smallest area of both (default)
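To illustrate how these modes relate, here is a minimal numpy sketch (illustration only, not the merger's actual code) of why multiplying two soft masks gives the smallest area, while the earlier 'learned-prd + learned-dst' mode gives the largest (approximated here with a maximum):

import numpy as np

# two made-up soft masks in the 0..1 range (1 = face, 0 = background)
mask_prd = np.array([[0.0, 0.8], [1.0, 0.3]])
mask_dst = np.array([[0.5, 0.2], [1.0, 0.9]])

smallest = mask_prd * mask_dst             # like 'learned-prd*learned-dst': covered only where both masks agree
largest  = np.maximum(mask_prd, mask_dst)  # like 'learned-prd + learned-dst': covered where either mask covers

print(smallest)
print(largest)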
            
 
Added new face type : head
Now you can replace the head.
Example: https://www.youtube.com/watch?v=xr5FHd0AdlQ
Requirements:
       Post processing skill in Adobe After Effects or Davinci Resolve.
Usage:
1)  Find suitable dst footage with a monotonous background behind the head
2)  Use “extract head” script
3)  Gather rich src headset from only one scene (same color and haircut)
4)  Mask whole head for src and dst using XSeg editor
5)  Train XSeg
6)  Apply trained XSeg mask for src and dst headsets
7)  Train SAEHD using ‘head’ face_type as regular deepfake model with DF archi. You can use pretrained model for head. Minimum recommended resolution for head is 224.
8)  Extract multiple tracks, using Merger:
a.  Raw-rgb
b.  XSeg-prd mask
c.  XSeg-dst mask
9)  Using Adobe After Effects or DaVinci Resolve, do:
a.  Hide source head using XSeg-prd mask: content-aware-fill, clone-stamp, background retraction, or other technique
b.  Overlay new head using XSeg-dst mask
 
Warning: a head faceset can be used for whole_face or smaller face types of training only with XSeg masking.
 
 
 
== 30.03.2020 ==
 
New script:
       5.XSeg) data_dst/src mask for XSeg trainer - fetch.bat
Copies faces containing XSeg polygons to aligned_xseg\ dir.
Useful only if you want to collect labeled faces and reuse them in other fakes.
 
Now you can use the trained XSeg mask in the SAEHD training process.
This means the default ‘full_face’ mask obtained from landmarks will be replaced with the mask obtained from the trained XSeg model.
Use:
5.XSeg.optional) trained mask for data_dst/data_src - apply.bat
5.XSeg.optional) trained mask for data_dst/data_src - remove.bat
 
Normally you don’t need it. You can use it, if you want to use ‘face_style’ and ‘bg_style’ with obstructions.
 
XSeg trainer : now you can choose the type of face
XSeg trainer : now you can restart training in “override settings”
Merger: XSeg-* modes can now be used with all types of faces.
 
Therefore the old MaskEditor, FANSEG models, and FAN-x modes have been removed,
because the new XSeg solution is better, simpler and more convenient, and costs only about 1 hour of manual masking for a regular deepfake.
 
 
== 27.03.2020 ==
 
XSegEditor: fix bugs, changed layout, added current filename label
 
SAEHD: fixed the use of pretrained liae model, now it produces less face morphing
 
== 25.03.2020 ==
 
SAEHD: added 'dfuhd' and 'liaeuhd' archi
The uhd version is lighter than 'hd' but heavier than the regular version.
liaeuhd provides a more "src-like" result
comparison:
       liae:    https://i.imgur.com/JEICFwI.jpg
       liaeuhd: https://i.imgur.com/ymU7t5E.jpg
 
 
added new XSegEditor!
 
Here is the new whole_face + XSeg workflow:
 
With the XSeg model you can train your own mask segmentator for dst (and/or src) faces
that will be used by the merger for whole_face.
 
Instead of using a pretrained segmentator model (which does not exist),
you control which parts of the faces should be masked.
 
new scripts:
       5.XSeg) data_dst edit masks.bat
       5.XSeg) data_src edit masks.bat
       5.XSeg) train.bat
 
Usage:
       unpack dst faceset if packed
 
       run 5.XSeg) data_dst edit masks.bat
 
       Read tooltips on the buttons (en/ru/zn languages are supported)
 
       mask the face using include or exclude polygon mode.
      
       repeat for 50/100 faces,
             !!! you don't need to mask every frame of dst
             only frames where the face is different significantly,
             for example:
                    closed eyes
                    changed head direction
                    changed light
             the more various faces you mask, the more quality you will get
 
             Start masking from the upper left area and follow the clockwise direction.
             Keep the same logic of masking for all frames, for example:
                    the same approximated jaw line of the side faces, where the jaw is not visible
                    the same hair line
             Mask the obstructions using exclude polygon mode.
 
       run 5.XSeg) train.bat
             train the model
 
             Check the faces of 'XSeg dst faces' preview.
 
             if some faces have wrong or glitchy mask, then repeat steps:
                    run edit
                    find these glitchy faces and mask them
                    train further or restart training from scratch
 
Restarting training of the XSeg model is only possible by deleting all 'model\XSeg_*' files.
 
If you want to get the mask of the predicted face (XSeg-prd mode) in merger,
you should repeat the same steps for src faceset.
 
New mask modes available in merger for whole_face:
 
XSeg-prd       - XSeg mask of predicted face   -> faces from src faceset should be labeled
XSeg-dst       - XSeg mask of dst face        -> faces from dst faceset should be labeled
XSeg-prd*XSeg-dst - the smallest area of both
 
if the workspace\model folder contains a trained XSeg model, then the merger will use it,
otherwise you will get a transparent mask when using the XSeg-* modes.
 
Some screenshots:
XSegEditor: https://i.imgur.com/7Bk4RRV.jpg
trainer   : https://i.imgur.com/NM1Kn3s.jpg
merger    : https://i.imgur.com/glUzFQ8.jpg
 
example of the fake using 13 segmented dst faces
          : https://i.imgur.com/wmvyizU.gifv
 
 
== 18.03.2020 ==
 
Merger: fixed face jitter
 
== 15.03.2020 ==
 
global fixes
 
SAEHD: removed option learn_mask, it is now enabled by default
 
removed liaech archi
 
removed support of extracted(aligned) PNG faces. Use old builds to convert from PNG to JPG.
 
 
== 07.03.2020 ==
 
brought back
3.optional) denoise data_dst images.bat
       Apply it if the dst video is very sharp.
 
       Denoise dst images before face extraction.
       This technique helps the neural network not to learn the noise.
       The result is less pixel shake of the predicted face.
      
 
SAEHD:
 
added new experimental archi
'liaech' - made by @chervonij. Based on liae, but produces more src-like face.
 
lr_dropout is now disabled in pretraining mode.
 
Sorter:
 
added sort by "face rect size in source image"
small faces from source image will be placed at the end
 
added sort by "best faces faster"
same as sort by "best faces"
but faces will be sorted by source-rect-area instead of blur.
 
 
 
== 28.02.2020 ==
 
Extractor:
 
image size for all faces is now 512
 
fix RuntimeWarning during the extraction process
 
SAEHD:
 
max resolution is now 512
 
fixed hd architectures. Some of the decoder's weights were not being trained before.
 
new optimized training:
for every <batch_size*16> samples,
the model collects the <batch_size> samples with the highest error and learns them again,
so hard samples will be trained more often
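In other words, the trainer keeps a window of recent samples and re-feeds the ones the model reconstructs worst. A minimal numpy sketch of the idea (not DFL's actual code; names are made up):

import numpy as np

def pick_hard_samples(per_sample_error, batch_size):
    # indices of the batch_size samples with the highest reconstruction error
    return np.argsort(per_sample_error)[-batch_size:]

batch_size = 8
errors = np.random.rand(batch_size * 16)      # errors for the last batch_size*16 samples
hard = pick_hard_samples(errors, batch_size)  # these samples would be trained again
print(hard)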
 
'models_opt_on_gpu' option is now available for multigpus (before only for 1 gpu)
 
fix 'autobackup_hour'
 
== 23.02.2020 ==
 
SAEHD: pretrain option is now available for whole_face type
 
fix sort by abs difference
fix sort by yaw/pitch/best for whole_face's
 
== 21.02.2020 ==
 
Trainer: decreased time of initialization
 
Merger: fixed some color flickering in overlay+rct mode
 
SAEHD:
 
added option Eyes priority (y/n)
 
       Helps to fix eye problems during training like "alien eyes"
       and wrong eyes direction ( especially on HD architectures )
       by forcing the neural network to train eyes with higher priority.
       before/after https://i.imgur.com/YQHOuSR.jpg
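Conceptually this is just a weighted reconstruction loss in which pixels in the eye region count for more. A minimal numpy sketch of the idea (illustration only, not DFL's implementation; the weight value and eye region are made up):

import numpy as np

def weighted_mse(pred, target, eye_mask, eye_weight=3.0):
    # pixels where eye_mask is 1.0 contribute eye_weight times more to the loss
    weights = 1.0 + (eye_weight - 1.0) * eye_mask
    return float(np.mean(weights * (pred - target) ** 2))

pred, target = np.random.rand(64, 64), np.random.rand(64, 64)
eye_mask = np.zeros((64, 64))
eye_mask[20:30, 15:50] = 1.0  # rough band where the eyes would be
print(weighted_mse(pred, target, eye_mask))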
 
added experimental face type 'whole_face'
 
       Basic usage instruction: https://i.imgur.com/w7LkId2.jpg
      
       'whole_face' requires skill in Adobe After Effects.
 
       To use whole_face you have to extract whole_face's by using
       4) data_src extract whole_face
       and
       5) data_dst extract whole_face
       Images will be extracted at 512 resolution, so they can also be used for regular full_face's and half_face's.
      
       'whole_face' covers the whole area of the face, including the forehead, in the training square,
       but the training mask is still 'full_face',
       therefore it requires manual final masking and compositing in Adobe After Effects.
 
added option 'masked_training'
       This option is available only for the 'whole_face' type.
       Default is ON.
       Masked training clips the training area to the full_face mask,
       so the network trains the faces properly.
       When the face is trained enough, disable this option to train the whole area of the frame.
       Merge with 'raw-rgb' mode, then use Adobe After Effects to manually mask, tune color, and composite the whole face, including the forehead.
 
 
 
== 03.02.2020 ==
 
"Enable autobackup" option is replaced by
"Autobackup every N hour" 0..24 (default 0 disabled), Autobackup model files with preview every N hour
 
Merger:
 
'show alpha mask' now on 'V' button
 
'super resolution mode' is replaced by
'super resolution power' (0..100) which can be modified via 'T' 'G' buttons
 
default erode/blur values are 0.
 
new multiple faces detection log: https://i.imgur.com/0XObjsB.jpg
 
now uses all available CPU cores ( before max 6 )
so the more processors, the faster the process will be.
 
== 01.02.2020 ==
 
Merger:
 
increased speed
 
improved quality
 
SAEHD: default archi is now 'df'
 
== 30.01.2020 ==
 
removed use_float16 option
 
fix MultiGPU training
 
== 29.01.2020 ==
 
MultiGPU training:
fixed CUDNN_STREAM errors.
speed is significantly increased.
 
Trainer: added key 'b' : creates a backup even if the autobackup is disabled.
 
== 28.01.2020 ==
 
optimized face sample generator, CPU load is significantly reduced
 
fix of update preview for history after disabling the pretrain mode
 
 
SAEHD:
 
added new option
GAN power 0.0 .. 10.0
       Train the network in Generative Adversarial manner.
       Forces the neural network to learn small details of the face.
       You can enable/disable this option at any time,
       but it is better to enable it when the network is trained enough.
       Typical value is 1.0
       GAN power does not work in pretrain mode.
 
Example of enabling GAN on 81k iters +5k iters
https://i.imgur.com/OdXHLhU.jpg
https://i.imgur.com/CYAJmJx.jpg
 
 
dfhd: default Decoder dimensions are now 48
the preview for 256 res is now correctly displayed
 
fixed model naming/renaming/removing
 
 
Improvements for those involved in post-processing in AfterEffects:
 
The codec is reverted back to x264 so it can be used properly in After Effects and video players.
 
Merger now always outputs the mask to workspace\data_dst\merged_mask
 
removed raw modes except raw-rgb
raw-rgb mode now outputs selected face mask_mode (before square mask)
 
'export alpha mask' button is replaced by 'show alpha mask'.
You can view the alpha mask without recomputing the frames.
 
8) 'merged *.bat' scripts now also output a 'result_mask' video file.
8) 'merged lossless' now uses the x264 lossless codec (before the PNG codec)
The result_mask video file is always lossless.
 
Thus you can use the result_mask video file as a mask layer in After Effects.
 
 
== 25.01.2020 ==
 
Upgraded to TF version 1.13.2
 
Removed the wait at first launch for most graphics cards.
 
Increased speed of training by 10-20%, but you have to retrain all models from scratch.
 
SAEHD:
 
added option 'use float16'
       Experimental option. Reduces the model size by half.
       Increases the speed of training.
       Decreases the accuracy of the model.
       The model may collapse or not train.
       Model may not learn the mask in large resolutions.
       You enable/disable this option at any time.
 
true_face_training option is replaced by
"True face power". 0.0000 .. 1.0
Experimental option. Discriminates the result face to be more like the src face. Higher value - stronger discrimination.
Comparison - https://i.imgur.com/czScS9q.png
 
== 23.01.2020 ==
 
SAEHD: fixed clipgrad option
 
== 22.01.2020 == BREAKING CHANGES !!!
 
Getting rid of the weakest link - AMD cards support.
All neural network codebase transferred to pure low-level TensorFlow backend, therefore
removed AMD/Intel cards support, now DFL works only on NVIDIA cards or CPU.
 
old DFL marked as 1.0 still available for download, but it will no longer be supported.
 
global code refactoring, fixes and optimizations
 
Extractor:
 
now you can choose on which GPUs (or CPU) to process
 
improved stability for < 4GB GPUs
 
increased speed of multi gpu initializing
 
now works in one pass (except manual mode)
so you won't lose the processed data if something goes wrong before the old 3rd pass
 
Faceset enhancer:
 
now you can choose on which GPUs (or CPU) to process
 
Trainer:
 
now you can choose on which GPUs (or CPU) to train the model.
Multi-gpu training is now supported.
Select identical cards, otherwise the fast GPU will wait for the slow GPU every iteration.
 
now remembers the previous option input as default with the current workspace/model/ folder.
 
the number of sample generators now matches the available number of processors
 
saved models now have names instead of GPU indexes.
Therefore you can switch GPUs for every saved model.
Trainer offers to choose latest saved model by default.
You can rename or delete any model using the dialog.
 
models now save the optimizer weights in the model folder to continue training properly
 
removed all models except SAEHD, Quick96
 
trained model files from DFL 1.0 cannot be reused
 
AVATAR model is also removed.
How to create AVATAR like in this video? https://www.youtube.com/watch?v=4GdWD0yxvqw
1) capture yourself with your own speech repeating same head direction as celeb in target video
2) train regular deepfake model with celeb faces from target video as src, and your face as dst
3) merge celeb face onto your face with raw-predict mode
4) compose masked mouth with target video in AfterEffects
 
 
SAEHD:
 
now has 3 options: Encoder dimensions, Decoder dimensions, Decoder mask dimensions
 
now has 4 archis: dfhd (default), liaehd, df, liae
df and liae are from SAE model, but use features from SAEHD model (such as combined loss and disable random warp)
 
dfhd/liaehd - changed encoder/decoder architectures
 
decoder model is combined with mask decoder model
mask training is combined with face training,
result is reduced time per iteration and decreased vram usage by optimizer
 
"Initialize CA weights" now works faster and integrated to "Initialize models" progress bar
 
removed optimizer_mode option
 
added option 'Place models and optimizer on GPU?'
  When you train on one GPU, by default the model and optimizer weights are placed on the GPU to accelerate the process.
  You can place them on the CPU to free up extra VRAM and set larger model parameters.
  This option is unavailable in MultiGPU mode.
 
pretraining now does not use rgb channel shuffling
pretraining now can be continued
when pre-training is disabled:
1) iters and loss history are reset to 1
2) in df/dfhd archis, only the inter part of the encoder is reset (before encoder+inter)
   thus the fake will train faster with a pretrained df model
 
Merger ( renamed from Converter ):
 
now you can choose on which GPUs (or CPU) to process
 
new hot key combinations to navigate and override frame's configs
 
super resolution upscaler "RankSRGAN" is replaced by "FaceEnhancer"
 
FAN-x mask mode now works on GPU while merging (before on CPU),
therefore all models (Main face model + FAN-x + FaceEnhancer)
now work on GPU while merging, and work properly even on 2GB GPU.
 
Quick96:
 
now automatically uses pretrained model
 
Sorter:
 
removed all sort by *.bat files except one sort.bat
now you have to choose sort method in the dialog
 
Other:
 
all console dialogs are now more convenient
 
XnViewMP is updated to 0.94.1 version
 
ffmpeg is updated to 4.2.1 version
 
ffmpeg: video codec is changed to x265
 
_internal/vscode.bat starts VSCode IDE where you can view and edit DeepFaceLab source code.
 
removed russian/english manual. Read community manuals and tutorials here
https://mrdeepfakes.com/forums/forum-guides-and-tutorials
 
new github page design
 
== for older changelog see github page ==


Expect more updates to this guide in the future, for now that's it. If you have more questions that weren't covered in this guide or in any of the following threads:

DFL 2.0 FAQ: You are not allowed to view links. Register or Login to view.
Celebrity/Source creation guide: You are not allowed to view links. Register or Login to view.
DFL 2.0 General overview thread: You are not allowed to view links. Register or Login to view.
Or in any other thread in the questions sections: You are not allowed to view links. Register or Login to view.

You are not allowed to view links. Register or Login to view.
DFL 1.0 FAQ: You are not allowed to view links. Register or Login to view.

Feel free to post them here. Questions regarding common issues won't be answered; keep the thread clean and spam free. If you have a serious issue with the software, either create a new thread in the questions section or check GitHub for reported bugs/issues.
 
Current issues/bugs:
 
Most of the issues you can find on the GitHub page: You are not allowed to view links. Register or Login to view.
Before you create a new thread about an issue/bug, please check the GitHub page to see if it wasn't already reported and whether it was fixed or there is a temporary solution.
Threads about issues go in this forum section: You are not allowed to view links. Register or Login to view.

Short video tutorial - Whole head + XSeg training workflow:
by @iperov


NOTE: This thread is meant to be just a guide.

The only types of posts/questions allowed here are ones about the guide itself (suggestions on what to add to it) and also about bugs.
If you have any suggestions about the current state of the guide, want me to explain some features more or have found a bug (which wasn't already reported/fixed or isn't listed at the end of the guide) you can do so in this thread.

For anything else like questions about techniques/workflows, suggestions about features in DFL 2.0, complaints or just general talk about the process of making deepfakes in DFL 2.0 please post in this thread: You are not allowed to view links. Register or Login to view.
Also remember to check out the FAQ which has collection of some of the most often asked questions, tips and many other useful things.
#3
- reserved for future use -
#4
DFL 2.0 Frequently asked questions - workflow tips, methods and techniques - in the making.

For XSEG FAQ visit XSEG guide: You are not allowed to view links. Register or Login to view.

Use ctrl+f to find what you are looking for, scroll to the bottom for some tips and links to some useful stuff.


1.Q: What's the difference between 1.0 and 2.0?

A: 2.0 is an improved and more optimized version; thanks to the optimization it offers better performance, which means you can train higher resolution models or train existing ones faster. Merging and extraction are also significantly faster.

"That's great" you say... "But where is the catch?"

The catch is that DFL 2.0 no longer supports AMD GPUs/OpenCL, the only way to use it is with Nvidia GPU (minimum 3.0 CUDA compute level supported GPU required) or CPU.
Bear in mind training on a CPU is much slower and so is every other step like extraction and merging (previously called conversion).

Also, the new version comes with only 2 models - SAEHD and Quick 96; no H128/H64/DF/LIAEF/SAE models are available. All trained/pretrained models (SAE/SAEHD) from 1.0 are not compatible with 2.0, so you need to train new models. There are also some other changes, which you can read more about in the main guide post above.

2. Q: How long does it take to make a deepfake?

A: It depends on how long your target (DST) video is, how large your SRC dataset/faceset is, what kind of model you are using to train your fake, and on your hardware (GPU).
 
It may take anywhere from half a day for a simple, short fake on a pretrained model to 5-7 days if you are training your model from scratch or making a longer deepfake, especially if it's a high resolution whole face type that requires additional training of an XSeg model and maybe some work after merging in video editing software. It also depends on the hardware you have: if your GPU has less VRAM (4-6 GB) it will take longer to train a model than with a more capable GPU (8-24 GB). It also depends on your skills: how fast you can find source materials for the SRC dataset/faceset, how fast you can find a suitable target (DST) video and how fast you can prepare both datasets for training.

3. Q: Can you make a deepfake video with just a few pictures?

A: In general, the answer will be no. The recommended way to make a faceset to make a decent deepfake is to use videos.
The more angles and facial expressions the better. Sure you can try to make a deepfake video with just a couple hundred photos, it will work, but the results will be less convincing.

4. Q: What is the ideal faceset size?

A: For the data_src (celebrity) faceset, it's recommended to have at least 4000-6000 different images.
Of course you can have more, but generally 10.000-15.000 images is more than enough as long as there is variety in the dataset (different face angles and expressions).
It's best to take them from as few sources as possible; the more sources are used (especially when similar expressions/angles "overlap" between different sources), the higher the chance of the model morphing to DST and looking less like SRC, which may require running TF or keeping the model training with RW enabled for longer. For a more detailed guide on making source facesets/datasets check this guide:
You are not allowed to view links. Register or Login to view.


5. Q: Why are my deepfakes turning out blurry?

A: There are many reasons for blurry faces.
The most common causes include: not training long enough, lack of necessary angles in the source dataset, bad alignment of the extracted source or destination facesets/datasets, bad settings during training or merging, or blurry faces in the source or destination facesets/datasets. If you want to know what to do and what to avoid when making a deepfake, please read the first post in this thread (the guide part).

6. Q: Why is my result face not blinking/eyes look wrong/are cross-eyed?

A: This is most likely due to the lack of images in your data_src containing faces with closed eyes or with eyes looking in specific directions on some or all angles.
Make sure you have a decent amount of different facial expressions at all possible angles to match the expressions and angles of faces in the destination/target video - that includes faces with closed eyes and eyes looking in different directions; without those the model doesn't know how the face's eyes should look, resulting in eyes not opening or looking all wrong.
Another cause for this might be running training with wrong settings or decreased dims settings.

7. Q: When should I stop training?

A: There is no correct answer, but the general consensus is to use the preview window to judge when to stop training and convert.
There is no exact iteration number or loss value where you should stop training. I would recommend at least 100.000 iterations if you are running a pretrained model or 200.000 iterations if you are running a fresh model from 0 but that number might be even as high as 300.000 iterations (depending on the variety and number of faces the model must learn).

8. Q: When should I enable/disable random warp, GAN, True Face, Style Power, Color Transfer, etc?

A: There is no single correct answer here either, but there is a correct order of enabling/disabling them.
Pretty much all options like TF, GAN and SP should only be enabled once RW is disabled.
Color transfer can be enabled from the start or at the end; it all depends on how well your SRC faceset/dataset is matched color-wise to your DST/target video.
RW should always be enabled when pretraining, and during training it should be enabled at the beginning for long enough that faces can be learned/generalized well to provide a good base for further training.
Random Flip should only be enabled at the beginning of training or when you have a source faceset/dataset that is missing some angles; beware that using random flip may cause some issues, and if a face has unsymmetrical features, they will be mirrored.
Lr_dropout should always be enabled at the end to smooth out any jitter that might be present due to the way the model trains; it should never be enabled earlier because it may prevent the model from correctly learning faces.
Masked training (whole face only) should be enabled for most of the training and only disabled once the face is trained well enough, but there is nothing stopping anyone from disabling this option earlier, although running things like GAN and TF may cause some issues with it disabled. Reminder: masked training trains the full face area only; after being disabled the model can learn the whole face area.
Eyes_priority should be enabled after disabling RW or after training with GAN/TF/SP, etc.

9. Q: DFL isn't working at all (extraction, training, merging) and/or I'm getting an error message.

A: Your GPU might not be supported, you may be trying to run an old model on a newer version of DFL (or the other way around), or there is an issue with the software or your PC.

First check if your GPU is supported, DFL requires CUDA compute capability of 3.0:
You are not allowed to view links. Register or Login to view.
Then see if you have the newest version of DFL which you can get here: You are not allowed to view links. Register or Login to view. You are not allowed to view links. Register or Login to view.
Then check if the model you are trying to run is still compatible; the easiest way is to try running a new model with the same parameters (adjust batch size to a low value like 2-4 for testing purposes).
If you are still having issues, check your PC; things like out-of-date GPU drivers and pending Windows updates can cause problems.
If it's still not working you can create a new thread in the questions section, but before you do that check the issues tab on GitHub to see if other users have the same issue/error: You are not allowed to view links. Register or Login to view.
If you can't find anything and you've searched forum for similar issues make a new thread here: You are not allowed to view links. Register or Login to view. or post about it here.

10. I'm getting OOM/out of memory errors while training SAEHD.

If you are getting OOM errors it means you are running out of VRAM, there are various settings you can change to fix that:

a) decrease batch size - a lower batch size means the model trains on fewer images at a time, thus using less VRAM, but it means you will have to train longer to achieve the same result as with a higher batch size; an extremely low batch size like 2-4 may also lead to less accurate results.
 
b) change the optimizer setting (models_opt_on_gpu) - when set to True, the optimizer and model are both handled by the GPU, which means faster iteration times/performance/faster training but higher VRAM usage; when set to False, running the network optimizer is handled by your CPU, which means less VRAM usage and possibly no OOM or even a higher possible batch size, but training will be slower due to longer iteration times.

c) turn off additional features like face and bg style transfer, TrueFace training, GAN training or performance heavy CT method like SOT-M - enabling them increases iteration/training time and uses more VRAM.

11. Q: I've turned off all additional features and the training is still giving me OOM errors even at low batch size.

A: In that case you can change even more settings, but it will require you to start a new model because the following can only be set once:
 
a) run models with reduced resolution - even with all the optimization you can do and disabling various features, you may still not be able to run your desired resolution; just decrease it until you can run it (in steps of 16, see the small sketch after this list).
 
b) decrease AutoEncoder dimensions, Encoder dimensions, Decoder dimensions and Decoder mask dimensions.
These settings control the model's dimensions, so changing them can have a dramatic effect on the model's ability to learn facial features and expressions. Too low a value may mean the model won't close eyes or won't learn some facial features. Only change them if there is nothing else you can do. For more info on what they do, check the guide.

c) buy a GPU with more VRAM.
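As a small helper for point a), here is a purely illustrative Python sketch (not part of DFL) that rounds a desired resolution down to the nearest value the trainer will accept; note that the -d architectures additionally require multiples of 32:

def nearest_valid_resolution(desired, step=16):
    # round down to the nearest multiple of step (use step=32 for the -d archis)
    return max(step, (desired // step) * step)

print(nearest_valid_resolution(200))      # 192
print(nearest_valid_resolution(230, 32))  # 224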

12. Q: I have too many similar faces in my source dataset, is there a tool I can use to remove them?

A: Yes, you can either use DFL built in sorting methods or use app like VisiPics to detect similar looking faces in your source dataset and remove them.
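If you prefer to script the duplicate check yourself instead of using VisiPics, here is a rough sketch using the third-party Pillow and imagehash packages (an assumption: both are installed via pip; the path and the distance threshold are made up, and nothing is deleted, similar files are only printed):

from pathlib import Path
from PIL import Image
import imagehash

hashes = {}
for path in sorted(Path("workspace/data_src/aligned").glob("*.jpg")):
    h = imagehash.average_hash(Image.open(path))
    duplicate_of = next((name for name, other in hashes.items() if h - other <= 4), None)
    if duplicate_of:
        print(path.name, "looks very similar to", duplicate_of)
    else:
        hashes[path.name] = h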

13. Q: I was training my model that already had several thousand iterations but the faces in the preview window suddenly turned black/white/look weird, and my loss values went up/are at zero.
 
A: Your model has collapsed; you cannot use it anymore and you have to start all over, or use backups if you made them.
To prevent model collapse, use gradient clipping when starting training.

14. Q: 
If I train a model of Celeb A (data_src) and use Celeb B (data_dst) as the destination can I use the same model of Celeb A to swap with a new Celeb C? Can I reuse models?

A: Yes, it is actually recommended to reuse your models if you plan on making more fakes of the same source or even when using the same dataset. You can also reuse a model when working with a completely different source and destination/target.


15. Q: Should I pretrain my models?

A: As with reusing, yes, you should pretrain.

Use the built-in pretrain function inside DFL, which you can select when starting up a model. It is the correct way to pretrain your model; run this feature for anywhere from 200k to 400k iterations and turn it off once you want to finish pretraining.

16. Q: I'm getting an error: is not a dfl image file required for training in DeepFaceLab

A: It means that the pictures inside data_src/aligned and/or data_dst are not valid for training in DFL.

This can be caused by several things:

1. You are using one of the shared datasets of a celebrity; chances are they were made in different software than DFL or in an older version of it. Even though they look like aligned faces (256x256 images), they may be just pictures extracted in a different app that stored the landmark/alignment data in a different way. To fix them, all you need to do is run the alignment process on them: place them into the "data_src" folder (not the "aligned" folder inside it) and align them again using 4) data_src extract faces S3FD

2. You edited faces/images inside the aligned folder of data_src or data_dst in GIMP/Photoshop after aligning.
When you edit those images you overwrite the landmark/alignment data that is stored inside them.
If you want to edit these images, first run 4.2) data_src util faceset metadata save to save the alignment info in a separate file, then edit your images and run 4.2) data_src util faceset metadata restore to restore that data.
The only edits allowed are AI upscaling/enhancing (which you can now also do using 4.2) data_src util faceset enhance instead of using external apps like Gigapixel), color correction or edits to the face that don't change its shape (like removing or adding stuff); no flipping/mirroring or rotation is allowed.

3. You have regular, non extracted/aligned images in your "data_src/dst" or "aligned" folder.

4. You have _debug faces in your "data_src/aligned" folder. Delete them.

17. Q: I'm getting errors during conversion: no faces found for XYZ.jpg/png, copying without faces.

A: It means that for XYZ frame in "data_dst" folder no faces were extracted into "aligned" folder.


This may be because there were actually no faces visible in that frame (which is normal) or they were visible but were not detected due to the angle or an obstruction.
To fix that you need to extract those faces manually. Check the main guide, especially the section on cleaning up your data_dst dataset.

Overall you should make sure that you have as many faces aligned and properly extracted BEFORE starting to train.
And remember that both datasets should be cleaned up before training, to know more check the first post (guide) and also read this thread about preparing source datasets for use in training and for sharing on our forum: You are not allowed to view links. Register or Login to view.

18. Q: I'm getting errors: Warning: several faces detected. Highly recommended to treat them separately and Warning: several faces detected. Directional Blur will not be used. during conversion

A: It's caused by multiple faces within your data_dst/aligned folder.


The extraction process attempts to detect a face in each frame at all costs. If it detects multiple faces, or one real face plus something else falsely detected as a face, it creates multiple files for those frames that look like this: 0001_0.jpg 0001_1.jpg 0001_2.jpg (in the case of detecting 3 faces).

19. Q: After merging I see original/DST faces on some or all merged frames.

A: Make sure your converter mode is set to overlay or any other mode except for "original" and make sure you've aligned faces from all frames of your data_dst.mp4 file.

If you only see original faces on some frames, it's because they were not detected/aligned from those corresponding frames, it may happen due to various reasons: extreme angle where it's hard to see the face, blur/motion blur, obstructions, etc. Overall you want to always have all faces from your data_dst.mp4 aligned.

20. Q: What do those 0.2513 0.5612 numbers mean when training?

A: These are loss values. They indicate how well model is trained.
But you shouldn't focus on them unless you see sudden spikes in their value (up or down) after they have already settled around some value; instead focus on the preview windows and look for details like teeth separation, beauty marks, nose, eyes - if they are sharp and look good, then you don't have to worry about anything. Remember to always use gradient clipping when training SAEHD models to prevent model collapse, during which your loss values may spike up.

21. Q: What are the ideal loss values, how low/high loss values should be?

A: It all depends on the settings, datasets and various different factors.
Generally you want to train with all features disabled except for random warp of samples (and gradient clipping which should be enabled all the time) to a loss of around 0.3-0.35, from there you can disable Random Warp and keep training until around 0.1-0.2 but you should look at previews and judge when to stop from that and not just from numerical values (which can vary depending on settings used). Some options will cause loss values to change once they are enabled.

22. Q: My model has collapsed, can I somehow recover it?

A: No, you need to start over, or use backup if you made them.

23. Q: What should I do if I trained with a celebrity faceset and want to add more faces/images/frames to it? How can I add more variety to an existing src/source/celebrity dataset?

A: The safest way is to rename the entire "data_src" folder to anything else or temporarily move it somewhere else, then extract frames from the new data_src.mp4 file (or, if you already have frames extracted and some pictures ready, create a new "data_src" folder and copy them into it) and run the data_src extraction/aligning process. Then copy the aligned images from the old data_src/aligned folder into the new one, and when asked by Windows to replace or skip, select the option to rename the files so you keep all of them instead of replacing old ones with new ones.

24. Q: Does the dst faceset/data_dst.mp4 also need to be sharp and high quality? Can some faces in dst faceset/dataset/data_dst be a bit blurry/have shadows, etc? What to do with blurry faces in my data_dst/aligned folder

A: You want your data_dst to be as sharp and free of any motion blur as possible. Blurry faces in data_dst can cause a couple issues:

- first, some of the faces in certain frames will not get detected; this will cause original faces to be shown on these frames when converting/merging because they couldn't be properly aligned during extraction, so you will have to extract them manually.
- second, others may be incorrectly aligned; this will cause the final faces on these frames to be rotated/blurry and look all wrong, and similarly to other blurry faces they will have to be manually aligned to be used in training and conversion.
- third, even with manual aligning, in some cases it may not be possible to correctly detect/align faces, which again will cause original faces to be visible on the corresponding frames.
- faces that contain motion blur or are blurry (not sharp) but are correctly aligned may still produce bad results, because the models used in training cannot understand motion blur; certain parts of the face, like the mouth, when blurred out may appear bigger/wider or just different, and the model will interpret this as a change in the shape/look of that part, so both the predicted and the final faked face will look unnatural.
You should remove those blurry faces from the training dataset (data_dst/aligned folder), put them aside somewhere else, and then copy them back into the data_dst/aligned folder before converting so that the swapped face shows up on the frames corresponding to those blurry faces.
To combat the odd look of the face in motion you can use motion blur within the merger (but note it will only work if one set of faces is in the "data_dst/aligned" folder and all files end with the _0 suffix).

You want both your SRC datasets and DST datasets to be as sharp and high quality as possible.
Small amounts of blurriness on some frames shouldn't cause many issues. As for shadows, it depends on how much shadow we are talking about; small, light shadows will probably not be visible, and you can get good results with shadows on faces, but too much will also look bad. You want your faces to be lit as evenly as possible, with as few harsh/sharp and dark shadows as possible.

25. Q: I'm getting error reference_file not found when I try to convert my deepfake back into mp4 with 8) converted to mp4.

A: You are missing data_dst.mp4 file in your "workspace" folder, check if it wasn't deleted:

The reason you need it is that even though you separated it into individual frames with 3) extract images from video data_dst FULL FPS, all that is inside the "data_dst" folder is just the frames of the video; you also need the sound, which is taken from the original data_dst.mp4 file.

26. Q: I accidentally deleted my data_dst.mp4 file and cannot recover it, can I still turn merged/converted frames into an mp4 video?

A: Yes, in case you've permanently deleted data_dst.mp4 and have no way of recovering it or rendering an identical file, you can still convert the merged frames back into an mp4 (albeit without sound) manually by using ffmpeg and a proper command:

- start by going into folder ...:\_internal\ffmpeg and copy ffmpeg.exe
- paste it into the merged folder
- open up command line by pressing windows key + r (run) and typing cmd or searching it up after pressing windows key and typing cmd/cmd.exe
- copy address of your merged folder (example: D:\DFL\workspace\data_dst\merged)
- in the command line type the letter of your drive, as in example above that would be "d:" (without quotation marks) and press enter
- the line D:\> should appear, next type "cd FULL_ADDRESS", example: "cd D:\DFL\workspace\data_dst\merged"
- you should now see your entire address like this: D:\DFL\workspace\data_dst\merged>
- enter this command:

ffmpeg -r xx -i %5d.jpg -vcodec libx264 -crf 20  -pix_fmt yuv420p result.mp4

- xx is the framerate
- the digit before the "d" is the number of digits in the frame file names, so if your merged frames have names like 15024.jpg that would be %5d, if it's 5235.jpg it is %4d, etc.
If your images are PNGs, change .jpg to .png
- crf is the quality setting, best left at 20.
If your merged file names have some letters in front, like out12345.jpg, add "out" right before the % sign.

Example command for converting frames named "out_2315.png" into a 30 fps .mp4 file named "deepfake":

ffmpeg -r 30 -i out_%4d.png -vcodec libx264 -crf 20  -pix_fmt yuv420p deepfake.mp4

If you want to use x265 encoding change libx264 to libx265.

27. Q: Can you pause merging and resume it later? Can you save merger settings? My merging failed/I got error during merging and it's stuck at %, can I start it again and merge from last successfully merged frame?

A: Yes, by default interactive converter/merger creates session file in the "model" folder that saves both progress and settings.

If you want to just pause the merging you can hit > and it will pause. If however you need to turn it off completely/restart your PC, etc., exit the merger with Esc and wait for it to save your progress. The next time you launch merging, after selecting interactive merger/converter (Y/N) - Y, you'll get a prompt asking if you want to use the saved session file and resume the progress; the merger will load with the right settings at the right frame.

If your merging failed and it didn't save the progress, you will have to resume it manually. You do this by first backing up your "data_dst" folder and then deleting all extracted frames inside data_dst, as well as all images from the "aligned" folder inside "data_dst", that correspond to frames already converted/merged inside the "merged" folder. Then start the merger/converter, enter the settings you used before and convert the rest of the frames, then combine the new merged frames with the old ones from the backup "data_dst" folder and convert to .mp4 as usual.
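A rough sketch of that clean-up step (illustration only; it assumes the default DFL folder layout, that merged frames keep the same base names as the data_dst frames, and that aligned faces are named like 00123_0.jpg; it only prints what would be removed and deletes nothing):

from pathlib import Path

dst = Path("workspace/data_dst")
done = {p.stem for p in (dst / "merged").glob("*.*")}        # frames that were already merged

for frame in dst.glob("*"):                                  # extracted data_dst frames
    if frame.is_file() and frame.stem in done:
        print("already merged, would remove frame:", frame.name)

for face in (dst / "aligned").glob("*.jpg"):                 # aligned faces like 00123_0.jpg
    if face.stem.rsplit("_", 1)[0] in done:
        print("already merged, would remove aligned face:", face.name)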

28. Q: Faces in preview during training look good but after converting them they look bad. I see parts of the original face (chin, eyebrows, double face outline).

A: Faces in preview are the raw output of the AI that then need to be composited over the original footage.
Because of it, when faces have different shapes, or are slightly smaller/bigger you may see parts of the original face around/outside the mask that DFL merger creates.
To fix it you need to change conversion settings, start by:

- adjusting the mask type

- adjust mask erosion (size) and blur (feathering, smoothing the edge)

- adjust face size (scale)

NOTE: Negative erosion increases the mask size (covers more), positive decreases it.

29. Q: Final result/deepfake has weird artifacts, face changes colors, color bleed from background and make it flicker/darken/change color in the corners/on the edges when using Seamless mode.

A: You are using seamless/hist/seamless+hist overlay mode or you trained your model with source dataset/faceset with varying lighting conditions and didn't use any color transfer during training.

- use overlay or any other mode besides seamless/hist/seamless+hist
- if you want to use seamless:
- decrease the size of the mask/face by increasing the "Erode Mask" value, so it doesn't "touch" areas outside the face and as a result doesn't pick up the color of the background/area outside of the face/head.
- or smooth out the edge of the mask/face by increasing the "Blur Mask" value, which may hide some of the color changes and also helps make the face seem more "seamless" when you decrease the mask size.
Both of these may or may not fix the issue; if it still persists, use simple overlay mode as stated above.

If your source dataset contained images of faces with varying lighting conditions and you didn't use color transfer, you may need to go back and keep training some more with color transfer enabled.
In case turning it on severely washes out colors or affects the colors of the training data/faces in a bad way (washed out, wrong or oversaturated colors, noise) or makes the learned face blurry (due to too much variation that the model must learn all over, as if there were new faces in your source and destination datasets), you may want to save the landmark data and edit your source dataset colors to better match your destination dataset and also have less variation.

I recommend NOT using seamless unless it's absolutely needed, and even then I recommend stopping on every major angle and camera shift/light change to check that it doesn't cause those artifacts.

30. Q: What's the difference between half face, mid-half face, full face and whole face face_type modes?


A: Whole face is a new mode that covers the entire face/head; that means it also covers the entire forehead and even some hair and other features that could be cut off by the full face mode and would definitely never be visible when using mid-half or half face mode. It also comes with a new training option called masked_training that lets you train the forehead. First you start with it enabled and it clips the training mask to the full face area; once the face is trained sufficiently you disable it and it trains the whole face/head. This mode requires either manual masking in post or training your own XSeg model:
You are not allowed to view links. Register or Login to view.

Full face is a recommended face_type mode to get as much coverage of face as possible without anything that's not needed (hairline, forehead and other parts of the head)
Half face mode was a default face_type mode in H64 and H128 models. It covers only half of the face (from mouth to a bit below eyebrows)

Mid-half face is a mode that covers around 30% larger area than half face.

31. Q: What is the best GPU for deepfakes? I want to upgrade my gpu, which one should I get?

A: Answer to this will change as deepfaking software gets further developed and GPUs become more powerful but for now the best GPU is the one that has most VRAM and is generally fast.

For performance figures check our SAE spreadsheet: You are not allowed to view links. Register or Login to view.

Recommended minimum is 6GB of VRAM 10 series GPU (1060 6GB) - good for resolutions between 128-160.
For higher resolution models (160-192) it's recommended to get at least 8GB VRAM equipped 10/20 series GPU (1070/2070) or higher.
For highest resolutions (192-256) you may need as much as 11GB or more VRAM 10/20 series GPU (1080Ti,2080Ti, RTX Titan) or higher.

Bear in mind that training performance depends on the settings used during training; a fully enabled (all features on) 128 DF model may run slower than a 192 DFHD model with turned-down dims and all features disabled.

32. Q: What do the AutoEncoder, Encoder, Decoder and D_Mask_Decoder dims settings do? What does changing them does?


A: AutoEncoder, Encoder, Decoder and D_Mask_Decoder dims affect the model's neural network dimensions.
 
They can be changed to either increase performance or quality. Setting them too high will make models really hard to train (slow, high VRAM usage) but will give more accurate results and a more src-like looking face; setting them too low will increase performance but the results will be less accurate, and the model may not learn certain features of the faces, resulting in a generic output that looks more like dst or like neither dst nor src.

AutoEncoder dimensions ( 32-1024 ?:help ) : this is the overall model capacity to learn.
Too low a value and it won't be able to learn everything; a higher value will make the model able to learn more expressions and be more accurate at the cost of performance.

Encoder dimensions ( 16-256 ?:help ) : this affects the ability of the model to learn different expressions, states of the face, angles, lighting conditions.
Too low a value and the model may not be able to learn certain expressions (the model might not close eyes or mouth, and some angles may be less detailed/accurate); a higher value will lead to a more accurate and expressive model, assuming AE dims are increased accordingly, at the cost of performance.

Decoder dimensions ( 16-256 ?:help ) : this affects the ability of the model to learn fine detail, textures, teeth, eyes - small things that make face detailed and recognizable.
Too low a value will cause some details not to be learned (such as teeth and eyes looking blurry, lack of texture), and some subtle expressions and facial features/textures may not be learned properly, resulting in a less src-like looking face; a higher value will make the face more detailed and the model will be able to pick up more of those subtle details at the cost of performance.

Decoder mask dimensions ( 16-256 ?:help ) : affects quality of the learned mask when training with Learn mask enabled. Does not affect the quality of training.

33. Q: Whats the recommended batch size? How high should I set the batch size? How low can batch size be set?

A: There is no single recommended batch size, but a reasonable value is between 8-12, with values above 16-22 being exceptionally good and 4-6 being a minimum.

A batch size of 2 is not enough to correctly train a model, so a value of 4 is the recommended minimum; the higher the value the better, but at some point a higher batch size may not be beneficial, especially if your iteration time starts to increase or you have to disable models_opt_on_gpu
- thus forcing the optimizer onto the CPU, which slows down training/increases iteration time.
You can calculate when increasing batch size is becoming less efficient by dividing the iteration time by the batch size. Choose the batch size that gives you the lower ms value per sample for a given iteration time, for example:
 
1000 ms at batch 8 - 1000 / 8 = 125
1500 ms at batch 10 - 1500 / 10 = 150
 
In this case running with batch 8 will be feeding the model more data in a given time than batch 10, although the difference is small. If, say, we want to use batch 12 but we get an OOM, so we disable models_opt_on_gpu, it may now look like this:
2300 ms at batch 12 (optimizer on CPU) - 2300 / 12 = ~192 ms, which is much longer than the 125 ms with batch 8 and an iteration time of 1000 ms.
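The same comparison as a small Python sketch (the numbers are the ones from the example above; illustration only):

def ms_per_sample(iteration_time_ms, batch_size):
    # lower is better: time spent per training sample
    return iteration_time_ms / batch_size

print(ms_per_sample(1000, 8))   # 125.0
print(ms_per_sample(1500, 10))  # 150.0
print(ms_per_sample(2300, 12))  # ~191.7 (optimizer forced onto the CPU)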

When starting a model it's better to go with a lower batch size and then increase it once random warp is disabled.

34. Q: How to use pretrained model?

A: Simply download it and put all the files directly into your model folder.

Start training, press any key within 2 seconds after selecting the model for training (if you have more than one in the folder) and the device to train with (GPU/CPU) to override model settings, and make sure the pretrain option is disabled so that you start proper training; if you leave the pretrain option enabled the model will carry on with pretraining. Note that the model will revert the iteration count to 0; that's normal behavior for a pretrained model, unlike a model that was just trained on random faces without the use of the pretrain function.

35. Q: My GPU usage is very low/GPU isn't being used despite selecting GPU for training/merging.

A: It probably is being used, but by default Windows doesn't report CUDA usage (which is what you should be looking at), only total GPU usage, which may appear low (around 5-10%).

To see true CUDA/GPU usage during training (in Windows 10), go into Task Manager -> Performance -> Select GPU -> Change one of the 4 smaller graphs to CUDA as shown in the screenshot below.


[Image: PL99L1j.jpg]

If you are using a different version of Windows, download external monitoring software such as HWMonitor or GPU-Z, or look at the VRAM usage, which should be close to maximum during training.
#5
My advanced advice, translated using deepl.com


SAEHD model options.

Random_flip
Randomly flips the image horizontally (left to right). Allows for better generalization of faces. Slightly slows down reaching a clear face. If both the src and dst face sets are quite diverse, this option is not useful. You can turn it off later in training.

Batch_size
Improves facial generalization, especially useful at an early stage, but it increases the time until a clear face is achieved. Increases memory usage. In terms of the quality of the final fake, the higher the value, the better. It's not worth setting it below 4.

Resolution.
At first glance, the more the better. However, if the face in the frame is small, there is no point in choosing a large resolution. Increasing the resolution increases the training time. For face_type=wf, a higher resolution is required because the coverage of the face is larger, so the facial details end up relatively smaller. For wf it makes no sense to choose less than 224.

Face_type.
Face coverage in training. The more facial area is covered, the more plausible the result will be.
whole_face allows covering the area below the chin and the forehead. However, there is no automatic mask that includes the forehead, so for merging you need XSeg, or manual masking in DaVinci Resolve or Adobe After Effects.

Archi.
LIAE morphs more toward the dst face, but the src face will still be recognizable in it.
DF allows you to make the most believable face, but requires more manual work to collect a good variety of src faces and to do the final color matching.
The effectiveness of the hd architectures has not been proven at this time. They were designed to better smooth the subpixel transitions of the face under micro displacements, but micro-shake can also be eliminated with df, see below.

Ae_dims.
Dimensions of the central part of the network (the "brain"), which is responsible for the facial expressions created by the encoder and for supplying a varied code to the decoder.

E_dims.
Dimensions of the encoder network, which is responsible for detecting and recognizing faces. When these dimensions are not enough and the face images are too diverse, the network has to sacrifice the non-standard cases - those that differ most from the general ones - reducing their quality.

D_dims.
Dimensions of the decoder network, which is responsible for generating the image from the code received from the "brain" of the network. When these dimensions are not enough and the output faces differ too much in color, lighting, etc., you have to sacrifice the maximum achievable sharpness.

D_mask_dims.
Dimensions of the mask decoder network, which is responsible for forming the mask image.
16-22 is a normal value for a fake without a mask edited in the XSeg editor.

At the moment there is no experimentally proven data indicating which values are best. All we know is that if you set really low values, the error curve will plateau quickly and the face will never become sharp.

Masked_training (only for whole_face).
Enabled (default) - trains only the area inside the face mask; anything outside that area is ignored. This lets the network focus on the face only, speeding up the training of the face and its expressions.
When the face is sufficiently trained, you can disable this option; then everything outside the face - the forehead, part of the hair, the background - will also be trained.

Eyes_prio.
Sets a higher priority for image reconstruction in the eye area, improving the generalization and matching of the eyes between the two faces. Increases iteration time.

Lr_dropout.
Enable only when the face is already sufficiently trained. Enhances facial detail and improves subpixel facial transitions, reducing shake.
Uses more video memory, so take this option into account when selecting a network configuration for your graphics card.

Random_warp.
Turn it off only when the face is already sufficiently trained. Disabling it improves facial detail and the subpixel transitions of facial features, reducing shake.

GAN_power.
Allows for improved facial detail. Enable only when the face is already sufficiently trained. Requires more memory and greatly increases iteration time.
It works on the generative-adversarial principle. At first you will see artifacts in areas whose sharpness does not match the target image, such as teeth and the edges of the eyes, so train long enough.

True_face_power.
Experimental option. You don't have to turn it on. Forces the predicted face toward src in the most "hard" way. Artifacts and incorrect light transfer from dst may appear.

Face_style_power.
Adjusts the color distribution of the predicted face inside the mask area toward dst. Artifacts may appear. The face may become more like dst. The model may collapse.
Start at 0.0001, watch the changes in the preview history, and enable autobackup every hour.

Bg_style_power.
Trains the area of the predicted face outside the face mask to match the same area in the dst face. This makes the predicted face resemble a morph toward the dst face, with less recognizable src facial features.

Face_style_power and Bg_style_power should work as a pair, so that the skin tone fits dst and the background is taken from dst. Morphing gets rid of many color- and face-matching problems, but at the expense of recognizing the src face in the result.

ct_mode.
Used to fit the average color distribution of the src face set to dst. Unlike Face_style_power it is a safer method, but there is no guarantee of an identical color transfer. Try each mode, look at the preview history to see which one comes closest to dst, and train with that one.
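For intuition only: the ct modes essentially shift the color statistics of the src faces toward dst. A minimal sketch of one classic approach (Reinhard-style mean/std matching in LAB space with OpenCV and NumPy) - an illustration of the idea, not DFL's actual implementation:

import cv2
import numpy as np

def reinhard_transfer(src_bgr, dst_bgr):
    # Match the per-channel mean and standard deviation of src to dst in LAB space.
    src = cv2.cvtColor(src_bgr, cv2.COLOR_BGR2LAB).astype(np.float32)
    dst = cv2.cvtColor(dst_bgr, cv2.COLOR_BGR2LAB).astype(np.float32)
    s_mean, s_std = src.mean(axis=(0, 1)), src.std(axis=(0, 1)) + 1e-6
    d_mean, d_std = dst.mean(axis=(0, 1)), dst.std(axis=(0, 1))
    out = (src - s_mean) / s_std * d_std + d_mean
    return cv2.cvtColor(np.clip(out, 0, 255).astype(np.uint8), cv2.COLOR_LAB2BGR)

# e.g.: recolored = reinhard_transfer(cv2.imread("src_face.jpg"), cv2.imread("dst_face.jpg"))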

Clipgrad.
Reduces the chance of a model collapse to almost zero. Model collapse occurs when artifacts appear or when the predicted-face previews turn into a single solid color. It can happen when certain options are used or when the dst face set is not varied enough.
Therefore it is best to use autobackup every 2-4 hours; if a collapse occurs, roll back and turn on clipgrad.

Pretrain.
Enables model pretraining, which is performed on a prepared set of about 24 thousand faces. Using a pretrained model accelerates the training of any fake.
It is recommended to pretrain for as long as possible: 1-2 days is good, 2 weeks is ideal. At the end of pretraining, save the model files for later use, then switch the option off and train as usual.
You can and should share your pretrained model with the community.

Size of the src and dst face sets.

The problem with a very large number of src images is repetitive faces, which add little. As a result, faces with rare angles are trained less often, which hurts quality. Therefore 3000-4000 faces is optimal for the src set. If you have more than 5000 faces, sort "by best" down to fewer faces; the sorting selects an optimal mix of angles and color variety.


The same logic applies to dst. But dst consists of frames from the target video, each of which must be trained well enough to be recognized by the neural network. So if you have too many dst faces (3000 or more), it is best to back them up, sort "by best" down to 3000, train the network to, say, 100,000 iterations, then restore the original dst faces and train further until the optimal result is achieved.
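DFL's own sort scripts do this for you; purely to illustrate the idea of trimming an oversized set, here is a sketch that ranks aligned face images by sharpness (variance of the Laplacian) and copies the N sharpest into a separate folder. The paths and the single sharpness criterion are assumptions - the real "sort by best" also balances angles and color variety:

import glob
import os
import shutil

import cv2

def keep_sharpest(src_dir, out_dir, n=3000):
    # Score each image by the variance of its Laplacian (a simple sharpness measure).
    scored = []
    for path in glob.glob(os.path.join(src_dir, "*.jpg")):
        gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
        if gray is not None:
            scored.append((cv2.Laplacian(gray, cv2.CV_64F).var(), path))
    os.makedirs(out_dir, exist_ok=True)
    # Copy the n sharpest images.
    for _, path in sorted(scored, reverse=True)[:n]:
        shutil.copy(path, out_dir)

keep_sharpest("workspace/data_src/aligned", "workspace/data_src/aligned_best", n=3000)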


How to get lighting similar to the dst face?

This is about lighting, not color matching. It simply comes down to collecting a more varied src face set.


How to suppress color flickering with the DF model?

If the src face set contains a variety of make-up, it can lead to color flickering with the DF model. One option: at the end of training, leave at least 150 faces with the same make-up and train for several more hours.

How else can you adjust the color of the predicted face to dst?

If nothing fits automatically, use a video editor and composite the faces there. A video editor gives you much more freedom to match the colors.

How to make a face look more like src?

1. Use the DF architecture.

2. Use a dst with a similar face shape.

3. A large color variety in the src faces is known to decrease facial resemblance, because the neural network essentially interpolates the face from what it has seen.

For example, if your src set comes from 7 different color scenes and contains only 1500 faces in total, then each dst scene is effectively served by only 1500 / 7 ≈ 214 faces, which is 7 times poorer than using 1500 faces from a single scene. As a result, the predicted face will differ noticeably from src.

Micro-shake of the predicted face in the final video.

The higher the resolution of the model, the longer it needs to be trained to suppress micro-shake.
You should also enable lr_dropout and disable random_warp after 200-300k iterations at batch_size 8.
Micro-shake can also appear when the dst video is too sharp: it is difficult for the neural network to extract unambiguous information about a face that is full of per-pixel noise. Therefore, after extracting frames from the dst video and before extracting faces, you can run the frames through the temporal noise filter "denoise data_dst images.bat".
Also, increasing ae_dims may suppress micro-shake.
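The bundled bat file handles this for you; if you prefer to run the step manually, a temporal denoiser such as ffmpeg's hqdn3d filter can be applied to the extracted frames (spatial strengths 0, temporal strengths 4, i.e. temporal-only smoothing). The paths, frame-name pattern and strength below are assumptions - adjust them to your workspace, and create the output folder first:

ffmpeg -i "workspace\data_dst\%05d.png" -vf hqdn3d=0:0:4:4 "workspace\data_dst_denoised\%05d.png"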

Use a quick model to check the generalization of facial features.

If you're planning a higher-resolution fake, start by training at least a few hours at resolution 96. This helps identify facial generalization problems and fix the face sets.
Examples of such problems:

1. Eyes/mouth not closing - no closed eyes/mouths in src.

2. Wrong face rotation - not enough faces with different turns in both the src and dst face sets.

Training algorithm for achieving high definition (also written out as data below):
1. Use a -ud model.
2. Train, say, up to 300k iterations.
3. Enable learning rate dropout for 100k.
4. Disable random warp for 50k.
5. Enable GAN.
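The same schedule written out as data - a sketch to make the phases explicit, not DFL code; the iteration counts follow the steps above and the GAN power of 0.1 is just an example value:

# Each phase lists the settings to use until the given iteration count is reached.
HIGH_DEF_SCHEDULE = [
    {"until_iter": 300_000, "random_warp": True,  "lr_dropout": False, "gan_power": 0.0},
    {"until_iter": 400_000, "random_warp": True,  "lr_dropout": True,  "gan_power": 0.0},
    {"until_iter": 450_000, "random_warp": False, "lr_dropout": True,  "gan_power": 0.0},
    {"until_iter": None,    "random_warp": False, "lr_dropout": True,  "gan_power": 0.1},  # train with GAN until satisfied
]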

Do not use the training GPU for video output.

This can reduce performance, reduce the amount of free GPU video memory, and in some cases lead to OOM errors.
Buy a second cheap video card such as a GT 730 or similar and use it for video output.
Another option is to use the GPU built into Intel processors: activate it in the BIOS, install the drivers and connect the monitor to the motherboard.
Using Multi-GPU.

Multi-GPU can improve the quality of the fake. In some cases it can also speed up training.
Choose identical GPU models, otherwise the fast one will wait for the slow one and you will not get any acceleration.
Working principle: batch_size is divided between the GPUs. Accordingly, you either get a speed-up because less work is allocated to each GPU, or you increase batch_size in proportion to the number of GPUs, increasing the quality of the fake.
In some cases disabling models_opt_on_gpu can speed up training when using 4 or more GPUs.
As the batch size increases, the load on the CPU that generates samples increases, so a recent CPU and fast memory are recommended.
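To make the working principle concrete, a trivial sketch of how a total batch is split across identical GPUs (illustration only, not DFL code):

def per_gpu_batch(total_batch_size, n_gpus):
    # Each GPU processes an equal share of the batch every iteration.
    assert total_batch_size % n_gpus == 0, "choose a batch size divisible by the GPU count"
    return total_batch_size // n_gpus

print(per_gpu_batch(16, 2))  # 2 GPUs at total batch 16 -> 8 samples per GPU per iteration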

NVLink, SLI.

Not supported and not used. Moreover, having SLI enabled may cause errors.

Factors that reduce the success of a fake.
1. A large face in the frame.

2. Side lighting, lighting transitions, colored lighting.

3. A dst face set that is not varied enough.

For example, you train a fake where the whole dst face set is a head turned to one side. Face generation in this case can be poor. The solution: extract additional faces of the same actor, train on them well enough, then leave only the target faces in dst.

Factors that increase the success of a fake.

1. Variety of src faces: different angles including profile views, and a variety of lighting.

Other.

In 2018, when deepfakes first appeared, people liked even the poorest-quality fakes, where the face merely flickered through and barely resembled the target celebrity. Now, even a technically perfect replacement using an impersonator who resembles the target celebrity may not go viral at all. Popular YouTube channels specializing in deepfakes are constantly inventing something new to keep their audience. If you have watched a lot of movies and know all the meme videos, you can probably come up with great ideas for a deepfake. A good idea is 50% of the success; technical quality can be improved through practice.

Not all celebrity pairs work well for a deepfake. If the head sizes differ significantly, the similarity of the result will be extremely low. With experience you learn which fakes will turn out well and which will not.
#6
i downloaded the workspace and internal folders but how do those files help me?? where should i extract them??

[Image: C4U5l7bh.jpg]
#7
(04-13-2020, 11:19 AM)lior4113 Wrote: You are not allowed to view links. Register or Login to view.i downloaded the workspace and internal folders but how do those files help me?? where should i extract them??

[Image: C4U5l7bh.jpg]

From the guide, first section:

"As mentioned at the beginning all of that data is stored in the "workspace" folder, that's where both data_src/dst.mp4 files, both "data_src/dst" folders are (with extracted frames and "aligned"/"aligned_debug" folders for extracted/aligned faces) and the "model" folder where model files are stored."

I forgot to add a clear explanation of where you put model files because I thought that line would be enough to understand.

I've added an entry to both the pretrained/trained model sharing thread and to the guide and FAQ, and fixed a few "mistakes" in terminology.
#8
(04-13-2020, 01:47 PM)tutsmybarreh Wrote: You are not allowed to view links. Register or Login to view.
(04-13-2020, 11:19 AM)lior4113 Wrote: You are not allowed to view links. Register or Login to view.i downloaded the workspace and internal folders but how do those files help me?? where should i extract them??

[Image: C4U5l7bh.jpg]

From the guide, first section:

"As mentioned at the beginning all of that data is stored in the "workspace" folder, that's where both data_src/dst.mp4 files, both "data_src/dst" folders are (with extracted frames and "aligned"/"aligned_debug" folders for extracted/aligned faces) and the "model" folder where model files are stored."

I forgot to add a clear explanation of where you put model files because I thought that line would be enough to understand.

I've added an entry to both the pretrained/trained model sharing thread and to the guide and FAQ, and fixed a few "mistakes" in terminology.

i downloaded the first file, 1.8GB, with workspace, internal and the scripts. i want to use this pretrained update, so i extracted the zip and now have this in some folder on my computer (below)
[Image: LZBK7uGh.jpg]

what should i do with that folder? should i copy/replace/cut it somewhere into my 1.8gb folder so that my 1.8gb folder will be updated?
#9
deepfake tutorial, using 'whole_face' + XSeg


#10
where did all the previous releases go on github?