MrDeepFakes Forums

[GUIDE] - DeepFaceLab 2.0 Guide

READ ENTIRE GUIDE, AS WELL AS FAQS AND USE THE SEARCH OPTION BEFORE YOU POST A NEW QUESTION OR CREATE A NEW THREAD ABOUT AN ISSUE YOU'RE EXPERIENCING!
IF YOU WANT TO MAKE YOUR OWN GUIDE BASED ON MINE OR ARE REPOSTING IT, PLEASE CREDIT ME, DON'T STEAL IT.
IF YOU LEARNED SOMETHING USEFUL, CONSIDER A DONATION SO I CAN KEEP MAINTAINING THIS GUIDE, IT TOOK MANY HOURS TO WRITE.

DFL 2.0 DOWNLOAD (GITHUB, MEGA AND TORRENT): DOWNLOAD
DEEP FACE LIVE: DOWNLOAD
DFL 2.0 GITHUB PAGE (new updates, technical support and issues reporting): GITHUB


Colab guide and link to original implementation: https://mrdeepfakes.com/forums/thread-guide-deepfacelab-google-colab-tutorial
DFL paper (technical breakdown of the code): https://arxiv.org/pdf/2005.05535.pdf
Other useful guides and threads: https://mrdeepfakes.com/forums/thread-making-deepfakes-guides-and-threads

STEP 0 - INTRODUCTION:
1. Requirements.

Using DeepFaceLab 2.0 requires a high-performance PC with a modern GPU, ample RAM and storage, and a fast CPU. Windows 10 is generally recommended for most users, but more advanced users may want to use Linux to get better performance. Windows 11 also works.

Minimum requirements for making very basic and low quality/resolution deepfakes:

- modern 4 core CPU supporting AVX and SSE instructions
- 16GB of RAM
- modern Nvidia or AMD GPU with 8GB of VRAM
- plenty of storage space and large pagefile

Make sure to enable Hardware-Accelerated GPU Scheduling under Windows 10/11 and ensure your GPU drivers are up to date.

2. Download correct build of DFL for your GPU (build naming scheme may change):

- for Nvidia GTX 900-1000 and RTX 2000 series and other GPUs utilizing the same architectures as those series use "DeepFaceLab_NVIDIA_up_to_RTX2080Ti" build.
- for Nvidia RTX 3000-4000 series cards and other GPUs utilizing the same architectures use "DeepFaceLab_NVIDIA_RTX3000_series" build.
- for modern AMD cards use "DeepFaceLab_DirectX12" build (may not work on some older AMD GPUs).

STEP 1 - DFL BASICS:
DeepFaceLab 2.0 consists of several .bat scripts that run the various processes required to create a deepfake. In the main folder you'll see them along with 2 folders:
  • _internal - internal files, stuff that makes DFL work, No Touchy!
  • workspace - this is where your models, videos, frames, datasets and final video outputs are.
Basic terminology:

SRC - always refers to content (frames, faces) of the person whose face we are trying to swap into a target video or photo.

SRC set/SRC dataset/Source dataset/SRC faces - extracted faces (square ratio image file of the source face that contains additional data like landmarks, masks, xseg labels, position/size/rotation on original frame) of the person we are trying to swap into a video.

DST - always refers to content (frames, faces) from the target video (or DST/DST video) we are swapping faces in.

DST set/DST dataset/Target dataset/DST faces - collection of extracted faces of the target person whose faces we will be replacing with likeness of SRC, same format and contains the same type of data as SRC faces.

Frames - frames extracted from either source or target videos, after extraction of frames they're placed inside "data_src" or "data_dst" folders respectively.

Faces - SRC/DST images of faces extracted from original frames derived from videos or photos used.

Model - collection of files that make up the SAEHD, AMP and XSeg models that the user can create/train. All are placed inside the "model" folder, which is inside the "workspace" folder. Basic descriptions of the models are below (more detail later in the guide):

1. SAEHD - most popular and most often used model. It comes in several variants based on different architectures, each with its own advantages and disadvantages, but in general it's meant to swap faces when SRC and DST share some similarities, particularly general face/head shape. It can be freely reused and pretrained and in general offers quick results at decent quality, but some architectures can suffer from low likeness or poor lighting and color matching.

2. AMP - new experimental model that can adapt more to the source data and retain its shape, meaning you can use it to swap faces that look nothing alike. This however requires manual compositing, as DFL does not have more advanced masking techniques such as background inpainting. Unlike SAEHD it doesn't have different architectures to choose from, is less versatile when it comes to reuse, takes much longer to train and has no pretrain option, but it can offer much higher quality and results can look more like SRC.

3. Quick 96 - Testing model, uses SAEHD DF-UD 96 resolution parameters and Full Face face type, meant for quick tests.

4. XSeg - User trainable masking model used to generate more precise masks for SRC and DST faces that can exclude various obstructions (depending on the user's labels on SRC and DST faces). DFL comes with a generic trained Whole Face masking model you can use if you don't want to create your own labels right away.

XSeg labels - labels created by user in the XSeg editor that define shapes of faces, may also include exclusions (or not include in the first place) obstructions over SRC and DST faces, used to train XSeg model to generate masks.

Masks - generated by the XSeg model, masks are needed to define the areas of the face that are supposed to be trained (be it SRC or DST faces) as well as to define the shape and obstructions needed for final masking during merging (DST). A basic mask derived from facial landmarks is also embedded into extracted faces by default. It can be used to do basic swaps using Full Face face type models or lower (more about face types and masks later in the guide).

Now that you know some basic terms it's time to figure out what exactly you want to do.

Depending on how complex the video you're trying to face swap is, you'll either need just a few interviews or you may need to collect far more source content to create your SRC dataset, which may also include high resolution photos, movies, TV shows and so on. The idea is to build a set that covers as many of the angles, expressions and lighting conditions present in the target video as possible. As you may suspect, this is the most important part of making a good deepfake. It's not always possible to find all required shots, which is why you'll never achieve 100% success with all videos you make, even once you learn all the tricks and techniques, unless you focus only on very simple videos. And remember that it's not about the number of faces, it's all about the variety of expressions, angles and lighting conditions while maintaining good quality across all faces. Also, the fewer different sources you end up using, the better the resemblance to the source will be, as the model will have an easier time learning faces that come from the same source than learning the same number of faces that come from many different sources.

A good deepfake also requires that both your source and target person have similarly shaped heads. While it is possible to swap people that look nothing alike, and the new AMP model promises to address the issue of different face shapes a bit, it's still important that the width and length of the head as well as the shape of the jawline, chin and the general proportions of the face are similar for optimal results. If both people also make similar expressions then that's even better.

Let's assume you know what video you'll be using as a target, you collected plenty of source data to create a source set (or at least made sure that there is enough of it and that it is good quality) and both your source and target person have similarly shaped heads. Now we can move on to the process of actually creating the video. Follow the steps below:

STEP 2 - WORKSPACE CLEANUP/DELETION:

1) Clear Workspace - deletes all data from the "workspace" folder. There are some demo files by default in the "workspace" folder when you download a new build of DFL that you can use to practice your first fake. You can delete them by hand or use this .bat to clear your "workspace" folder, but since you rarely just delete models and datasets after you finish working on a project, this .bat is basically useless and dangerous: you can accidentally delete all your work, which is why I recommend you delete this .bat.

STEP 3 - SOURCE CONTENT COLLECTION AND EXTRACTION:

To create a good quality source dataset you'll need to find source material of your subject. That can be photos or videos. Videos are preferred due to the variety of expressions and angles that are needed to cover all possible appearances of the face so that the model can learn it correctly. Photos on the other hand often offer excellent detail, are perfect for simple frontal scenes and will provide much sharper results. You can also combine videos and photos. Below are some things that you need to ensure so that your source dataset is as good as it can be.

1. Videos/photos should cover all or at least most of possible face/head angles - looking up, down, left, right, straight at camera and everything in between, the best way to achieve it is to use more than one interview and many movies instead of relying on single video (which will mostly feature one angle and some small variations and one lighting type).

TIP: If your DST video does not contain certain angles (like side face profiles) or lighting conditions, there is no need to include sources with such lighting and angles. You can create a source set that works only with specific types of angles and lighting, or create a bigger and more universal set that should work across multiple different target videos. It's up to you how many different videos you'll use, but remember that using too many different sources can actually decrease the resemblance of your results. If you can cover all angles and the few required lighting conditions with fewer sources, it's always better to use less content and thus keep the SRC set smaller.

2. Videos/photos should cover all different facial expressions - that includes open/closed mouths, open/closed eyes, smiles, frowns, eyes looking in different directions - the more variety in expressions you can get the better results will be.

3. Source content should be consistent - you don't want blurry, low resolution and heavily compressed faces next to crisp, sharp and high quality ones, so you should only use the best quality videos and photos you can find. If however you can't, or certain angles/expressions are present only in a lower quality/blurry video/photo, then you should keep those and attempt to upscale them.

Upscaling can be done directly on frames or video using software like Topaz, or on faces (after extraction) with tools like DFDNet, DFL Enhance, Remini, GPEN and many more (new upscaling methods are created all the time, machine learning is constantly evolving).

TIP: Good consistency is especially important in the following cases:

Faces with beards - try to only use a single movie, or photos and interviews that were shot on the same day, unless the face you're going to swap is small and you won't be able to tell individual hairs apart. In that case mixing sources shot on different dates is allowed, but only as long as the beard still has a similar appearance.

Head swaps with short hair - due to the more random nature of hair on heads you should only be using content that was shot on the same day (interviews, photos) and not mix it with other content, or if you're using a movie then stick to one movie.

Exception to above would be if hair and beard is always stylized in the same way or is fake and thus doesn't change, in that case mix as many sources as you wish to.

Faces with makeup - avoid including sources where makeup differs significantly from the type the given person typically has. If you must use videos or photos with specific makeup that doesn't go along with the others, try to color correct the frames (after frame extraction with batch image processing, or before it during video editing). This can be done after face extraction too, but requires you to save the metadata first and restore it after editing the faces (more about it in the next step).

4. Most of it should be high quality - as mentioned above, you can use some blurry photos/videos, but only when you can't find certain expressions/face angles in others, and make sure you upscale them to acceptable quality. Too much upscaled content may have a negative effect on quality, so it's best to use it only for a small portion of the dataset (if possible; in some cases close to 100% of your set may need to be enhanced in some way).

5. Lighting should be consistent - some small shadows are OK but you shouldn't include content with harsh, directional lighting. If possible try to use only content where shadows are soft and light is diffused. For LIAE architectures this may not be as important, as they can handle lighting better, but for DF architectures it's important to have several lighting conditions for each face angle, preferably at least 3: frontal diffuse, plus left and right with soft shadows (not too dark, details must still be visible in the shadowed area) or with no shadows, just diffused lighting that creates brighter areas on either the left or right side of the face. The source faceset/dataset can contain faces of varying brightness, but overly dark faces should not be included unless your DST is also dark.

6. If you are using only pictures or they are the majority of the dataset - make sure they pass all the checks mentioned above. 20 or so pictures is not enough. Don't even bother trying to make anything with so few pictures.

7. Keep the total amount of faces in your source dataset around 3,000-8,000 - in some cases a larger set may be required, but I'd recommend keeping it under 12k for universal sets, 15k if really necessary. Larger sets tend to produce more vague looking results and also take significantly longer to train, but if your target video covers just about every imaginable angle then a big SRC set will be required to cover all those angles.

Now that you've collected your source content it's time to extract frames from videos (photos don't need much more work but you can look through them and delete any blurry/low res pictures, black and white pictures, etc).

TIP: Regardless of the frame extraction method you'll use, prepare folders for all the different sources in advance.

You can place them anywhere but I like to place them in the workspace folder next to the data_src, data_dst and model folders. Name those folders according to the sources used (movie title, interview title, event or date for photos), place the corresponding frames in them after extraction is done, and then rename each set of frames so that it's clear where given faces came from.

These names get embedded into the faces during face extraction (STEP 5), so even if you then rename the faces or sort them, they retain the original filename, which you can restore using a .bat that you'll learn about in STEP 6.
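The renaming step above can be scripted instead of done by hand. Below is a minimal sketch, assuming each source already has its own folder of extracted frames. The function name prefix_frames and the "<folder name>_<frame name>" naming pattern are made up for illustration and are not part of DFL.

```python
from pathlib import Path

IMAGE_EXTS = {".jpg", ".jpeg", ".png"}

def prefix_frames(source_dir, prefix=None):
    """Rename every frame in source_dir to '<prefix>_<original name>'.

    The prefix defaults to the folder name. Files that already carry the
    prefix are skipped, so running this twice is harmless.
    Returns the new file names in sorted order.
    """
    source_dir = Path(source_dir)
    prefix = prefix or source_dir.name
    renamed = []
    # sorted() builds the full list first, so renaming while looping is safe
    for frame in sorted(source_dir.iterdir()):
        if not frame.is_file() or frame.suffix.lower() not in IMAGE_EXTS:
            continue  # leave non-image files alone
        if frame.name.startswith(prefix + "_"):
            continue  # already renamed
        target = frame.with_name(f"{prefix}_{frame.name}")
        frame.rename(target)
        renamed.append(target.name)
    return renamed
```

For example, running it on a folder called interview_2019 turns 00001.png into interview_2019_00001.png, so after face extraction it stays clear which source a face came from.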

You can extract frames in few different ways:

a) you extract each video separately by renaming it to data_src (video should be in mp4 format, but DFL uses FFMPEG so it should potentially handle any format and codec) and using 2) Extract images from video data_src to extract frames from the video file, after which they are output into the "data_src" folder (it is created automatically). Available options:

- FPS - skip to use the video's default frame rate, or enter a numerical value for another frame rate (for example, entering 5 will render the video at 5 frames per second, meaning fewer frames will be extracted). Depending on length, I recommend 5-10 FPS for SRC frame extraction regardless of how you're extracting your frames (this also applies to methods b and c).

- JPG/PNG - choose the format of extracted frames, jpgs are smaller and have slightly lower quality, pngs are larger but extracted frames have better quality, there should be no loss of quality with PNG compared to original video.

b) you import all videos into a video editing software of your choice, making sure you don't edit videos of different resolutions together but instead process 720p, 1080p and 4K videos separately. At this point you can also cut down the videos to keep just the best shots with the best quality faces. Shots where faces are far away/small, blurry (out of focus, severe motion blur), very dark, lit with single-colored lighting, lit unnaturally, or have very bright parts and dark shadows at the same time, as well as shots where the majority of the face is obstructed, should be discarded unless it's a very unique expression that doesn't occur often, an angle that is rarely found (such as people looking directly up/down), or your target video actually has such stylized lighting. Sometimes you just have to use these lower quality faces if you can't find a given angle anywhere else. Next, render the videos directly into either jpg or png frames in your data_src folder (create it manually if you deleted it before), and either render the whole batch of videos at a given resolution or render each clip separately.

c) use MVE and its scene detection, which does the cuts for you, then use it to output just the scenes you selected into a folder at a specific frame rate and file format, and also rename them so that all your faces have a unique name that corresponds to the title of the original video. This is very helpful in later stages. You can read more about MVE in this guide:

https://mrdeepfakes.com/forums/thread-mve-machine-video-editor-guide
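To get a feel for what the FPS option from method a) does to the size of your set, here is a small sketch. It estimates how many frames a clip produces at a given rate and builds a standalone ffmpeg command that extracts frames at that rate. The function names are made up for illustration, and the command is an assumption about a reasonable manual equivalent, not necessarily what the DFL .bat runs internally.

```python
import math

def expected_frames(duration_seconds, fps):
    """Approximate number of frames extracted from a clip at the given rate."""
    return math.floor(duration_seconds * fps)

def ffmpeg_extract_args(video, out_dir, fps=None, ext="png"):
    """Build an ffmpeg argument list that dumps frames as numbered images.

    fps=None keeps the video's own frame rate; a number resamples the
    video with ffmpeg's 'fps' filter before writing frames.
    """
    args = ["ffmpeg", "-i", str(video)]
    if fps:
        args += ["-vf", f"fps={fps}"]
    args.append(f"{out_dir}/%05d.{ext}")
    return args
```

A 10 minute interview at 30 FPS gives expected_frames(600, 30) = 18000 frames, while the same clip at 5 FPS gives 3000, which is already in the range of a complete SRC set. The argument list can be passed to subprocess.run if ffmpeg is installed and the output folder exists.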

3. Video cutting (optional): 3) cut video (drop video on me) - allows you to quickly cut any video to the desired length by dropping it onto that .bat file. Useful if you don't have video editing software and want to quickly cut the video, however with the existence of MVE (which is free) its usefulness is questionable, as it can only do a simple cut of a part of the video from point A to B. Cut the videos manually or use MVE.

STEP 4 - FRAMES EXTRACTION FROM TARGET VIDEO (DATA_DST.MP4):

You also need to extract frames from your target video. After you've edited it the way you want it to be, render it as data_dst.mp4 and extract frames using 3) extract images from video data_dst FULL FPS. Frames will be placed into the "data_dst" folder. The available options are JPG or PNG output format: select JPG if you want smaller size, PNG for best quality. There is no frame rate option because you want to extract the video at its original frame rate.

STEP 5 - DATA_SRC FACE/DATASET EXTRACTION:

The second stage of preparing the SRC dataset is to extract faces from the extracted frames located inside the "data_src" folder. Assuming you renamed all sets of frames inside their folders, move them back into the main "data_src" folder and run 4) data_src faceset extract - automated extractor using the S3FD algorithm. It will handle the majority of faces in your set but is not perfect: it will fail to detect some faces, produce many false positives and detect other people, which you will have to delete more or less manually.

There is also 4) data_src faceset extract MANUAL - manual extractor, see 5.1 for usage. You can use it to manually align some faces, especially if you have some pictures or small sources from movies that feature rare angles that tend to be hard for the automated extractor (such as looking directly up or down).

Available options for S3FD and MANUAL extractor are:

- Which GPU (or CPU) to use for extraction - use GPU, it's almost always faster.

- Face Type:

a) Full Face/FF - for FF models or lower face types (Half Face/HF and Mid-Half Face/MF, rarely used nowadays).

b) Whole Face/WF - for WF models or lower, recommended as a universal/futureproof solution for working with both FF and WF models.

c) Head - for HEAD models, can work with other models like WF or FF but requires very high resolution of extraction for faces to have the same level of detail (sharpness) as lower coverage face types, uses 3D landmarks instead of 2D ones as in the FF and WF but is still compatible with models using those face types.

Remember that you can always change the face type (and its resolution) to a lower one later using 4.2) data_src/dst util faceset resize or MVE (MVE can also turn a lower res/face type set into a higher one but requires you to keep the original frames and photos). This is why I recommend using WF if you primarily do face swaps with FF and WF models, and HEAD for short-haired sets used primarily for HEAD swaps but which you may also want to use for FF/WF face swaps at some point.

- Max number of faces from image - how many faces extractor should extract from a frame, 0 is recommended value as it extracts as many as it can find. Selecting 1 or 2 will only extract 1 or 2 faces from each frame.

- Resolution of the dataset: This value will largely depend on the resolution of your source frames. Below are my personal recommendations depending on the resolution of the source clip. You can of course use different values, you can even measure how big the biggest face in a given source is and use that as the value (remember to use values in increments of 16).

Resolution can be also changed later by using 4.2) data_src/dst util faceset resize or MVE, you can even use MVE to extract faces with estimated face size option which will use landmark data from your extracted faces, original frames and re-extract your entire set again at the actual size each face is on the frames. You can read more about changing facetypes, dataset resolutions and more in those two MVE guides threads:

https://mrdeepfakes.com/forums/thread-how-to-fix-face-landmarks-with-mve

https://mrdeepfakes.com/forums/thread-mve-machine-video-editor-guide

I recommend following values for WF:

720p or lower resolution sources - 512-768
1080p sources - 768-1024
4K sources - 1024-2048

For HEAD extraction, add extra 256-512 just to be sure you aren't missing any details of the extracted faces or measure actual size of the head on a frame where it's closest to the camera. If in doubt, use MVE to extract faces with estimated face size option enabled.
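The recommendations above can be put into a small helper. This is only a sketch of the guide's own numbers: the function names are made up, rounding a measured face size up (rather than to the nearest value) is my assumption so that no detail is lost, and "add extra 256-512" for HEAD is interpreted here as adding 256 to the low end and 512 to the high end of the WF range.

```python
import math

# WF extraction resolution ranges recommended in the guide, per source type
WF_RANGES = {"720p": (512, 768), "1080p": (768, 1024), "4k": (1024, 2048)}

def round_up_to_16(pixels):
    """Round a measured face size up to the next multiple of 16."""
    return max(16, math.ceil(pixels / 16) * 16)

def suggested_range(source, head=False):
    """Return the (low, high) extraction resolution for a source type.

    For HEAD extraction the WF range is widened by 256 at the low end
    and 512 at the high end (one reading of the guide's advice).
    """
    low, high = WF_RANGES[source]
    if head:
        low, high = low + 256, high + 512
    return low, high
```

For example, if the biggest face in a source measures about 500 pixels across, round_up_to_16(500) gives 512 as the extraction resolution.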

- Jpeg quality - use 100 for best quality. DFL can only extract faces in JPG format. There is no reason to go lower than 100: the size difference won't be big but quality will decrease dramatically.

- Choosing whether to generate "aligned_debug" images or not - can be generated afterwards, they're used to check if landmarks are correct however this can be done with MVE too and you can actually manually fix landmarks with MVE so in most cases this is not very useful for SRC datasets.

STEP 6 - DATA_SRC SORTING AND CLEANUP:
After SRC dataset extraction is finished next step is to clean the SRC dataset of false positives and incorrectly aligned faces. To help in that you can sort your faces, if it's a small set and has only a couple videos using the provided sorting methods should be more than enough, if you're working with a larger set, use MVE for sorting (check the guide for more info).

To perform sorting use 4.2) data_src sort - it allows you to sort your dataset using various sorting algorithms. These are the available sort types:

[0] blur - sorts by image blurriness (determined by contrast), fairly slow sorting method and unfortunately not perfect at detecting and correctly sorting blurry faces.
[1] motion blur - sorts images by checking for motion blur, good for getting rid of faces with lots of motion blur, faster than [0] blur and might be used as an alternative but similarly to [0] not perfect.
[2] face yaw direction - sorts by yaw (from faces looking to left to looking right).
[3] face pitch direction - sorts by pitch (from faces looking up to looking down).
[4] face rect size in source image - sorts by size of the face on the original frame (from biggest to smallest faces). Much faster than blur.
[5] histogram similarity - sort by histogram similarity, dissimilar faces at the end, useful for removing drastically different looking faces, also groups them together.
[6] histogram dissimilarity - as above but dissimilar faces are on the beginning.
[7] brightness - sorts by overall image/face brightness.
[8] hue - sorts by hue.
[9] amount of black pixels - sorts by amount of completely black pixels (such as when face is cut off from frame and only partially visible).
[10] original filename - sorts by original filename (of the frames from which faces were extracted), without _0/_1 suffixes (assuming there is only 1 face per frame).
[11] one face in image - sorts faces in order of how many faces were in the original frame.
[12] absolute pixel difference - sorts by absolute pixel difference between images, useful to remove drastically different looking faces.
[13] best faces - sorts by several factors including blur and removes duplicates/similar faces, has a target of how many faces we want to have after sorting, discarded faces are moved to the folder "aligned_trash".
[14] best faces faster - similar to best faces but uses face rect size in source image instead of blur to determine quality of faces, much faster than best faces.
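To illustrate what a blur sort is doing, here is a minimal pure Python sketch that scores sharpness with the variance of a Laplacian filter, a commonly used sharpness measure. This is a swapped-in technique for illustration only; the guide does not say how DFL computes its blur score, and the function names are made up. Images are assumed to be already loaded as 2D lists of grayscale values.

```python
def laplacian_variance(gray):
    """Sharpness score: variance of a 4-neighbour Laplacian response.

    gray is a 2D list of grayscale values (equal-length rows, at least 3x3).
    Sharp images have strong edges and a high score; flat or blurry
    images score close to 0.
    """
    height, width = len(gray), len(gray[0])
    responses = []
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            lap = (gray[y - 1][x] + gray[y + 1][x]
                   + gray[y][x - 1] + gray[y][x + 1]
                   - 4 * gray[y][x])
            responses.append(lap)
    mean = sum(responses) / len(responses)
    return sum((r - mean) ** 2 for r in responses) / len(responses)

def sort_by_sharpness(images):
    """images: dict of name -> 2D grayscale list. Returns names, sharpest first."""
    return sorted(images, key=lambda name: laplacian_variance(images[name]),
                  reverse=True)
```

As with the built-in blur sort, a score like this is not perfect: a sharp but low-contrast face can score lower than a noisy one, so the sorted order is a starting point for manual review rather than an automatic filter.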

I recommend to start with simple histogram sorting [5], this will group faces together that look similar, this includes all the bad faces we want to delete so it makes the manual selection process much easier.

When the initial sorting has finished, open up your aligned folder, you can either browse it with windows explorer or use external app that comes with DFL which can load images much faster, to open it up use 4.1) data_src view aligned result.

After that you can do additional sorting by yaw and pitch to remove any faces that may look correct but that actually have bad landmarks.
Next you can sort by hue and brightness to remove any faces that are heavily tinted or very dark, assuming you didn't already do that after histogram sorting.
Then you can use sort by blur, motion blur and face rect size to remove any blurry faces, faces with lots of motion blur and small faces. After that you should have relatively clean dataset.
At the end you can sort them with any other method you want; the order and filenames of SRC faces don't matter at all for anything. However, I always recommend restoring the original filenames, not with sorting option 10 but with 4.2.other) data_src util recover original filename.

However, if you have a large dataset consisting of tens of interviews, thousands of high res pictures and many movies and TV show episodes, you should consider a different approach to cleaning and sorting your sets.


Most people who are serious about making deepfakes and work on large, complex source sets rarely use just DFL for sorting and instead also use external free (for now) software called Machine Video Editor, or simply MVE. MVE comes with its own sorting methods and can be used in pretty much all steps of making a deepfake video. That includes the automated scene cutting and frame export for obtaining frames from your source content (as mentioned in STEP 3 - source content collection), dataset enhancing, labeling/masking, editing landmarks and much more.

The thing to focus on here is the Nvidia Similarity Sort option, which works similarly to histogram sort but is a machine learning approach that groups faces together based on identity. That way you get 99% of the faces of the person you want on the list in order, and it's much faster to remove other faces (unwanted subjects you can either delete or keep for use with other facesets you may wish to make with those subjects in the future). It will often also group incorrect faces, faces with glasses, black and white faces or those with heavy tint together much more precisely. You get a face group preview where you can select or delete specific face groups and even check which faces are in a group before you delete them, but you can also browse as you would in Windows Explorer or in XnViewMP.

For more info about MVE check available guides here: https://mrdeepfakes.com/forums/forum-guides-and-tutorials

MVE GITHUB: https://github.com/MachineEditor/MachineVideoEditor

MVE also has a discord server (SFW, no adult deepfake talk allowed there), you can find a link to it on the github page. There are additional guides on that server, watch them first before asking questions.

Regardless of whether you use MVE or DFL to sort the set there are few final steps you can perform at the end - DUPLICATE FACES REMOVAL:

First thing you can do on all of the remaining faces is to use software like visipics or dupeguru (or any other software that can detect similar faces and duplicates) to detect very similar and duplicated faces in your whole set of faces, the two I mentioned have adjustable "power" setting so you can only remove basically exactly the same faces or increase the setting and remove more faces that are very similar but be careful to not remove too many similar faces and don't actually delete them, for example in visipics you have the option to move them and that's what I recommend, create few folders for different levels of similarity (the slider works in steps so you can delete everything detected with strength 0-1 and move faces detected at strength levels 2-3 to different folders). This will slightly reduce face count which will speed up training and just make the sets less full of unnecessarily similar faces and duplicates.
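To show the idea behind an adjustable similarity setting, here is a minimal pure Python sketch based on an average hash and Hamming distance. This is a generic technique used for illustration; it is not how visipics or dupeguru work internally, and the function names are made up. Images are assumed to be already loaded and downscaled to small grayscale thumbnails of identical size (for example 8x8), given as 2D lists.

```python
def average_hash(gray):
    """Bit tuple with 1 where a pixel is brighter than the image mean."""
    pixels = [p for row in gray for p in row]
    mean = sum(pixels) / len(pixels)
    return tuple(1 if p > mean else 0 for p in pixels)

def hamming(a, b):
    """Number of positions where two hashes differ."""
    return sum(x != y for x, y in zip(a, b))

def find_near_duplicates(images, max_distance):
    """Split images into kept names and (duplicate, kept_match, distance) tuples.

    images: dict of name -> thumbnail. A face counts as a duplicate when
    its hash is within max_distance bits of an already kept face.
    Nothing is deleted; the caller decides where to move the duplicates.
    """
    kept, duplicates = [], []
    for name in sorted(images):
        h = average_hash(images[name])
        match = None
        for kept_name, kept_hash in kept:
            distance = hamming(h, kept_hash)
            if distance <= max_distance:
                match = (name, kept_name, distance)
                break
        if match is not None:
            duplicates.append(match)
        else:
            kept.append((name, h))
    return [n for n, _ in kept], duplicates
```

Running it with max_distance 0 first and then with slightly higher values mirrors the advice above: exact matches go to one folder, looser matches to another, and nothing is removed until you have looked at it.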

Next (assuming you renamed frames before extraction) it's good to move faces into different folders based on where they came from:

Create a folder structure that suits you, I recommend following structure:
- main folders for movies, tv shows, interviews and photos (feel free to add additional categories based on type of footage)
- inside each of those, more folders for each individual source (for photos you can categorize based on photo type or by year or have it all in one folder)
- inside each individual folder for given source a folder for sharpest, best faces and what is left should be placed loosely in the base folder
- folder for all of the blurry faces you plan on enhancing/upscaling (more about it in ADVANCED below)
- folder for all of the upscaled faces
- folder for all of the duplicates
- and lastly the main folder which you can simply name aligned or main dataset where you combine best faces from all sources and upscaled faces.
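The structure above can be created in one go. A minimal sketch follows; the function name and the folder names ("best", "blurry", "upscaled", "duplicates", "aligned") are just example choices, name them however suits you.

```python
from pathlib import Path

def create_dataset_tree(root, sources):
    """Create a per-source folder layout under root.

    sources maps a category (e.g. "interviews") to a list of source names.
    Each source gets a "best" subfolder for its sharpest faces; remaining
    faces go loosely into the source folder itself.
    Returns the created folders relative to root.
    """
    root = Path(root)
    folders = []
    for category, names in sources.items():
        for name in names:
            folders.append(root / category / name / "best")
    # shared folders for enhancement work, duplicates and the final set
    for extra in ("blurry", "upscaled", "duplicates", "aligned"):
        folders.append(root / extra)
    for folder in folders:
        folder.mkdir(parents=True, exist_ok=True)
    return [folder.relative_to(root).as_posix() for folder in folders]
```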

Remember that all data is in the images themselves, you are free to move them to different folders, make copies/backups, archive them on different drives or computers, and in general you are free to move them outside of DFL. RAID is not a backup - keep 2-3 copies, cold storage, and additional copies on different mediums in different locations. Back up new data at least once every week or two depending on how much data you end up creating, at worst to just a few portable hard drives (SSD based are better obviously).
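Since the metadata lives inside the image files, a backup is only useful if the copies are byte-identical. Below is a minimal sketch of a copy-and-verify step using checksums; the function names are made up and this is just one way to do it, any backup tool with verification works as well.

```python
import hashlib
import shutil
from pathlib import Path

def file_digest(path):
    """SHA-256 of a file, read in chunks so large files don't fill RAM."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()

def backup_and_verify(src, dst):
    """Copy folder src to dst (dst must not exist yet) and verify every file.

    Returns the relative paths of files that are missing or differ in the
    copy; an empty list means the backup matches the original.
    """
    src, dst = Path(src), Path(dst)
    shutil.copytree(src, dst)
    mismatched = []
    for original in src.rglob("*"):
        if original.is_file():
            copy = dst / original.relative_to(src)
            if not copy.is_file() or file_digest(copy) != file_digest(original):
                mismatched.append(original.relative_to(src).as_posix())
    return mismatched
```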

After you've done that you should have a bunch of folders in your "data_src" folder and your "aligned" folder should now be empty since you have moved all faces to different ones, delete it.

ADVANCED - SRC dataset enhancement.

You may want or need to improve quality and sharpness/level of detail of some of your faces after extraction. Some people upscale entire datasets while some only move blurry faces they want to upscale to separate folder and upscale part of the dataset that way, that can be done before making the final set (upscale all blurry faces regardless of whether you'd using them during training or not) or after making final set (upscaling only those faces you actually need for training). You should however only upscale what you really need, for example if you already have few high quallity interviews and want to upscale another one that has similar lighitng, expressions and angles then skip it, it's better to use content that's already sharp and good quality than upscaling everything for the sake of it. Upscale rare faces, rare expressions, angles that you don't have any sharp faces for.

First start by collecting all blurry faces you want to upscale and put them into a folder called "blurry" (example, name it however you want), next depending on upscaling method you may or may not have to save your metadata, some upscaling methods will retain this information but most won't, hence why you need to do it. I also recommend making a backup of your blurry folder in case some upscaling method you use replaces original images in the folder (most output to a different folder). Rename your "blurry" folder to "aligned" and run:

4.2) data_src util faceset metadata save - saves embedded metadata of your SRC dataset in the aligned folder as meta.dat file, this is required if you're going to be upscaling faces or doing any kind of editing on them like for example color correction (rotation or horizontal/vertical flipping is not allowed).

After you're done enhancing/upscaling/editing your faces you need to restore the metadata (in some cases), to do so rename your "upscaled" folder to "aligned" (or if you used Colab or did not upscale faces locally in general then simply copy them over to the new "aligned" folder), copy your meta.dat file from original "blurry" folder to the "aligned" folder and run: 4.2) data_src util faceset metadata restore - which will restore the metadata and now those faces are ready to be used.

If you forgot to save the metadata as long as you have the original blurry folder you can do so later, however if you've lost the original folder and now only have the upscaled results with no metadata only thing you can do is extract faces from those faces.
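Before restoring metadata it can help to confirm that the upscaled folder still contains the same faces as the original one. Below is a rough sketch, assuming that the restore step matches faces by file name (the guide does not state this explicitly, treat it as an assumption) and that your upscaler may have changed the file extension. The function names are hypothetical:

```python
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def image_stems(folder: Path) -> set[str]:
    """Collect file names without extension for all images in a folder."""
    return {p.stem for p in folder.iterdir() if p.suffix.lower() in IMAGE_SUFFIXES}


def compare_facesets(original: Path, edited: Path) -> tuple[list[str], list[str]]:
    """Return (missing_from_edited, unexpected_in_edited) as sorted lists of names."""
    before, after = image_stems(original), image_stems(edited)
    return sorted(before - after), sorted(after - before)
```

If both lists come back empty the two folders line up one to one. Anything listed as missing was dropped or renamed by the upscaler and is worth checking by hand before you run the restore.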

STEP 7 - DATA_DST FACE/DATASET EXTRACTION:

Here steps are pretty much the same as with source dataset, with few exceptions. Start by extracting faces from your DST frames using: 5) data_dst faceset extract - an automated face extractor utilizing S3FD face detection algorithm.

In addition to it you'll also notice other extraction methods, don't use them now but you need to make yourself familiar with them:

5) data_dst faceset extract MANUAL RE-EXTRACT DELETED ALIGNED_DEBUG - This one is also important, it is used to manually re-extract missed faces after deleting their corresponding debug image from a folder "aligned_debug" that gets created along "aligned" folder after extraction, it is what makes it possible to get all faces to be swapped, more about its use in step 5.1.
5) data_dst faceset extract MANUAL - manual extractor, see 5.1 for usage.
5) data_dst faceset extract + manual fix - S3FD + manual pass for frames where model didn't detect any faces, you can use this instead of 5) data_dst faceset extract - after initial extraction finishes a window will open (same as with manual extraction or re-extraction) where you'll be able to extract faces from frames where extractor wasn't able to detect any faces, not even false positives, but this means extraction won't finish until you re-extract all faces so this is not recommended.

Simply use the first method for now.

Available options for all extractor modes are the same as for SRC except for lack of choice of how many faces we want to extract - it always tries to extract all, there is no choice for whether we want aligned_debug folder or not either, it is generated always since it's required for manual re-extraction.

STEP 8 - DATA_DST SORTING, CLEANUP AND FACE RE-EXTRACTION:

After we aligned data_dst faces we have to clean that set.

Run 5.2) data_dst sort - works the same as src sort, use [5] histogram similarity sorting option, next run 5.1) data_dst view aligned results - which will allow you to view the contents of "aligned" folder using external app which offers quicker thumbnail generation than default windows explorer, here you can simply browse all faces and delete all bad ones (very small or large compared to others next to it as a result of bad rotation/scale caused by incorrect landmarks as well as false positives and other people's faces), after you're done run 5.2) data_dst util recover original filename - which works the same as one for source, it will restore original filenames and order of all faces.

Next we have to delete debug frames so that we can use the manual re-extractor to extract faces from just the frames where the extractor couldn't properly extract faces, to do so run 5.1) data_dst view aligned_debug results - which will allow you to quickly browse contents of "aligned_debug", here you check all debug frames to find those where the landmarks over our target person's face are placed incorrectly (not lining up with edges of the face, eyes, nose, mouth, eyebrows) or missing, those frames have to be deleted and this will tell the manual re-extractor which frames to show to you so that you can manually re-extract them. You can select all debug frames for deletion manually, however this means going through pretty much all of them by hand and it's easy to miss some frames that way. A better way to go about this is to take advantage of the fact that your aligned folder (after you've cleaned it up) should now contain only good faces (you can make a copy of the debug folder, remove all good frames from it using faces from aligned folder, then go through what's left and use that to remove those bad frames from original debug folder). Once you're done deleting all the debug frames with missing/bad faces run 5) data_dst faceset extract MANUAL RE-EXTRACT DELETED ALIGNED_DEBUG to manually re-extract faces from corresponding frames.
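The "use your cleaned aligned folder to find bad debug frames" trick can be sketched in a few lines. This is only an illustration and it assumes a naming scheme where a face called 00012_0.jpg was extracted from frame 00012 and its debug image is named 00012.jpg, check your own folders before relying on it. The function names are hypothetical and the script only lists candidates, it does not delete anything:

```python
import re
from pathlib import Path

# Assumed face file name pattern: <frame name>_<face index>
FACE_NAME = re.compile(r"^(?P<frame>.+)_(?P<index>\d+)$")


def frames_with_faces(aligned: Path) -> set[str]:
    """Return frame names that still have at least one face in the aligned folder."""
    frames = set()
    for face in aligned.iterdir():
        match = FACE_NAME.match(face.stem)
        if match:
            frames.add(match.group("frame"))
    return frames


def debug_frames_to_review(aligned: Path, aligned_debug: Path) -> list[str]:
    """List debug images whose frame has no remaining face in the aligned folder."""
    covered = frames_with_faces(aligned)
    return sorted(
        p.name for p in aligned_debug.iterdir() if p.is_file() and p.stem not in covered
    )
```

The returned names are the frames worth looking at first. Frames that still have a face can also have bad landmarks, so a quick look through the rest of the debug folder is still a good idea.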

Manual extractor usage:

Upon starting the manual extractor a window will open up where you can manually locate faces you want to extract and command line window displaying your progress:

- use your mouse to locate face
- use mouse wheel to change size of the search area (rect size, you saw this option in sorting, you can sort faces based on how big their rect size was during extraction)
- make sure all or at least most landmarks (in some cases depending on the angle, lighting or present obstructions it might not be possible to precisely align all landmarks so just try to find a spot that covers all the visible bits the most and isn't too misaligned) land on important spots like eyes, mouth, nose, eyebrows and follow the face shape correctly, an up arrow shows you where is the "up" or "top" of the face
- use key A to change the precision mode, now landmarks won't "stick" so much to detected faces and you may be able to position landmarks more correctly, it will also run faster
- use < and > keys (or , and .) to move back and forwards, to confirm a detection either left mouse click and move to the next one or hit enter which both confirms selection and moves to the next face
- right mouse button for aligning undetectable forward facing or non human faces (requires applying xseg for correct masking)
- q to skip remaining faces, save the ones you did and quit extractor (it will also close down and save when you reach the last face and confirm it)

Now you should have all faces extracted but in some cases you will have to run it a few times (the cases I mentioned above, reflections, split scenes, transitions). In that case rename your "aligned" folder to something else, then repeat the steps with renaming of aligned faces, copying them to a copy of "aligned_debug", replacing, deleting selected, removing remaining aside from those you need to extract from, copying that to original "aligned_debug" folder, replacing, deleting highlighted, running manual re-extractor again and then combining both aligned folders, making sure to not accidentally replace some faces.

After you're done you have the same choice of additional .bats to use with your almost finished dst dataset:

5.2) data_dst util faceset pack and 5.2) data_dst util faceset unpack - same as with source, lets you quickly pack entire dataset into one file.

5.2) data_dst util faceset resize - works the same as one for SRC dataset.

But before you can start training you also have to mask your datasets, both of them.

STEP 9 - XSEG MODEL TRAINING, DATASET LABELING AND MASKING:

What is XSeg for? Some face types require an application of different mask than the default one that you get with the dataset after extraction, those default masks are derived from the landmarks and cover area similar to that of full face face type, hence why for full face or lower coverage face type XSeg is not required but for whole face and head it is. XSeg masks are also required to use Face and Background Style Power (FSP, BGSP) during training of SAEHD/AMP models regardless of the face type.

XSeg allows you to define how you want your faces to be masked, which parts of the face will be trained on and which won't.

It also is required to exclude obstructions over faces from being trained on and also so that after you merge your video a hand for example that is in front of the face is properly excluded, meaning the swapped face is masked in such way to make the hand visible and not cover it.

XSeg can be used to exclude just about every obstruction:
you have full control over what the model will think is a face and what is not.

Please make yourself familiar with some terms first, it's important you understand a difference between an XSeg model, dataset, label and mask:

XSeg model - user trainable model used to apply masks to SRC and DST datasets as well as to mask faces during the merging process.
XSeg label - a polygon that user draws on the face to define the face area and what is used by XSeg model for training.
XSeg mask - mask generated and applied to either SRC or DST dataset by a trained XSeg model.
XSeg dataset - a collection of labeled faces (just one specific type or both SRC and DST dataset, labeled in similar manner), these are often shared on the forum by users and are a great way to start making your own set since you can download one and either pick specific faces you need or add your own labeled faces to it that are labeled in similar manner.

Now that you know what each of those 4 things means it's important you understand the main difference between labeling and masking SRC faces and DST faces.

Masks define which area on the face sample is the face itself and what is a background or obstruction, for SRC it means that whatever you include will be trained by the model with higher priority, whereas everything else will be trained with lower priority (or precision). For DST it is the same but also you need to exclude obstructions so that model doesn't treat them as part of the face and also so that later when merging those obstructions are visible and don't get covered by the final predicted face (not to be mistaken with predicted SRC and predicted DST faces).

To use XSeg you have following .bats available for use:

5.XSeg) data_dst mask - edit - XSeg label/polygon editor, this defines how you want the XSeg model to train masks for DST faces.
5.XSeg) data_dst mask - fetch - makes a copy of labeled DST faces to folder "aligned_xseg" inside "data_dst".
5.XSeg) data_dst mask - remove - removes labels from your DST faces. This doesn't remove trained MASKS you apply to the set after training, it removes LABELS you manually created, I suggest renaming this option so it's on the bottom of the list or removing it to avoid accidental removal of labels.

5.XSeg) data_src mask - edit - XSeg label/polygon editor, this defines how you want the XSeg model to train masks for SRC faces.
5.XSeg) data_src mask - fetch - makes a copy of labeled SRC faces to folder "aligned_xseg" inside "data_src".
5.XSeg) data_src mask - remove - removes labels from your SRC faces. This doesn't remove trained MASKS you apply to the set after training, it removes LABELS you manually created, I suggest renaming this option so it's on the bottom of the list or removing it to avoid accidental removal of labels.

XSeg) train.bat - starts training of the XSeg model.

5.XSeg.optional) trained mask data_dst - apply - generates and applies XSeg masks to your DST faces.
5.XSeg.optional) trained mask data_dst - remove - removes XSeg masks and restores default FF like landmark derived DST masks.

5.XSeg.optional) trained mask data_src - apply - generates and applies XSeg masks to your SRC faces.
5.XSeg.optional) trained mask data_src - remove - removes XSeg masks and restores default FF like landmark derived SRC masks.

If you don't have time to label faces and train the model, you can use generic XSeg model that is included with DFL to quickly apply basic WF masks (may not exclude all obstructions) using following:

5.XSeg Generic) data_dst whole_face mask - apply - applies WF masks to your DST dataset.
5.XSeg Generic) data_src whole_face mask - apply - applies WF masks to your SRC dataset.

XSeg Workflow:

Step 1. Label your datasets.


Start by labeling both SRC and DST faces using 5.XSeg) data_src mask - edit and 5.XSeg) data_dst mask - edit

Each tool has a written description that's displayed when you hover over it with your mouse (en/ru/zh languages are supported).

Label 50 to 200 different faces for both SRC and DST, you don't need to label all faces but only those where the face looks significantly different, for example:

- when facial expression changes (open mouth - closed mouth, big smile - frown)
- when direction/angle of the face changes
- or when lighting conditions/direction changes (usually together with face angle but in some cases the lighting might change while face still looks in the same direction)

The more varied faces you label, the better quality masks XSeg model will generate for you. In general the smaller the dataset is the fewer faces will have to be labeled and the same goes for the variety of angles, if you have many different angles and also expressions it will require you to label more faces.

While labeling faces you will also probably want to exclude obstructions so that they are visible in the final video, to do so you can either:

- not include obstructions within the main label that defines face area you want to be swapped by drawing around it.
- or use exclude poly mode to draw additional label around the obstruction or part you want to be excluded (not trained on, visible after merging).

What to exclude:

Anything you don't want to be part of the face during training and you want to be visible after merging (not covered by the swapped face).

When marking obstructions you need to make sure you label them on several faces according to the same rules as when marking faces with no obstructions, mark the obstruction (even if it doesn't change appearance/shape/position) when the face/head:

- changes angle
- facial expression changes
- lighting conditions change

If the obstruction is additionally changing shape and/or moving across the face you need to mark it few times, not all obstructions on every face need to be labeled though but still the more variety of different obstructions occur in various conditions - the more faces you will have to label.

Label all faces in similar way, for example:

- the same approximated jaw line if the edge is not clearly visible, look at how faces are shaded to figure out how to correctly draw the line, same applies for face that are looking up, the part underneath the chin
- the same hair line (which means always excluding the hair in the same way, if you're doing full face mask and don't go over to the hairline then make sure the line you draw above eyebrows is always mostly at the same height above the eyebrows)

Once you finish labeling/marking your faces scroll to the end of the list and hit Esc to save them and close down the editor, then you can move on to training your XSEG model.

TIP:
You can use MVE to label your faces with its more advanced XSeg editor that even comes with its own trained segmentation (masking) model that can selectively include/exclude many parts of the face and even turn applied masks (such as from a shared XSeg model you downloaded or generic WF XSeg model that you used to apply masks to your dataset) back into labels, improve them and then save into your faces.

Step 2. Train your XSeg model.

When starting training for the first time you will see an option to select face type of the XSeg model, use the same face type as your dataset.
You will also be able to choose device to train on as well as batch size which will typically be much higher as training the XSeg model is not as demanding as training of the face swapping model (you can also start off at lower value and raise it later).
You can switch preview modes using space (there are 3 modes: DST training, SRC training and SRC+DST (distorted)).
To update preview progress press P.
Esc to save and stop training.

During training check previews often, if some faces have bad masks after about 50k iterations (bad shape, holes, blurry), save and stop training, apply masks to your dataset, run editor, find faces with bad masks by enabling XSeg mask overlay in the editor, label them and hit esc to save and exit and then resume XSeg model training, when starting up an already trained model you will get a prompt if you want to restart training, select no as selecting yes will restart the model training from 0 instead of continuing. However in case your masks are not improving despite having marked many more faces and being well above 100k-150k iterations it might be necessary to label even more faces. Keep training until you get sharp edges on most of your faces and all obstructions are properly excluded.

Step 3. Apply XSeg masks to your datasets.

After you're done training or after you've already applied XSeg once and then fixed faces that had bad masks it's time for final application of XSeg masks to your datasets.

Extra tips:

1. Don't bother making a 1000 point label, it will take too much time to label all the faces and won't improve the result compared to using just 30-40 points to describe the face shape, but also don't try to mark it with 10 points or the mask will not be smooth, the exception here would be marking hair for HEAD face type training where obviously some detail is needed to correctly resolve individual hair strands.
2. Do not mark shadows unless they're pitch black.
3. Don't mark out tongues or insides of the mouth if it's barely open.
4. If obstruction or face is blurry mark as much as needed to cover everything that should or shouldn't be visible, do not make offsets too big.
5. Keep in mind that when you use blur the edge blurs both in and out, if you mark out a finger right on the edge it won't look bad on low blur but on higher one it will start to disappear and be replaced with the blurry version of what model learned, same goes for the mouth cavity, on low blur it will only show result face teeth but if you apply high blur then DST teeth will start to show and it will look bad (double teeth).

This means:

- when excluding obstructions like fingers - mark it on the edge or move the label few pixels away (but not too much). Both SRC and DST

- when excluding mouth cavity - remember to keep the label away from teeth unless it's the teeth in the back that are blurry and dark, those can be excluded. DST, SRC is optional, if you exclude the back teeth on SRC faces XSeg model will train to not include them so they won't be trained as precisely as the included front teeth, but as teeth in the back are usually quite blurry and dark or not visible at all it shouldn't affect your results much, especially if you decide to exclude them on DST too, in that case you will only see back teeth of DST anyway. Similar rules apply when excluding tongues, mark them on the edge, keep an offset from teeth if the tongue is inside the mouth or touching upper or bottom teeth. Both SRC and DST, if you want tongue of SRC to be trained don't exclude it on SRC faces but if you exclude it on DST then you won't see SRC tongue at all, I suggest excluding tongue only when mouth is wide open and only on DST and never on SRC faces.

Example of face with bad applied mask:

mMSrUGJh.jpg


Fixing the issue by marking the face correctly (you train XSeg model after that, just labeling it won't make the model better):

e354c5Vh.jpg


How to use shared marked faces to train your own XSeg model:

Download, extract and place faces into "data_src/aligned" or "data_dst/aligned". Make sure to rename them to not overwrite your own faces (I suggest XSEGSRC and XSEGDST for easy removal afterwards).
You can mix shared faces with your own labeled to give the model as much data to learn masks as possible, don't mix face types, make sure all faces roughly follow the same logic of masking.
Then just start training your XSeg model (or shared one).
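The renaming step above can be done by hand, but here is a small sketch of one way to script it. It copies shared labeled faces into your aligned folder with a prefix so they never overwrite your own files and are easy to remove later. The function name and the underscore after the prefix are my own choices, not something DFL requires:

```python
import shutil
from pathlib import Path


def import_shared_faces(shared: Path, aligned: Path, prefix: str) -> list[str]:
    """Copy shared faces into aligned as <prefix>_<original name>, never overwriting."""
    aligned.mkdir(parents=True, exist_ok=True)
    copied = []
    for face in sorted(p for p in shared.iterdir() if p.is_file()):
        target = aligned / f"{prefix}_{face.name}"
        if target.exists():
            raise FileExistsError(f"refusing to overwrite {target}")
        shutil.copy2(face, target)
        copied.append(target.name)
    return copied
```

For example import_shared_faces(Path("shared_src_faces"), Path("data_src/aligned"), "XSEGSRC") would follow the naming suggested above ("shared_src_faces" is just a placeholder folder name). Afterwards every file starting with XSEGSRC_ can be selected and deleted in one go.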

How to use shared XSeg model and apply it to your dataset:

Simply place it into the "model" folder and use apply .bat files to apply masks to SRC or DST.

After you apply masks open up XSeg editor and check how masks look by enabling XSeg mask overlay view, if some faces don't have good looking masks, mark them, exit the editor and start the training of the XSeg model again to fix them. You can also mix in some of the shared faces as described above (how to use shared marked faces). You can reuse XSeg models (like SAEHD models).

User shared SAEHD models can be found in the model sharing forum section:

STEP 10 - TRAINING SAEHD/AMP:

If you don't want to actually learn what all the options do and only care about a simple workflow that should work in most cases scroll down to section 6.1 - Common Training Workflows.

WARNING:
there is no one right way to train a model, learn what all the options do, backtrack the guide to earlier steps if you encounter issues during training (masking issues, blurry/distorted faces with artifacts due to bad quality SRC set or lack of angles/expressions, bad color matching due to low variety of lighting conditions in your SRC set, bad DST alignments, etc).

There are currently 3 models to choose from for training:

SAEHD (6GB+):
High Definition Styled Auto Encoder - for high end GPUs with at least 6GB of VRAM. Adjustable. Recommended for most users.

AMP (6GB+):
New model type, uses different architecture, morphs shapes (attempts to retain SRC shape), with adjustable morphing factor (training and merging) - for high end GPUs with at least 6GB of VRAM. Adjustable. AMP model is still in development, I recommend you learn making deepfakes with SAEHD first before using AMP. For AMP workflow scroll down to section 6.2.

Quick96 (2-4GB):
Simple mode dedicated for low end GPUs with 2-4GB of VRAM. Fixed parameters: 96x96 Pixels resolution, Full Face, Batch size 4, DF-UD architecture. Primarily used for quick testing.

Model settings spreadsheet where you can check settings and performance of models running on various hardware: https://mrdeepfakes.com/forums/threads/sharing-dfl-2-0-model-settings-and-performance.4056/

To start the training process run one of these:

6) train SAEHD
6) train Quick96
6) train AMP SRC-SRC
6) train AMP

You may have noticed that there are 2 separate training executables for AMP, ignore those for now and focus on learning SAEHD workflow first.

Since Quick96 is not adjustable you will see the command window pop up and ask only 1 question - CPU or GPU (if you have more then it will let you choose either one of them or train with both).
SAEHD however will present you with more options to adjust as will AMP since both models are fully adjustable.
In both cases first a command line window will appear where you input your model settings.
On first start you will have access to all settings that are explained below, but if you are using an existing pretrained or trained model some options won't be adjustable.
If you have more than 1 model in your "model" folder you'll also be prompted to choose which one you want to use by selecting corresponding number.
You will also always get a prompt to select which GPU or CPU you want to run the trainer on.
After training starts you'll also see training preview.

Here is a detailed explanation of all functions in order (mostly) they are presented to the user upon starting training of a new model.

Note that some of these get locked and can't be changed once you start training due to way these models work, example of things that can't be changed later are:

- model resolution (often shortened to "res")
- model architecture ("archi")
- models dimensions ("dims")
- face type
- morph factor (AMP training)


Also not all options are available for all kinds of models:
For LIAE there is no True Face (TF)
For AMP there is no architecture choice or eye and mouth priority (EMP)
As the software is developed more options may become available or unavailable for certain models, if you are on newest version and notice lack of some option that according to this guide is still available or notice lack of some options explained here that are present please message me via private message or post a message in this thread and I'll try to update the guide as soon as possible.

Autobackup every N hour ( 0..24 ?:help ) : self explanatory - lets you enable automatic backups of your model every N hours. Leaving it at 0 (default) will disable auto backups.

[n] Write preview history ( y/n ?:help ) : save preview images during training every few minutes, if you select yes you'll get another prompt: [n] Choose image for the preview history ( y/n ) : if you select N the model will pick faces for the previews randomly, otherwise selecting Y will open up a new window after datasets are loaded where you'll be able to choose them manually.

Target iteration : will stop training after certain amount of iterations is reached, for example if you want to train your model to only 100.000 iterations you should enter a value of 100000. Leaving it at 0 will make it run until you stop it manually. Default value is 0 (disabled).

[n] Flip SRC faces randomly ( y/n ?:help ) : Randomly flips SRC faces horizontally, helps to cover all angles present in DST dataset with SRC faces as a result of flipping them which can be helpful sometimes (especially if our set doesn't have many different lighting conditions but has most angles) however in many cases it will make results seem unnatural because faces are never perfectly symmetric, it will also copy facial features from one side of the face to the other one, they may then appear on either side or on both at the same time. Recommended to only use early in the training or not at all if our SRC set is diverse enough. Default value is N.

[y] Flip DST faces randomly ( y/n ?:help ) : Randomly flips DST faces horizontally, can improve generalization when Flip SRC faces randomly is disabled. Default value is Y.

Batch_size ( ?:help ) : Batch size setting affects how many faces are being compared to each other each iteration. Lowest value is 2 and you can go as high as your GPU will allow which is affected by VRAM. The higher your model's resolution, dimensions and the more features you enable the more VRAM will be needed so lower batch size might be required. It's recommended to not use value below 4. Higher batch size will provide better quality at the cost of slower training (higher iteration time). For the initial stage it can be set to a lower value to speed up initial training and then raised. Optimal values are between 6-12. How to guess what batch size to use? You can either use trial and error or help yourself by taking a look at what other people can achieve on their GPUs by checking out DFL 2.0 Model Settings and Performance Sharing Thread.

Resolution ( 64-640 ?:help ) : here you set your model's resolution, bear in mind this option cannot be changed during training. It affects the resolution of swapped faces, the higher model resolution - the more detailed the learned face will be but also training will be much heavier and longer. Resolution can be increased from 64x64 to 640x640 by increments of:

16 (for regular and -U architectures variants)
32 (for -D and -UD architectures variants)
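To make the increment rule concrete, here is a small sketch that turns any requested value into a valid resolution for a given architecture string. Rounding down to the nearest valid step is a choice made for this example, the guide doesn't say how the trainer itself handles values in between, and the function names are hypothetical:

```python
def resolution_step(archi: str) -> int:
    """Return 32 when the variant flags include D, otherwise 16."""
    # Only look at the part after the dash so the "d" in "df" is not counted.
    _, _, flags = archi.lower().partition("-")
    return 32 if "d" in flags else 16


def snap_resolution(requested: int, archi: str) -> int:
    """Clamp to the 64-640 range and round down to the nearest valid step."""
    step = resolution_step(archi)
    clamped = max(64, min(640, requested))
    return (clamped // step) * step
```

For example snap_resolution(250, "df-u") gives 240 while snap_resolution(250, "df-ud") gives 224, because the -D variants only move in steps of 32.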

Higher resolutions might require increasing of the model dimensions (dims) but it's not mandatory, you can get good results with default dims and you can get bad results with very high dims, in the end it's the quality of your source dataset that has the biggest impact on quality so don't stress out if you can't run higher dims with your GPU, focus on creating a good source set, worry about dims and resolution later.

Face type ( h/mf/f/wf/head ?:help ) : this option lets you set the area of the face you want to train, there are 5 options - half face, mid-half face, full face, whole face and head:

a) Half face (HF) - only trains from mouth to eyebrows but can in some cases cut off the top or bottom of the face (eyebrows, chin, bit of mouth).
b) Mid-half face (MHF) - aims to fix HF issue by covering 30% larger portion of face compared to half face which should prevent most of the undesirable cut offs from occurring but they can still happen.
c) Full face (FF) - covers most of the face area, excluding forehead, can sometimes cut off a little bit of chin but this happens very rarely (only when subject opens mouth wide open) - most recommended when SRC and/or DST have hair covering forehead.
d) Whole face (WF) - expands that area even more to cover pretty much the whole face, including forehead and all of the face from the side (up to ears, HF, MHF and FF don't cover that much).
e) Head (HEAD) - is used to do a swap of the entire head, not suitable for subjects with long hair, works best if the source faceset/dataset comes from single source and both SRC and DST have short hair or one that doesn't change shape depending on the angle.
Examples of faces, front and side view when using all face types: [IMAGE MISSING, WORK IN PROGRESS]

Architecture (df/liae/df-u/liae-u/df-d/liae-d/df-ud/liae-ud ?:help ) :
This option lets you choose between 2 main architectures of SAEHD model: DF and LIAE as well as their variants:

DF: This model architecture provides better SRC likeness at the cost of worse lighting and color match than LIAE, it also requires SRC set to be matched to all of the angles and lighting of the DST better and overall to be made better than a set that might be fine for LIAE, it also doesn't deal with general face shape and proportions mismatch between SRC and DST where LIAE is better but at the same time can deal with greater mismatch of actual appearance of facial features and is lighter on your GPU (lower VRAM usage), better at frontal shots, may struggle more at difficult angles if the SRC set doesn't cover all the required angles, expressions and lighting conditions of your DST.

LIAE: This model is almost complete opposite of DF, it doesn't produce faces that are as SRC like compared to DF if the facial features and general appearance of DST is too different from SRC but at the same time deals with different face proportions and shapes better than DF, it also creates faces that match the lighting and color of DST better than DF and is more forgiving when it comes to SRC set but it doesn't mean it can create a good quality swap if you are missing major parts of the SRC set that are present in the DST, you still need to cover all the angles. LIAE is heavier on GPU (higher VRAM usage) and does better job at more complex angles.
Also make sure you read "Extra training and reuse of trained LIAE/LIAE RTM models - Deleting inter_ab and inter_b files explained:" in Step 10.5 for how to deal with LIAE models when reusing them.

Keep in mind that while these are general characteristics of both architectures it doesn't mean they will always behave like that, incorrectly trained DF model can have worse resemblance to SRC than correctly trained LIAE model and you can also completely fail to create anything that looks close to SRC with LIAE and achieve near perfect color and lighting match with DF model. It all comes down to how well matched your SRC and DST are and how well your SRC set is made which even if you know all the basics can still take a lot of trial and error.

Each model can be altered using flags that enable variants of the model architectures, they can also be combined in the order as presented below (all of them affect performance and VRAM usage):

-U: this variant aims to improve similarity/likeness to the source faces and is in general recommended to be used always.
-D: this variant aims to improve quality by roughly doubling possible resolution with no extra compute cost, however it requires longer training, the model must be pretrained first for optimal results and resolution must be changed in steps of 32 as opposed to 16 in other variants. In general it too should always be used because of how much higher a model resolution this architecture allows, but if you have access to an extremely high VRAM setup it might be worth experimenting with training models without it, as that might yield higher quality results.
-T: this variant changes the model architecture in a different way than -U but with the same aim - to create even more SRC like results however it can affect how sharp faces are as it tends to cause slight loss of detail compared to just using -D/-UD variants. Recommended for LIAE only.
-C: experimental variant, switches the activation function between ReLu and Leaky ReLu (use at your own risk).

To combine architecture variants, after DF/LIAE write a "-" symbol and then the letters in the same order as presented above, examples: DF-UDTC, LIAE-DT, LIAE-UDT, DF-UD, DF-UT, etc.
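
As a rough illustration of the naming rule above, here is a minimal Python sketch that splits an architecture name into its base and variant letters and checks the resolution step. The function names are made up for this example and this is not DFL's own code, only a way to sanity check a name before typing it into the trainer prompt.

```python
VARIANT_ORDER = "UDTC"  # order in which the guide lists the variant letters


def parse_architecture(name):
    """Split e.g. 'LIAE-UDT' into ('LIAE', 'UDT') and validate the letters."""
    base, _, flags = name.upper().partition("-")
    if base not in ("DF", "LIAE"):
        raise ValueError(f"unknown base architecture: {base!r}")
    # Each letter may appear once and must follow the U, D, T, C order.
    positions = [VARIANT_ORDER.find(flag) for flag in flags]
    if -1 in positions or positions != sorted(set(positions)):
        raise ValueError(f"invalid or misordered variant letters: {flags!r}")
    return base, flags


def resolution_step(flags):
    """Per the guide: steps of 32 with the -D variant, 16 otherwise."""
    return 32 if "D" in flags else 16


def is_valid_resolution(resolution, flags):
    """True if the resolution is a multiple of the required step."""
    return resolution % resolution_step(flags) == 0
```

For example, parse_architecture("LIAE-UDT") gives ("LIAE", "UDT"), and is_valid_resolution(336, "UD") is False because 336 is not a multiple of 32.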

DF vs LIAE vs AMP comparison (SFW):
The next 4 options control the model's neural network dimensions, which affect the model's ability to learn. Modifying these can have a big impact on performance and quality:

Auto Encoder Dims ( 32-2048 ?:help ) : Auto encoder dims setting, affects overall ability of the model to learn faces.
Inter Dims ( 32-2048 ?:help ) : Inter dims setting, affects overall ability of the model to learn faces, should be equal to or higher than Auto Encoder dims, AMP ONLY.
Encoder Dims ( 16-256 ?:help ) : Encoder dims setting, affects ability of the encoder to learn faces.
Decoder Dims ( 16-256 ?:help ) : Decoder dims setting, affects ability of the decoder to recreate faces.
Decoder Mask Dims ( 16-256 ?:help ) : Mask decoder dims setting, affects quality of the learned masks. May or may not affect some other aspects of training.

Changing each of these settings can have varying effects on performance, and it's not easy to measure the effect of each one on performance and quality without extensive testing.

Each one is set at certain default value that should offer optimal results and good compromise between training speed and quality.

Also, when changing one parameter the others should be changed as well to keep the relations between them similar: when raising AE dims, E and D dims should also be raised. D Mask dims can be raised too, but that's optional; it can be left at default or lowered to 16 to save some VRAM at the cost of lower quality of learned masks. (These are not the same as XSeg masks: they are masks the model learns during training and they help the model train the face area efficiently. If you have XSeg applied, the learned masks are based on the shape of your XSeg masks, otherwise the default FF landmark derived masks are learned.) It's always best to raise them all when you're training at higher model resolutions because it makes the model capable of learning more about the face, which at higher resolution means a potentially more expressive and realistic face with more detail captured from your source dataset and better reproduction of DST expressions and lighting.
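
To make "keep the relations between them similar" concrete, here is a small sketch that scales all dims by one factor and clamps them to the ranges shown in the prompts above. The starting values in BASE_DIMS are an assumption for this example (check what your own build offers as defaults), and rounding up to an even number is just a convenience choice of this sketch.

```python
# Assumed starting values; replace them with the defaults your DFL build shows.
BASE_DIMS = {"ae_dims": 256, "e_dims": 64, "d_dims": 64, "d_mask_dims": 22}

# Allowed ranges as shown in the trainer prompts above.
LIMITS = {
    "ae_dims": (32, 2048),
    "e_dims": (16, 256),
    "d_dims": (16, 256),
    "d_mask_dims": (16, 256),
}


def scale_dims(factor, base=BASE_DIMS, scale_mask=True):
    """Scale every dims value by the same factor so their ratios stay similar."""
    scaled = {}
    for name, value in base.items():
        if name == "d_mask_dims" and not scale_mask:
            scaled[name] = value  # mask dims are optional to raise
            continue
        low, high = LIMITS[name]
        new_value = int(round(value * factor))
        new_value += new_value % 2  # round up to an even number (sketch choice)
        scaled[name] = max(low, min(high, new_value))
    return scaled
```

With the assumed starting values, scale_dims(1.5) raises AE dims to 384 and E/D dims to 96, while scale_dims(1.5, scale_mask=False) leaves D Mask dims untouched.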

Morph factor ( 0.1 .. 0.5 ?:help ) : Affects how much the model will morph your predicted faces to look and express more like your SRC, typical and recommended value is 0.5. (I need to test this personally, didn't use AMP yet so don't know if higher or lower value is better).

Masked training ( y/n ?:help ) : Prioritizes training of what's masked (default mask or applied xseg mask), available only for WF and HEAD face types, disabling it trains the whole sample area (including background) at the same priority as the face itself. Default value is y (enabled).

Eyes and mouth priority ( y/n ?:help ) : Attempts to fix problems with eyes and mouth (including teeth) by training them at higher priority, can improve their sharpness/level of detail too.

Uniform_yaw ( y/n ?:help ) : Helps with training of profile faces, forces model to train evenly on all faces depending on their yaw and prioritizes profile faces, may cause frontal faces to train slower, enabled by default during pretraining, can be used while RW is enabled to improve generalization of profile/side faces or when RW is disabled to improve quality and sharpness/detail of those faces. Useful when your source dataset doesn't have many profile shots. Can help lower loss values. Default value is n (disabled).
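
The idea of training evenly across yaw can be pictured as sampling by yaw bin instead of by face. The sketch below is only an illustration of that idea under assumed inputs (a list of (name, yaw in degrees) pairs and a 10 degree bin width); it is not taken from DFL's sample generator.

```python
import random
from collections import defaultdict


def build_yaw_bins(samples, bin_width=10.0):
    """Group (name, yaw_degrees) samples into bins of bin_width degrees."""
    bins = defaultdict(list)
    for name, yaw in samples:
        bins[int(yaw // bin_width)].append(name)
    return dict(bins)


def sample_uniform_yaw(bins, rng):
    """Pick a yaw bin uniformly, then pick a face uniformly inside that bin."""
    bin_key = rng.choice(sorted(bins))
    return rng.choice(bins[bin_key])


# Example: 95 frontal faces and only 5 profile faces.
samples = [(f"front_{i}", 2.0) for i in range(95)]
samples += [(f"profile_{i}", 85.0) for i in range(5)]
bins = build_yaw_bins(samples)
rng = random.Random(0)
draws = [sample_uniform_yaw(bins, rng) for _ in range(10000)]
profile_share = sum(d.startswith("profile") for d in draws) / len(draws)
```

With plain random sampling the profile faces would show up about 5% of the time; with per-bin sampling they show up about as often as the frontal ones, which is why frontal faces may train slower with this option enabled.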

Blur out mask ( y/n ?:help ) : Blurs the area outside of the masked area to make it smoother. With masked training enabled, background is trained with lower priority than the face area so it's more prone to artifacts and noise. You can combine blur out mask with background style power to get a background that is both closer to the background of DST faces and also smoother due to the additional blurring this option provides. The same XSeg model must be used to apply masks to both SRC and DST datasets.

Place models and optimizer on GPU ( y/n ?:help ) : Enabling GPU optimizer puts all the load on your GPU which greatly improves performance (iteration time) but will lead to higher VRAM usage, disabling this feature will off load some work of the optimizer to CPU which decreases load on GPU and VRAM usage thus letting you achieve higher batch size or run more demanding models at the cost of longer iteration times. If you get OOM (out of memory) error and you don't want to lower your batch size or disable some feature you should disable this feature and thus some work will be offloaded to your CPU and you will be able to run your model without OOM errors at the cost of lower speed. Default value is y (enabled).

Use AdaBelief optimizer? ( y/n ?:help ) : AdaBelief (AB) is a new model optimizer which increases model accuracy and quality of trained faces, when this option is enabled it replaces the default RMSProp optimizer. However those improvements come at a cost of higher VRAM usage. When using AdaBelief LRD is optional but still recommended and should be enabled (LRD) before running GAN training. Default value is Y.

Personal note: Some people say you can disable AdaBelief on an existing model and it will retrain fine. I don't agree with this completely and think the model never recovers perfectly and forgets too much when you turn it on or off, so I suggest to just stick with it being either enabled or disabled. Same for LRD: some people say it's optional, some that it's still necessary, some say it's not necessary. I still use it with AB, some people may not use it, draw conclusions yourself from DFL's built-in description.

Use learning rate dropout ( y/n/cpu ?:help ) : LRD is used to accelerate training of faces and reduces sub-pixel shake (reduces face shaking and to some degree can reduce lighting flicker as well).
It's primarily used in the following cases:
- before disabling RW, when loss values aren't improving by a lot anymore, this can help model to generalize faces a bit more
- after RW has been disabled and you've trained the model well enough enabling it near the end of training will result in more detailed, stable faces that are less prone to flicker
This option affects VRAM usage so if you run into OOM errors you can run it on CPU at the cost of 20% slower iteration times or just lower your batch size.
For more detailed explanation of LRD and order of enabling main features during training please refer to FAQ Question 8

Enable random warp of samples ( y/n ?:help ) : Random warp is used to generalize a model so that it correctly learns face features and expressions in the initial training stage, but as long as it's enabled the model may have trouble learning fine detail. Because of that it's recommended to keep this feature enabled as long as your faces are still improving (loss values are decreasing and faces in the preview window are improving). Once everything looks correct (and loss isn't decreasing anymore) you should disable it to start learning details. From then on you don't re-enable it unless you ruin the results by applying too high values for certain settings (style power, true face, etc). When you want to reuse the model for training on a new target video with the same source, or with a combination of both new SRC and DST, you always start training with RW enabled. Default value is y (enabled).

Enable HSV power ( 0.0 .. 0.3 ?:help ) : Applies random hue, saturation and brightness changes to only your SRC dataset during training to improve color stability (reduce flicker), and may also affect color matching of the final result. This option has the effect of slightly averaging out the colors of your SRC set, as the HSV shift of SRC samples is based only on color information from SRC samples. It can be combined with color transfer (CT), the power (quality) of which this option reduces, or used without it if you happen to get better results without CT but just need to make the colors of the resulting face slightly more stable and consistent. It requires your SRC dataset to have lots of variety in terms of lighting conditions (direction, strength and color tone). Recommended value is 0.05.
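
To picture what a random HSV change of a given power does to a sample, here is a minimal sketch using Python's standard colorsys module. It assumes pixels are (r, g, b) floats in the 0..1 range and that one random offset per sample is drawn for hue, saturation and value; how DFL implements the shift internally is not shown here and may differ.

```python
import colorsys
import random


def random_hsv_shift(pixels, power, rng):
    """Shift all pixels of one sample by the same random H, S and V offsets."""
    # One offset per channel, each within -power..+power.
    dh, ds, dv = (rng.uniform(-power, power) for _ in range(3))
    shifted = []
    for r, g, b in pixels:
        h, s, v = colorsys.rgb_to_hsv(r, g, b)
        h = (h + dh) % 1.0                 # hue wraps around
        s = min(1.0, max(0.0, s + ds))     # saturation stays in 0..1
        v = min(1.0, max(0.0, v + dv))     # brightness stays in 0..1
        shifted.append(colorsys.hsv_to_rgb(h, s, v))
    return shifted
```

With power 0.0 the sample comes back unchanged, and with a small value like 0.05 every pass over the same SRC face sees slightly different colors, which is the augmentation effect described above.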

GAN power ( 0.0 .. 10.0 ?:help ) : GAN stands for Generative Adversarial Network and in the case of DFL 2.0 it is implemented as an additional way of training to get more detailed/sharp faces. This option is adjustable on a scale from 0.0 to 10.0 and it should only be enabled once the model is more or less fully trained (after you've disabled random warp of samples and enabled LRD). It's recommended to use low values like 0.01. Make sure to back up your model before you start GAN training (in case you don't like the results, get artifacts or your model collapses). Once enabled, two more settings will be presented to adjust internal parameters of GAN:

[1/8th of RES] GAN patch size ( 3-640 ?:help ) : Improves quality of GAN training at the cost of higher VRAM usage, default value is 1/8th of your resolution.

[16] GAN dimensions ( 4-64 ?:help ) : The dimensions of the GAN network. The higher the dimensions, the more VRAM is required but it can also improve quality. You can get sharp edges even at the lowest setting and because of this the default value of 16 is recommended, but you can reduce it to 12-14 to save some performance if you need to.
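
The ranges and defaults of the three GAN prompts can be summed up in a few lines. This is a hypothetical helper written for this guide, not a DFL function; it only reproduces the numbers given above (power 0.0-10.0, patch size 3-640 with a default of 1/8th of the resolution, dims 4-64 with a default of 16).

```python
def default_gan_patch_size(resolution):
    """1/8th of the model resolution, kept inside the 3-640 prompt range."""
    return max(3, min(640, resolution // 8))


def gan_settings(resolution, power=0.01, dims=16, patch_size=None):
    """Collect GAN settings and reject values outside the prompt ranges."""
    if patch_size is None:
        patch_size = default_gan_patch_size(resolution)
    if not 0.0 <= power <= 10.0:
        raise ValueError("GAN power must be between 0.0 and 10.0")
    if not 3 <= patch_size <= 640:
        raise ValueError("GAN patch size must be between 3 and 640")
    if not 4 <= dims <= 64:
        raise ValueError("GAN dimensions must be between 4 and 64")
    return {"gan_power": power, "gan_patch_size": patch_size, "gan_dims": dims}
```

For a 320 resolution model, gan_settings(320) gives a patch size of 40 with the recommended low power of 0.01.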

Before/after example of a face trained with GAN at value of 0.1 for 40k iterations:

[img=650x400]

'True face' power. ( 0.0000 .. 1.0 ?:help ) : True face training with a variable power setting lets you set the model discriminator to a higher or lower value. What this does is try to make the final face look more like SRC. As a side effect it can make faces appear sharper, but it can also alter lighting and color matching and in extreme cases even make faces appear to change angle, as the model will try to generate a face that looks closer to the training sample. As with GAN, this feature should only be enabled once random warp is disabled and the model is fairly well trained. Consider making a backup before enabling this feature. Never use high values, typical value is 0.01 but you can use even lower ones like 0.001. It has a small performance impact. Default value is 0.0 (disabled).
[img=500x200]

Face style power ( 0.0..100.0 ?:help ) and Background style power ( 0.0..100.0 ?:help ) : This setting controls style transfer of either face (FSP) or background (BSP) part of the image, it is used to transfer the color information from your target/destination faces (data_dst) over to the final predicted faces, thus improving the lighting and color match but high values can cause the predicted face to look less like your source face and more like your target face. Start with small values like 0.001-0.1 and increase or decrease them depending on your needs. This feature has impact on memory usage and can cause OOM error, forcing you to lower your batch size in order to use it. For Background Style Power (BSP) higher values can be used as we don't care much about preserving SRC backgrounds, recommended value by DFL for BSP is 2.0 but you can also experiment with different values for the background. Consider making a backup before enabling this feature as it can also lead to artifacts and model collapse.
Default value is 0.0 (disabled).

Color transfer for src faceset ( none/rct/lct/mkl/idt/sot ?:help ) : this feature is used to match the colors of your data_src to the data_dst so that the final result has similar skin color/tone to the data_dst and doesn't change colors when the face moves around, commonly referred to as flickering/flicker/color shift/color change (which may happen if various face angles were taken from various sources that contained different light conditions or were color graded differently). There are several options to choose from:

- none: because sometimes less is better and in some cases you might get better results without any color transfer during training.
- rct (reinhard color transfer): based on: https://www.cs.tau.ac.il/~turkel/imagepapers/ColorTransfer.pdf
- lct (linear color transfer): Matches the color distribution of the target image to that of the source image using a linear transform.
- mkl (Monge-Kantorovitch linear): based on: http://www.mee.tcd.ie/~sigmedia/pmwiki/uploads/Main.Publications/fpitie07b.pdf
- idt (Iterative Distribution Transfer): based on: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.158.1052&rep=rep1&type=pdf
- sot (sliced optimal transfer): based on: https://dcoeurjo.github.io/OTColorTransfer/
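
To give a feel for what the simplest of these methods does, here is a sketch of the statistic matching step behind Reinhard style transfer: each channel of the SRC face is shifted and scaled so that its mean and standard deviation match those of the DST face. The paper does this in a decorrelated color space; that conversion is left out here, so treat this as a simplified illustration on plain 3-channel pixel lists, not as DFL's rct code.

```python
from statistics import fmean, pstdev


def match_channel_stats(src_values, dst_values, eps=1e-6):
    """Shift and scale src_values so their mean/std match dst_values."""
    src_mean, src_std = fmean(src_values), pstdev(src_values)
    dst_mean, dst_std = fmean(dst_values), pstdev(dst_values)
    scale = dst_std / max(src_std, eps)  # eps avoids division by zero
    return [(v - src_mean) * scale + dst_mean for v in src_values]


def transfer_colors(src_pixels, dst_pixels):
    """Apply the per-channel match to lists of 3-channel pixels."""
    src_channels = list(zip(*src_pixels))
    dst_channels = list(zip(*dst_pixels))
    matched = [
        match_channel_stats(src, dst)
        for src, dst in zip(src_channels, dst_channels)
    ]
    return [tuple(pixel) for pixel in zip(*matched)]
```

Because every SRC sample gets recolored towards whichever DST sample it is paired with, the same SRC face is seen in many color conditions during training, which is why color transfer also acts as an augmentation.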

Most color transfers have little to no effect on performance or VRAM usage, with the exception of SOT, which has a performance impact during training and can severely slow down the merging process if used during merging. Other color transfers like IDT may also have a performance impact during merging.

Using color transfers is not always required but quite often helps and in some cases is absolutely mandatory. You should also remember that enabling them acts as an augmentation of the set, effectively creating new conditions for all of the SRC samples and thus increasing the complexity of the training data. This can result in higher loss when enabled and naturally means the model will have to be trained longer to achieve the same state compared to training without color transfer, where faces never change colors that much. This option can be combined with Random HSV Power, which provides additional augmentation of the SRC set based on the colors of the SRC set alone (unlike CT, which augments SRC based on DST), effectively slightly averaging its colors, providing additional color conditions CT methods may not achieve, and also slightly reducing the effect of CT (referred to as CT quality reduction by iperov in official notes).

Enable gradient clipping ( y/n ?:help ) : This feature is implemented to prevent so-called model collapse/corruption, which may occur when using various features of DFL 2.0. It has a small performance impact, so if you really don't want to use it you must enable auto backups, as a collapsed model cannot recover, must be scrapped and training must be started all over. Default value is n (disabled), but since the performance impact is so low, leaving it enabled can save you a lot of time by preventing model collapse. Model collapse is most likely to happen when using Style Powers, so if you're using them it's highly advised to enable gradient clipping or backups (you can also do them manually).

Enable pretraining mode ( y/n ?:help ) : Enables the pretraining process that uses a dataset of random people to initially pretrain your model. After training it to anywhere from 500.000 to 1.000.000 iterations, such a model can then be used when starting training with the actual data_src and data_dst you want to train. It saves time because the model will already know what faces should look like, so it takes less time for faces to appear clearly when training (make sure to disable pretrain when you train on your actual data_src and data_dst). Models using -D architecture variants must be pretrained and it's also highly recommended to pretrain all models.
User shared SAEHD models can be found in this thread: https://mrdeepfakes.com/forums/thre...eral-thread-for-user-made-models-and-requests

1. What are pretrained models?

Pretrained models are made by training them with random faces of various people. Using a model prepared in such a way significantly speeds up the initial training stage because model already knows how face should look so you don't have to wait as much for faces to start showing up and they'll become sharp faster compared to training on a fresh and non-pretrained model.
You can now also share your custom pretraining sets (SFW/NSFW) for various face_types (full face, whole face and head).

2. How to use pretrained models?

Simply download it and place all the files directly into your model folder, start training, after selecting model for training (if you have more than one in your model folder) and device to train with (GPU/CPU) press any key within 2 seconds (you'll see a prompt that says this exact thing) to override model settings and make sure the pretrain option is set to disabled (N) so that you start training and not continue pretraining.
If you leave the pretrain option enabled (Y) the model will continue to pretrain using the built-in pretrain dataset that comes with DFL (in this thread you will find models trained with both the old full face pretrain dataset as well as with the new whole face FFHQ dataset).
Note that the model will revert iteration count to 0 when you disable pretrain and start regular training, that's normal behavior for pretrained models. However if the model is described as "regular training" this means it was not pretrained but instead trained to certain amount of iterations where both SRC and DST dataset contained random faces of people, in this case model will carry on training and iteration count won't start at 0 but at the value it was when training was ended by the user who is sharing the model.

3. How to create your own pretrained model?

1. The official and recommended way to create one is to use pretrain option which will use DFL's built-in random celebrity faces dataset and train your model like this for 500k-1kk iterations.

The model is sufficiently trained when most faces in the preview look sharp, with well defined teeth and eyes but not necessarily with a lot of very fine detail.

1.1 You can also change the default pretrain dataset to your own which you can make by placing random faces of people you're most likely to fake (it can be all male, female, mix of male and female, celebrities only, random people) and then packing it using util faceset pack.bat and then replacing the original file in \_internal\pretrain_CelebA with this new dataset.

2. Alternative way to pretrain a model is to prepare data_src and data_dst datasets with faces of random people, from various angles and with different expressions and train models as if you would normally (pretrain disabled). For source dataset you can use faces of celebs you are most likely to swap in the future and for DST you can use any faces from types of videos you're most likely to use as your target videos.

It should be noted however that preparing your model by simply training it on random faces can introduce some morphing and make result faces look slightly less like the source for a while. However after few retrains using the same source the src likeness of the predicted faces should improve. This method can be faster to adapt to new faces compared to training on pretrained model (because we are simply reusing a model, but instead of reusing one that was trained on specific src dataset we reuse a model that contains random faces, as mentioned above you can include faces of people you're most likely to fake as a part of your src and dst datasets).

NOTE: If you're pretraining a HEAD model consider using your custom pretrain set as the included FFHQ dataset is of Whole Face type (WF). It's strongly recommended to pretrain HEAD and any kind of AMP models. For FF and WF SAEHD models it's optional but it still helps to pretrain them to at least 300-500k and then use that as a base for your future project, or do an extra 500-600k of random training on top of your pretrain with SAEHD models.

10.1 RTM Training Workflow:
With the introduction of DeepFaceLive (DFLive) a new training workflow has been established. Contrary to what some users think, this isn't a new training method and does not differ significantly from regular training. This training method has been employed by some people in one way or another, and you may have created such a model yourself by accident without even realizing it.

RTM models (ReadyToMerge) are created by training an SRC set of the person we want to swap against a large and varied DST set containing random faces of many people which covers all possible angles, expressions and lighting conditions. The SRC set must also have a large variety of faces. The goal of RTM model training is to create a model that can apply our SRC face to any video, primarily for use with DeepFaceLive but also to speed up the training process within DeepFaceLab 2.0 by creating a base model that can adapt to new target videos in less time compared to training a model from scratch.

The recommended type of models for use with the RTM workflow are SAEHD LIAE models, LIAE-UD or LIAE-UDT, thanks to their superior color and lighting matching capabilities as well as being able to adapt better to different face shapes than the DF architecture.
AMP models can also be used to create RTM models, although they work a bit differently and as I lack know-how to explain AMP workflow yet I will only focus on LIAE RTM model training in this part of the guide.

1. Start by preparing SRC set: make sure you cover all possible angles, each with as many different lighting conditions and expressions, the better the coverage of different possible faces, the better results will be.

2. Prepare a DST set by collecting many random faces: this dataset must also have as much variety as possible. It can be truly random, consisting of both masculine and feminine faces of all sorts of skin colors, or it can be specific to, for example, black masculine faces or feminine asian faces if that's the type of target face you plan to primarily use the model with. The more variety and the more faces in the set, the longer it will take to train a model, but the model will possibly be better as it will be able to swap more correctly to more kinds of different faces.

ALTERNATIVELY - USE RTM WF dataset from iperov: https://tinyurl.com/2p9cvt25
If the link is dead go to https://github.com/iperov/DeepFaceLab and find torrent/magnet link to DFL builds as they contain the RTM WF dataset along them, the same dataset can be used to train an RTT model.

3. Apply XSeg masks to both datasets: this will ensure the model trains correctly and, as with any other training, is required in order to create a WF model. While it's optional for FF models it's still recommended to apply an XSeg mask of the correct type to both datasets. Make sure you use the same XSeg model for both datasets.

4. Use an existing RTT model or create a new one: RTT models as recommended by iperov are heavily re-trained models, upwards of 2-3kk iterations, which is why it may take a lot of time to create them. As an alternative you can pretrain a LIAE model for 600k-1kk iterations, more about making RTT in the next stage.

5. Start training on your SRC and random DST using the workflows below, do note that some of these have been modified slightly compared to the iperov ones, use at your own risk.

5.1 Iperov's new workflow:


Settings: EMP Enabled, Blur Out Mask Enabled, UY Enabled, LRD Enabled, BS:8 (if you can't run your model with high enough BS lower it or run model optimizer and lrd on cpu).
Other options should be left at default values (usually means disabled). Optionally use HSV at power 0.1 and the CT mode that works best for you, usually RCT.
Make a backup before every stage or enable auto backups.

1. Train +2.000.000 iters with RW enabled and delete inter_AB.npy every 500k iters (save and stop model training, delete the file and resume training)
2. After deleting inter_AB 4th time train extra +500k with RW enabled.
3. If swapped face looks more like DST, delete inter_AB and repeat step 2.
4. Disable RW and train for additional +500k iters.
5. Enable GAN at power 0.1 with GAN_Dims:32 and Patch Size being 1/8th of your model resolution for +800.000 iters.
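
The stages above can be written down as a small schedule to see when inter_AB.npy gets deleted and how many iterations the whole workflow adds up to. This is just bookkeeping for planning; the names are made up for this example, stage 5 is taken as 800,000 iterations, and the optional repeat in step 3 is not counted.

```python
DELETE_INTERVAL = 500_000  # delete inter_AB.npy every 500k iterations in stage 1

# (description, iterations, random warp enabled, GAN power)
STAGES = [
    ("RW on, delete inter_AB every 500k", 2_000_000, True, 0.0),
    ("RW on, after the 4th deletion", 500_000, True, 0.0),
    ("RW off", 500_000, False, 0.0),
    ("RW off, GAN on", 800_000, False, 0.1),
]


def inter_ab_deletion_points(stage_iters=STAGES[0][1], interval=DELETE_INTERVAL):
    """Iteration counts at which to stop, delete inter_AB.npy and resume."""
    return list(range(interval, stage_iters + 1, interval))


def total_iterations(stages=STAGES):
    """Sum of all stage lengths, without any repeats of step 2."""
    return sum(iters for _, iters, _, _ in stages)
```

This gives four deletions (at 500k, 1kk, 1.5kk and 2kk) and 3.8kk iterations in total, plus another 500k for every time step 3 sends you back to step 2.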

5.2 Iperov's old workflow:

1. Do 500k-1kk iterations with Random Warp: Y, Uniform Yaw: Y, LRD: N, Blur Out Mask: Y, Color Transfer: LCT, other settings leave at default values.
2. Next do 500k iterations with LRD: Y, keep other settings as they are in step 1.
3. After that do 500k iterations with Uniform Yaw: N
4. Now do 500-800k iterations with Random Warp: N, Uniform Yaw: N, LRD: Y *
5. And lastly do 200-300k iterations with Random Warp: N, Uniform Yaw: N, LRD: Y and GAN: 0.1, GAN PATCH SIZE: (1/8th of model resolution), GAN DIMS: 32

10.2 Using RTM models:

Once you've finished training your models you can either use them in DFL or export as DFM model for use in DFLive.

To export a model for use in DFLive use 6) export SAEHD as dfm or 6) export AMP as dfm, you'll have the choice of quantizing the model which can make it run faster but some models, particularly large ones with high resolution and high network dimensions (dims) values may not work well if you export them with this option enabled so make sure you test it in DFLive, the process doesn't delete original models, only creates additional DFM file in your "model" folder. If your model doesn't work well export it again with quantize option disabled.

If you want to use your RTM model in DFL you can either start extracting new scenes and merge them with this model without any additional training or do some extra training.

Extra training and reuse of trained LIAE/LIAE RTM models - Deleting inter_ab and inter_b files explained:

What are inter_ab and inter_b files? These are parts of SAEHD models that use the LIAE architecture (regardless of additional -U, -D, -T and -C variations). Unlike the DF architecture, which has one common inter file for both SRC and DST, LIAE features two inter files: inter_ab, which contains the latent code (representation) of both SRC and DST faces, and an additional inter_b that contains the latent code of DST faces.

1. Delete inter_b file from your model folder when you want to reuse RTM model as a regular LIAE model on new DST and train the model all over starting with RW enabled (train as a regular model).
Applies to reusing trained LIAE model and changing DST but not SRC.

2. Delete inter_ab file when you want to create a new RTM model for different celebrity, replace SRC with new one, add random DST set and proceed with the same workflow as when creating new RTM model.
Applies to reusing trained LIAE model and changing SRC but not DST

3. Don't delete either inter_ab or inter_b when you want to perform additional training on target DST using your trained RTM model.
Doesn't apply to regular trained LIAE model reusal (may run into issue where your final predicted faces looks just like DST or have very low resemblance to SRC)

4. Delete both inter_ab and inter_b when you are reusing a trained LIAE model in regular scenarios, both src/dst change or you run into an issue where results look like DST. Do note that this has a similar effect to what happens when you disable pretraining: only encoders/decoders remain trained, all other data is removed. This means the model in a way returns to a state as if it was just pretrained (not quite but closer to that than a trained state), which may cause training to take a bit longer.

In that case simply replace random DST with specific target DST, start training with RW disabled:

If you want to train using old iperov workflow start at step 4 of the OLD WORKFLOW.
If you want to train using new iperov workflow start at step 4 of the NEW WORKFLOW.
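
The four cases from the list above fit into a small lookup table. The sketch below is a hypothetical helper: the scenario names are made up, and it assumes the model files end in inter_AB.npy and inter_B.npy (the second name is assumed by analogy with the first), so check the actual file names in your model folder and make a backup before deleting anything.

```python
# Which inter parts to delete in each of the four cases described above.
PARTS_TO_DELETE = {
    "rtm_as_regular_liae_on_new_dst": ["inter_B"],        # case 1
    "new_rtm_with_new_src": ["inter_AB"],                 # case 2
    "extra_training_of_rtm_on_target_dst": [],            # case 3
    "regular_reuse_or_result_looks_like_dst": ["inter_AB", "inter_B"],  # case 4
}


def inter_files_to_delete(scenario, model_files):
    """Return the files from model_files that belong to the parts to delete."""
    parts = PARTS_TO_DELETE[scenario]
    return [
        name
        for name in model_files
        if any(name.endswith(part + ".npy") for part in parts)
    ]
```

The function only returns a list of names and deletes nothing, so you can review the result before touching the model folder.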

RTM models sharing thread: https://mrdeepfakes.com/forums/thread-sharing-dfl-2-0-readytomerge-rtm-models-sharing

11. Merging:

After you're done training your model it's time to merge learned face over original frames to form final video.

For that we have 3 converters corresponding to 3 available models:

7) merge SAEHD
7) merge AMP
7) merge Quick96


Upon selecting any of those a command line window will appear with several prompts.
1st one will ask you if you want to use an interactive converter, default value is y (enabled) and it's recommended to use it over the regular one because it has all the features and also an interactive preview where you see the effects of all changes you make when changing various options and enabling/disabling various features
Use interactive merger? ( y/n ) :

2nd one will ask you which model you want to use:
Choose one of saved models, or enter a name to create a new model.
[r] : rename
[d] : delete
[0] : df192 - latest


3rd one will ask you which GPU/GPUs or CPU you want to use for the merging (conversion) process:
Choose one or several GPU idxs (separated by comma).
[CPU] : CPU
[0] : Your GPU
[0] Which GPU indexes to choose? :


Pressing enter will use default value (0).

After that's done you will see a command line window with current settings as well as preview window which shows all the controls needed to operate the interactive converter/merger:


[IMAGE MISSING, WORK IN PROGRESS]

Here is the list of all merger/converter features explained:

Please check the help screen by pressing tab to see which keys correspond to which option in case they change or you are using different layout keyboard, they may also change over time.

Also note that merging AMP may not feature all of the options SAEHD merging has, however once you understand SAEHD merging then AMP is very similar, most options have the same name and work in a similar way. I will not be expanding the guide with AMP specific merging info since it's all pretty much the same with a few options missing or added, the help screen (tab) exists for a reason.

1. Main overlay modes:

- original: displays original frame without swapped face
- overlay: simple overlays learned face over the frame - this is the recommended overlay mode to use as it's most stable and preserves most of the original trained look to faces.
- hist-match: overlays the learned face and tries to match it based on histogram, it has 2 modes: normal and masked that can be switched with Z - normal is recommended.
- seamless: uses opencv poisson seamless clone function to blend new learned face over the head in the original frame
- seamless hist match: combines both hist-match and seamless.
- raw-rgb: overlays raw learned face without any masking

2. Hist match threshold: controls strength of the histogram matching in hist-match and seamless hist-match overlay mode.
Q - increases value
A - decreases value


3. Erode mask: controls the size of a mask.
W - increases mask erosion (smaller mask)
S - decreases mask erosion (bigger mask)


4. Blur mask: blurs/feathers the edge of the mask for smoother transition
E - increases blur
D - decreases blur


5. Motion blur: after you enter the initial parameters (converter mode, model, GPU/CPU) the merger loads all frames and data_dst aligned data. While doing so it calculates motion vectors that are used to create the motion blur effect this setting controls. It lets you add motion blur in places where the face moves around, but high values may blur the face even with small movement. The option only works if one set of faces is present in the "data_dst/aligned" folder - if during cleanup you had some faces with _1 suffixes (even if only faces of one person are present) the effect won't work. The same goes if there is a mirror that reflects the target person's face, in such a case you cannot use motion blur and the only way to add it is to train each set of faces separately.
R - increases motion blur
F - decreases motion blur


6. Super resolution: uses a similar algorithm to the data_src dataset/faceset enhancer; it can add more definition to areas such as teeth and eyes, and enhance detail/texture of the learned face.
T - increases the enhancement effect
G - decreases the enhancement effect


7. Blur/sharpen: blurs or sharpens the learned face using box or gaussian method.
Y - sharpens the face
H - blurs the face
N - box/gaussian mode switch


8. Face scale: scales learned face to be larger or smaller.
U - scales learned face down
J - scales learned face up


9. Mask modes: there are 9 masking modes:
dst: uses masks derived from the shape of the landmarks generated during data_dst faceset/dataset extraction.
learned-prd: uses masks learned during training. Keep shape of SRC faces.
learned-dst: uses masks learned during training. Keep shape of DST faces.
learned-prd*dst: combines both masks, smaller size of both.
learned-prd+dst: combines both masks, bigger size of both.
XSeg-prd: uses XSeg model to mask using data from source faces.
XSeg-dst: uses XSeg model to mask using data from destination faces - this mode is one you'll most likely use as it will mask the face according to shape of DST and exclude all obstructions (assuming you did label your DST faces correctly).

XSeg-prd*dst: combines both masks, smaller size of both.
learned-prd*dst*XSeg-dst*prd: combines all 4 mask modes, smaller size of all.
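
A small sketch of what "smaller size of both" and "bigger size of both" mean in practice. It assumes masks are values between 0 and 1, that the "*" modes multiply masks element by element and that the "+" mode adds them and clips at 1. That reading is based on the mode names, so treat it as an illustration rather than the merger's exact code; the function names are hypothetical.

```python
def combine_mul(mask_a, mask_b):
    """'*' style combination: only the area covered by BOTH masks survives."""
    return [a * b for a, b in zip(mask_a, mask_b)]


def combine_add(mask_a, mask_b):
    """'+' style combination: the area covered by EITHER mask survives (clipped to 1)."""
    return [min(1.0, a + b) for a, b in zip(mask_a, mask_b)]


prd = [0.0, 1.0, 1.0, 1.0, 0.0]   # mask of the predicted face
dst = [0.0, 0.0, 1.0, 1.0, 1.0]   # mask of the DST face

smaller = combine_mul(prd, dst)   # intersection of both masks
bigger = combine_add(prd, dst)    # union of both masks
```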

10. Color transfer modes: similar to color transfer during training, you can use this feature to better match skin color of the learned face to the original frame for more seamless and realistic face swap. There are 8 different modes:

RCT - Most often used and recommended.
LCT - 2nd most often used option, stronger effect than RCT.
MKL
MKL-M - Good alternative for RCT, quite similar in some regards.
IDT
IDT-M
SOT-M
MIX-M
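
For the curious, RCT is generally understood to stand for Reinhard color transfer, which shifts and scales each color channel of one image so that its mean and standard deviation match those of another image (usually in Lab color space). The sketch below shows only that core idea on a single channel. The function name match_mean_std is hypothetical and DFL's own implementation contains more details than this.

```python
from statistics import mean, pstdev


def match_mean_std(src_channel, ref_channel):
    """Shift and scale src values so their mean/std match the reference channel."""
    s_mean, s_std = mean(src_channel), pstdev(src_channel)
    r_mean, r_std = mean(ref_channel), pstdev(ref_channel)
    # Guard against a flat source channel (std of 0) to avoid division by zero.
    scale = r_std / s_std if s_std > 0 else 1.0
    return [(v - s_mean) * scale + r_mean for v in src_channel]


face_channel = [100.0, 120.0, 140.0]   # one channel of the learned face
frame_channel = [60.0, 70.0, 80.0]     # same channel of the original frame
matched = match_mean_std(face_channel, frame_channel)
```

After the call, matched has the same average brightness and contrast as frame_channel, which is why the swapped face blends in better with the frame.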

11. Image degrade modes:
there are 3 settings that you can use to affect the look of the original frame (without affecting the swapped face):
Denoise - denoises the image, making it slightly blurry (I - increases effect, K - decreases effect)
Bicubic - blurs the image using bicubic method (O - increases effect, L - decreases effect)
Color - decreases color bit depth (P - increases effect, ; - decreases effect)

AMP Specific options:

Morph Factor: higher value will result in pure predicted results, lowering it will smoothly morph between it and your DST face and at the very end it simply shows DST face.
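
Conceptually the morph factor behaves like a blend slider between the DST face and the fully predicted face. The sketch below is only a picture of that behaviour using a single number: as far as I understand, the real AMP model does the morphing inside the network rather than on pixel values, and the function name is made up for this example.

```python
def morph(dst_value, predicted_value, morph_factor):
    """Linear blend: 0.0 returns the DST value, 1.0 returns the predicted value."""
    if not 0.0 <= morph_factor <= 1.0:
        raise ValueError("morph_factor must be between 0.0 and 1.0")
    return dst_value + morph_factor * (predicted_value - dst_value)


only_dst = morph(10.0, 20.0, 0.0)        # lowest setting: DST face
halfway = morph(10.0, 20.0, 0.5)         # something in between
only_predicted = morph(10.0, 20.0, 1.0)  # highest setting: pure predicted result
```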

Additional controls:
TAB button
- switches between main preview window and help screen.
For complete list of keys (and what they control, such as moving forward/backward, starting merging) check the help screen.
Bear in mind these will only work in the main preview window, pressing any button while on the help screen won't do anything.

12. Conversion of frames back into video:

After you have merged/converted all the faces, you will have a folder named "merged" inside your "data_dst" folder containing all frames, as well as "merged_mask", which contains the mask frames.
The last step is to convert them back into video and combine it with the original audio track from the data_dst.mp4 file.

To do so, use one of the 4 provided .bat files, which use FFMPEG to combine all the frames into a video in one of the following formats - avi, mp4, lossless mp4 or lossless mov:

- 8) merged to avi
- 8) merged to mov lossless
- 8) merged to mp4 lossless
- 8) merged to mp4
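
If you ever need to do this step by hand, the sketch below builds a plain ffmpeg command line that does roughly the same job as the mp4 script: it reads numbered frames, takes the audio from the original video and encodes with x264. Everything specific here is an assumption for illustration: the five-digit frame naming (%05d.png), the paths, the frame rate and the CRF value. The .bat files go through DFL's own tooling, so their exact ffmpeg arguments may differ.

```python
def build_ffmpeg_args(frames_pattern, source_video, output, fps, crf=18):
    """Return an ffmpeg argument list that joins frames with the source audio."""
    return [
        "ffmpeg",
        "-framerate", str(fps),   # frame rate of the image sequence
        "-i", frames_pattern,     # input 0: merged frames
        "-i", source_video,       # input 1: original video (used for audio only)
        "-map", "0:v:0",          # video from the frames
        "-map", "1:a:0?",         # audio from the original, skipped if there is none
        "-c:v", "libx264",
        "-crf", str(crf),         # lower value = higher quality, larger file
        "-pix_fmt", "yuv420p",    # widest player compatibility
        "-c:a", "copy",           # keep the original audio without re-encoding
        "-shortest",
        output,
    ]


args = build_ffmpeg_args("data_dst/merged/%05d.png", "data_dst.mp4", "result.mp4", fps=30)
```

The list can be passed to subprocess.run(args, check=True) if ffmpeg is on your PATH. Make sure fps matches the frame rate of your data_dst video, otherwise audio and video will drift apart.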

Alternatively, if you want more control (to further refine masks in some portions of the video, adjust colors of the face or do something else), you can manually composite your video. Take the audio from data_dst, your original frames, merged frames and mask frames, import them into video editing software you know and create the final video by hand. This lets you adjust masks by further blurring or sharpening them (commonly referred to as mask feathering), slightly enlarge or decrease the size of the mask (thus revealing more or less of the DST face underneath it), apply additional color correction and color matching to your face (by using the mask to display just the face portion of your merged frame), add sharpening, film grain/noise, etc.
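
What a compositing program does with those three layers boils down to standard alpha blending: the mask decides, per pixel, how much of the merged frame and how much of the original frame ends up in the result. A minimal sketch on a row of single-channel pixels (the function name and values are only for illustration):

```python
def composite(frame_px, merged_px, mask_px):
    """Alpha blend: mask 1.0 shows the merged pixel, mask 0.0 shows the original frame."""
    return [m * a + f * (1.0 - a) for f, m, a in zip(frame_px, merged_px, mask_px)]


frame_row = [10.0, 10.0, 10.0]      # original frame
merged_row = [200.0, 200.0, 200.0]  # merged frame
mask_row = [0.0, 0.5, 1.0]          # feathered mask edge

result_row = composite(frame_row, merged_row, mask_row)
```

Blurring the mask widens the range of in-between values like the 0.5 above, which is exactly what makes the transition between the two layers less visible.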

Look up video compositing guides on youtube, as the topic is too complex to cover in this guide. Alternatively check out the video compositing/editing thread on our forum, a link to which you can find at the beginning of the guide (useful links) or by simply visiting this link: https://mrdeepfakes.com/forums/thread-guide-compositing-post-processing-video-editing-guides There is not much there at the moment but what is there covers some basics to help you start out.

And that's it!

If you have more questions that weren't covered in this thread, use the search option and check other threads and guides for more info on the process (or use google). Alternatively PM me for more info; I offer paid assistance and can teach you everything there is to know about DFL, MVE and other related things so you can start making perfect deepfakes in no time.

Current issues/bugs:


Github page for issues reports: https://github.com/iperov/DeepFaceLab/issues
If you can't find the exact issue in existing forum threads, it wasn't mentioned on github and you believe no one else discovered it yet create a new thread here:
https://mrdeepfakes.com/forums/forum-questions

If your issue is common your thread will be deleted without notice. Use the search feature; if you search for errors by copying them directly from the command line window, remember to copy only the key parts of the error, as directory names will differ between users. When reporting issues make sure to include your full PC specs (CPU, GPU, RAM amount, OS) as well as your DFL version and model settings, and describe what leads to the issues you're experiencing.
 

TMBDF

Moderator | Deepfake Creator | Guide maintainer
Official repository: https://github.com/iperov/DeepFaceLab
 
Please consider a donation.
 
 
Windows 10 users important notice!
You should set this setting in order to work correctly.
 
System – Display – Graphics settings
 

 
============ CHANGELOG ============
 
== 20.10.2021 ==
 
SAEHD, AMP: random scale increased to -0.15+0.15. Improved lr_dropout capability to reach lower value of the loss.
 
SAEHD: changed algorithm for bg_style_power. Now can better stitch a face losing src-likeness.
 
added option Random hue/saturation/light intensity applied to the src face set only at the input of the neural network. Stabilizes color perturbations during face swapping. Reduces the quality of the color transfer by selecting the closest one in the src faceset. Thus the src faceset must be diverse enough. Typical fine value is 0.05.
 
Liae archi: when random_warp is off, inter_AB network is no longer trained to keep the face more src-like.
 
 
== 09.10.2021 ==
 
SAEHD: added -t archi option. Makes the face more src-like.
                                                        
SAEHD, AMP:
 
removed the implicit function of periodically retraining last 16 “high-loss” samples
 
fixed export to .dfm format to work correctly in DirectX12 DeepFaceLive build.
 
In the sample generator, the random scaling was increased from -0.05+0.05 to -0.125+0.125, which improves the generalization of faces.
 
 
== 06.09.2021 ==
 
Fixed error in model saving.
 
AMP, SAEHD: added option ‘blur out mask’
Blurs nearby area outside of applied face mask of training samples.
The result is the background near the face is smoothed and less noticeable on swapped face.
The exact xseg mask in src and dst faceset is required.
 
AMP, SAEHD: Sample processors count is no longer limited to 8, thus if you have an AMD processor with 16+ cores, increase paging file size.
 
DirectX12 build: update tensorflow-directml to 1.15.5 version.
 
== 12.08.2021 ==
 
XSeg model: improved pretrain option
 
Generic XSeg: added more faces (the faceset is not publicly available) and retrained with pretrain option. The quality is now higher.
 
Updated RTM WF Dataset with the new Generic XSeg mask applied, also added 490 faces with closed eyes.
 
 
== 30.07.2021 ==
 
Export AMP/SAEHD: added "Export quantized" option. (was enabled before)
Makes the exported model faster. If you have problems, disable this option.
 
 
AMP model:
changed help of ct mode:
       Change color distribution of src samples close to dst samples. If src faceset is diverse enough, then lct mode is fine in most cases.
Default inter dims now 1024
return lr_dropout option
last high loss samples behaviour - same as SAEHD
 
XSeg model: added pretrain option.
 
Generic XSeg: retrained with pretrain option. The quality is now higher.
 
Updated RTM WF Dataset with the new Generic XSeg mask applied.
 
== 17.07.2021 ==
 
SAE/AMP: GAN model is reverted to December version, which is better, tested on high-res fakes.
 
AMP:   default morph factor is now 0.5
       Removed eyes_mouth_prio option, enabled permanently.
       Removed masked training, enabled permanently.
 
Added script
6) train AMP SRC-SRC.bat
 
Stable approach to train AMP:
1)  Get fairly diverse src faceset
2)  Set morph factor to 0.5
3)  train AMP SRC-SRC for 500k+ iters (more is better)
4)  delete inter_dst from model files
5)  train as usual
 

 
== 01.07.2021 ==
 
AMP model:   fixed preview history
            
added ‘Inter dimensions’ option. The model is not changed. Should be equal or more than AutoEncoder dimensions.
More dims are better, but require more VRAM. You can fine-tune model size to fit your GPU.
 
Removed pretrain option.
 
Default morph factor is now 0.1
 
How to train AMP:
 
1)  Train as usual src-dst.
2)  Delete inters model files.
3)  Train src-src. This means placing the src aligned faces into data_dst
4)  Delete inters model files.
5)  Train as usual src-dst.
 
Added scripts
6) export AMP as dfm.bat
6) export SAEHD as dfm.bat
Export model as .dfm format to work in DeepFaceLive.
 
== 02.06.2021 ==
 
AMP model: added ‘morph_factor’ option. [0.1 .. 0.5]
The smaller the value, the more src-like facial expressions will appear. 
The larger the value, the less space there is to train a large dst faceset in the neural network. 
Typical fine value is 0.33
 
 
AMP model: added ‘pretrain’ mode as in SAEHD
 
Default pretrain dataset is updated with applied Generic XSeg mask
 
 
== 30.05.2021 ==
 
Added new experimental model ‘AMP’ (as amplifier, because dst facial expressions are amplified to src)

 
 
It has controllable ‘morph factor’, you can specify the value (0.0 .. 1.0) in the console before merging process.
 
If the shapes of the faces are different, you will get a different jaw line, which requires hard post-processing.
 
But you can pretrain a celeb on large dst faceset with applied Generic XSeg mask (included in torrent). Then continue train with dst of the fake.
In this case you will get more ‘sewed’ face.

 
 
And merged face looks fine:

 
 
Large dst WF faceset with applied Generic XSeg mask is now included in torrent file.
If your src faceset is diverse and large enough, then ‘lct’ color transfer mode should be used during pretraining.
 
 
XSegEditor: delete button now moves the face to _trash directory and it has been moved to the right border of the window
 
Faceset packer now asks whether to delete the original files
 
Trainer now saves every 25 min instead of 15
 
 
 
== 12.05.2021 ==
 
FacesetResizer now supports changing face type
XSegEditor: added delete button
Improved training sample augmentation for XSeg trainer.
XSeg model has been changed to work better with large amount of various faces, thus you should retrain existing xseg model.
Added Generic XSeg model pretrained on various faces. It is most suitable for src faceset because it contains clean faces, but also can be applied on dst footage without complex face obstructions.
5.XSeg Generic) data_dst whole_face mask - apply.bat
5.XSeg Generic) data_src whole_face mask - apply.bat
 
== 22.04.2021 ==
 
Added new build DeepFaceLab_DirectX12, works on all devices that support DirectX12 in Windows 10:
 
AMD Radeon R5/R7/R9 2xx series or newer
Intel HD Graphics 5xx or newer
NVIDIA GeForce GTX 9xx series GPU or newer
DirectX12 is 20-80% slower on NVIDIA Cards comparing to ‘NVIDIA’ build.
 
Improved XSeg sample generator in the training process.
 
 
== 23.03.2021 ==
 
SAEHD: random_flip option is replaced with
 
random_src_flip (default OFF)
 
Random horizontal flip SRC faceset. Covers more angles, but the face may look less natural
 
random_dst_flip (default ON)
 
Random horizontal flip DST faceset. Makes generalization of src->dst better, if src random flip is not enabled.
 
 
Added faceset resize tool via
 
4.2) data_src util faceset resize.bat
5.2) data_dst util faceset resize.bat
 
Resize faceset to match model resolution to reduce CPU load during training.
Don’t forget to keep original faceset.
 
 
== 04.01.2021 ==
 
SAEHD: GAN is improved. Now produces fewer artifacts and a cleaner preview.
 
All GAN options:
 
GAN power
Forces the neural network to learn small details of the face. 
Enable it only when the face is trained enough with lr_dropout(on) and random_warp(off), and don't disable. 
The higher the value, the higher the chances of artifacts. Typical fine value is 0.1
 
GAN patch size (3-640)
The higher patch size, the higher the quality, the more VRAM is required. 
You can get sharper edges even at the lowest setting. 
Typical fine value is resolution / 8.
 
GAN dimensions (4-64)
The dimensions of the GAN network. 
The higher dimensions, the more VRAM is required. 
You can get sharper edges even at the lowest setting. 
Typical fine value is 16.
 
Comparison of different settings:
 

 
 
== 01.01.2021 ==
 
Build for “2080TI and earlier” now exists again.
 
== 22.12.2020 ==
 
The load time of training data has been reduced significantly.
 
== 20.12.2020 ==
 
SAEHD:
 
lr_dropout now can be used with AdaBelief
 
Eyes priority is replaced with Eyes and mouth priority
Helps to fix eye problems during training like "alien eyes" and wrong eyes direction. 
Also makes the detail of the teeth higher.
 
New default values with new model:
Archi : ‘liae-ud’
AdaBelief : enabled
 
== 18.12.2020 ==
 
Now single build for all video cards.
                                
Upgraded to Tensorflow 2.4.0, CUDA 11.2, CuDNN 8.0.5.
You don’t need to install anything.
 
== 11.12.2020 ==
 
Upgrade to Tensorflow 2.4.0rc4
 
Now support RTX 3000 series.
 
Videocards with Compute Capability 3.0 are no longer supported.
 
CPUs without AVX are no longer supported.
 
SAEHD: added new option
Use AdaBelief optimizer?
Experimental AdaBelief optimizer. It requires more VRAM, but the accuracy of the model is higher, and lr_dropout is not needed.
 
 
== 02.08.2020 ==
 
SAEHD: now random_warp is disabled for pretraining mode by default
Merger: fix load time of xseg if it has no model files
 
== 18.07.2020 ==
 
Fixes
 
SAEHD: write_preview_history now works faster
The frequency at which the preview is saved now depends on the resolution.
For example 64x64 – every 10 iters. 448x448 – every 70 iters.
 
Merger: added option “Number of workers?”
Specify the number of threads to process. 
A low value may affect performance. 
A high value may result in memory error. 
The value may not be greater than CPU cores.
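
The rule for the worker count can be pictured as a simple clamp between 1 and the number of CPU cores. This is only a sketch of the rule as described in the changelog entry; the function name and its cpu_cores parameter are hypothetical and not part of DFL.

```python
import os


def clamp_workers(requested, cpu_cores=None):
    """Keep the requested worker count between 1 and the number of CPU cores."""
    cores = cpu_cores or os.cpu_count() or 1
    return max(1, min(requested, cores))


workers = clamp_workers(16, cpu_cores=8)   # asking for more than 8 cores gives 8
```

If you run into memory errors during merging, try a value lower than your core count.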
 
 
== 17.07.2020 ==
 
SAEHD:
 
Pretrain dataset is replaced with high quality FFHQ dataset.
 
Changed help for “Learning rate dropout” option:
When the face is trained enough, you can enable this option to get extra sharpness and reduce subpixel shake for less amount of iterations. 
Enable it before “disable random warp” and before GAN.
n - disabled.
y - enabled.
cpu - enabled on CPU. This allows not to use extra VRAM, sacrificing 20% time of iteration.
 
Changed help for GAN option:
Train the network in Generative Adversarial manner. 
Forces the neural network to learn small details of the face. 
Enable it only when the face is trained enough and don't disable. 
Typical value is 0.1
 
improved GAN. Now it produces better skin detail, less patterned aggressive artifacts, works faster.

 
== 04.07.2020 ==
 
Fix bugs.
Renamed some 5.XSeg) scripts.
Changed help for GAN_power.
 
== 27.06.2020 ==
 
Extractor:
       Extraction now can be continued, but you must specify the same options again.
 
       added ‘Max number of faces from image’ option.
If you extract a src faceset that has frames with a large number of faces, 
it is advisable to set max faces to 3 to speed up extraction.
0 - unlimited
 
added ‘Image size’ option.
The higher image size, the worse face-enhancer works.
Use higher than 512 value only if the source image is sharp enough and the face does not need to be enhanced.
 
added ‘Jpeg quality’ option in range 1-100. The higher jpeg quality the larger the output file size
 
 
Sorter: improved sort by blur and by best faces.
 
== 22.06.2020 ==
 
XSegEditor:
changed hotkey for xseg overlay mask
“overlay xseg mask” now works in polygon mode

 
== 21.06.2020 ==
 
SAEHD:
Resolution for –d archi is now automatically adjusted to be divisible by 32.
‘uniform_yaw’ now always enabled in pretrain mode.
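
A quick sketch of what "adjusted to be divisible by 32" means for the resolution you type in. It assumes the value is rounded down to the nearest multiple of 32; the changelog does not say which direction the rounding goes, so that part is a guess, and the function name is hypothetical.

```python
def adjust_resolution(requested, multiple=32):
    """Round the requested resolution down to a multiple, never below one multiple."""
    return max(multiple, (requested // multiple) * multiple)


adjusted = adjust_resolution(450)   # 450 is not divisible by 32, 448 is
```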
 
Subprocessor now writes an error if it does not start.
 
XSegEditor: fixed incorrect count of labeled images.
 
XNViewMP: dark theme is enabled by default

 
== 19.06.2020 ==
 
SAEHD:
 
Maximum resolution is increased to 640.
 
‘hd’ archi is removed. ‘hd’ was experimental archi created to remove subpixel shake, but ‘lr_dropout’ and ‘disable random warping’ do that better.
 
‘uhd’ is renamed to ‘-u’
dfuhd and liaeuhd will be automatically renamed to df-u and liae-u in existing models.
 
Added new experimental archi (key -d) which doubles the resolution using the same computation cost.
This means the same configs will be x2 faster, or for example you can set 448 resolution and it will train as 224.
Strongly recommended not to train from scratch and use pretrained models.
 
New archi naming:
'df' keeps more identity-preserved face.
'liae' can fix overly different face shapes.
'-u' increased likeness of the face.
'-d' (experimental) doubling the resolution using the same computation cost
Opts can be mixed (-ud)
Examples: df, liae, df-d, df-ud, liae-ud, ...
 
Not the best example of 448 df-ud trained on 11GB:

 
Improved GAN training (GAN_power option).  It was used for dst model, but actually we don’t need it for dst.
Instead, a second src GAN model with x2 smaller patch size was added, so the overall quality for hi-res models should be higher.
 
Added option ‘Uniform yaw distribution of samples (y/n)’:
       Helps to fix blurry side faces due to small amount of them in the faceset.
 
Quick96:
       Now based on df-ud archi and 20% faster.
 
XSeg trainer:
       Improved sample generator.
Now it randomly adds the background from other samples.
Result is reduced chance of random mask noise on the area outside the face.
Now you can specify ‘batch_size’ in range 2-16.
 
Reduced size of samples with applied XSeg mask. Thus size of packed samples with applied xseg mask is also reduced.
 
 
== 11.06.2020 ==
 
Trainer: fixed "Choose image for the preview history". Now you can switch between subpreviews using 'space' key.
Fixed "Write preview history". Now it writes all subpreviews in separate folders. Also, the last preview is saved as _last.jpg before the first file, thus you can easily compare the changes with the first file in a photo viewer.
 
 
XSegEditor: added text label of total labeled images
Changed frame line design
Changed loading frame design
 

 
== 08.06.2020 ==
 
SAEHD: resolution >= 256 now has second dssim loss function
 
SAEHD: lr_dropout now can be ‘n’, ‘y’, ‘cpu’. ‘n’ and ’y’ are the same as before.
‘cpu’ means enabled on CPU. This allows not to use extra VRAM, sacrificing 20% time of iteration.
fix errors
 
reduced chance of the error "The paging file is too small for this operation to complete."
 
updated XNViewMP to 0.96.2
 
== 04.06.2020 ==
 
Manual extractor: now you can specify the face rectangle manually using ‘R Mouse button’.
It is useful for small, blurry, undetectable faces, animal faces.

Warning:
Landmarks cannot be placed on the face precisely, and they are actually used for positioning the red frame.
Therefore, such frames must be used only with XSeg workflow !
Try to keep the red frame the same as the adjacent frames.
 
added script
10.misc) make CPU only.bat
This script will convert your DeepFaceLab folder to work on CPU without any problems. An internet connection is required.
It is useful to train on Colab and merge interactively on your comp without GPU.
 
== 31.05.2020 ==
 
XSegEditor: added button "view XSeg mask overlay face"
 
== 06.05.2020 ==
 
Some fixes
 
SAEHD: changed UHD archis. You have to retrain uhd models from scratch.
 
== 20.04.2020 ==
 
XSegEditor: fix bug
 
Merger: fix bug
 
== 15.04.2020 ==
 
XSegEditor: added view lock at the center by holding shift in drawing mode.
 
Merger: color transfer “sot-m”: speed optimization for 5-10%
 
Fix minor bug in sample loader
 
== 14.04.2020 ==
 
Merger: optimizations
 
        color transfer ‘sot-m’ : reduced color flickering, but consuming x5 more time to process
 
        added mask mode ‘learned-prd + learned-dst’ – produces largest area of both dst and predicted masks
XSegEditor : polygon is now transparent while editing
 
New example data_dst.mp4 video
 
New official mini tutorial https://www.youtube.com/watch?v=1smpMsfC3ls
 
== 06.04.2020 ==
 
Fixes for 16+ cpu cores and large facesets.
 
added 5.XSeg) data_dst/data_src mask for XSeg trainer - remove.bat
       removes labeled xseg polygons from the extracted frames
      
 
== 05.04.2020 ==
 
Decreased amount of RAM used by Sample Generator.
 
Fixed bug with input dialog in Windows 10
 
Fixed running XSegEditor when directory path contains spaces
 
SAEHD: ‘Face style power’ and ‘Background style power’  are now available for whole_face
 New help messages for these options.
 
XSegEditor: added button ‘view trained XSeg mask’, so you can see which frames should be masked to improve mask quality.
 
Merger:
added ‘raw-predict’ mode. Outputs raw predicted square image from the neural network.
 
mask-mode ‘learned’ replaced with 3 new modes:
       ‘learned-prd’ – smooth learned mask of the predicted face
       ‘learned-dst’ – smooth learned mask of DST face
       ‘learned-prd*learned-dst’ – smallest area of both (default)
            
 
Added new face type : head
Now you can replace the head.
Example: https://www.youtube.com/watch?v=xr5FHd0AdlQ
Requirements:
       Post processing skill in Adobe After Effects or Davinci Resolve.
Usage:
1)  Find suitable dst footage with the monotonous background behind head
2)  Use “extract head” script
3)  Gather rich src headset from only one scene (same color and haircut)
4)  Mask whole head for src and dst using XSeg editor
5)  Train XSeg
6)  Apply trained XSeg mask for src and dst headsets
7)  Train SAEHD using ‘head’ face_type as regular deepfake model with DF archi. You can use pretrained model for head. Minimum recommended resolution for head is 224.
8)  Extract multiple tracks, using Merger:
a.  Raw-rgb
b.  XSeg-prd mask
c.  XSeg-dst mask
9)  Using AAE or DavinciResolve, do:
a.  Hide source head using XSeg-prd mask: content-aware-fill, clone-stamp, background retraction, or other technique
b.  Overlay new head using XSeg-dst mask
 
Warning: Head faceset can be used for whole_face or less types of training only with XSeg masking.
 
 
 
== 30.03.2020 ==
 
New script:
       5.XSeg) data_dst/src mask for XSeg trainer - fetch.bat
Copies faces containing XSeg polygons to aligned_xseg\ dir.
Useful only if you want to collect labeled faces and reuse them in other fakes.
 
Now you can use trained XSeg mask in the SAEHD training process.
This means the default ‘full_face’ mask obtained from landmarks will be replaced with the mask obtained from the trained XSeg model.
use
5.XSeg.optional) trained mask for data_dst/data_src - apply.bat
5.XSeg.optional) trained mask for data_dst/data_src - remove.bat
 
Normally you don’t need it. You can use it, if you want to use ‘face_style’ and ‘bg_style’ with obstructions.
 
XSeg trainer : now you can choose type of face
XSeg trainer : now you can restart training in “override settings”
Merger: XSeg-* modes now can be used with all types of faces.
 
Therefore old MaskEditor, FANSEG models, and FAN-x modes have been removed,
because the new XSeg solution is better, simpler and more convenient, which costs only 1 hour of manual masking for regular deepfake.
 
 
== 27.03.2020 ==
 
XSegEditor: fix bugs, changed layout, added current filename label
 
SAEHD: fixed the use of pretrained liae model, now it produces less face morphing
 
== 25.03.2020 ==
 
SAEHD: added 'dfuhd' and 'liaeuhd' archi
uhd version is lighter than 'HD' but heavier than regular version.
liaeuhd provides more "src-like" result
comparison:
       liae:    https://i.imgur.com/JEICFwI.jpg
       liaeuhd: https://i.imgur.com/ymU7t5E.jpg
 
 
added new XSegEditor !
 
here is the new whole_face + XSeg workflow:
 
with XSeg model you can train your own mask segmentator for dst(and/or src) faces
that will be used by the merger for whole_face.
 
Instead of using a pretrained segmentator model (which does not exist),
you control which part of faces should be masked.
 
new scripts:
       5.XSeg) data_dst edit masks.bat
       5.XSeg) data_src edit masks.bat
       5.XSeg) train.bat
 
Usage:
       unpack dst faceset if packed
 
       run 5.XSeg) data_dst edit masks.bat
 
       Read tooltips on the buttons (en/ru/zn languages are supported)
 
       mask the face using include or exclude polygon mode.
      
       repeat for 50/100 faces,
             !!! you don't need to mask every frame of dst
             only frames where the face is different significantly,
             for example:
                    closed eyes
                    changed head direction
                    changed light
             the more various faces you mask, the more quality you will get
 
             Start masking from the upper left area and follow the clockwise direction.
             Keep the same logic of masking for all frames, for example:
                    the same approximated jaw line of the side faces, where the jaw is not visible
                    the same hair line
             Mask the obstructions using exclude polygon mode.
 
       run 5.XSeg) train.bat
             train the model
 
             Check the faces of 'XSeg dst faces' preview.
 
             if some faces have wrong or glitchy mask, then repeat steps:
                    run edit
                    find these glitchy faces and mask them
                    train further or restart training from scratch
 
Restart training of XSeg model is only possible by deleting all 'model\XSeg_*' files.
 
If you want to get the mask of the predicted face (XSeg-prd mode) in merger,
you should repeat the same steps for src faceset.
 
New mask modes available in merger for whole_face:
 
XSeg-prd       - XSeg mask of predicted face  -> faces from src faceset should be labeled
XSeg-dst       - XSeg mask of dst face               -> faces from dst faceset should be labeled
XSeg-prd*XSeg-dst - the smallest area of both
 
if workspace\model folder contains trained XSeg model, then merger will use it,
otherwise you will get transparent mask by using XSeg-* modes.
 
Some screenshots:
XSegEditor: https://i.imgur.com/7Bk4RRV.jpg
trainer   : https://i.imgur.com/NM1Kn3s.jpg
merger    : https://i.imgur.com/glUzFQ8.jpg
 
example of the fake using 13 segmented dst faces
          : https://i.imgur.com/wmvyizU.gifv
 
 
== 18.03.2020 ==
 
Merger: fixed face jitter
 
== 15.03.2020 ==
 
global fixes
 
SAEHD: removed option learn_mask, it is now enabled by default
 
removed liaech archi
 
removed support of extracted(aligned) PNG faces. Use old builds to convert from PNG to JPG.
 
 
== 07.03.2020 ==
 
returned back
3.optional) denoise data_dst images.bat
       Apply it if dst video is very sharp.
 
       Denoise dst images before face extraction.
       This technique helps neural network not to learn the noise.
       The result is less pixel shake of the predicted face.
      
 
SAEHD:
 
added new experimental archi
'liaech' - made by @chervonij. Based on liae, but produces more src-like face.
 
lr_dropout is now disabled in pretraining mode.
 
Sorter:
 
added sort by "face rect size in source image"
small faces from source image will be placed at the end
 
added sort by "best faces faster"
same as sort by "best faces"
but faces will be sorted by source-rect-area instead of blur.
 
 
 
== 28.02.2020 ==
 
Extractor:
 
image size for all faces is now 512
 
fix RuntimeWarning during the extraction process
 
SAEHD:
 
max resolution is now 512
 
fix hd architectures. Some decoder weights were not being trained before.
 
new optimized training:
for every <batch_size*16> samples,
model collects <batch_size> samples with the highest error and learns them again
therefore hard samples will be trained more often
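
The idea described above can be sketched as follows: out of a window of batch_size*16 samples, pick the batch_size samples with the highest error so they can be trained again. This is only an illustration of the selection step with made-up sample names and loss values, not the trainer's actual code.

```python
def pick_hard_samples(samples_with_loss, batch_size):
    """Return the names of the batch_size samples with the highest loss."""
    ranked = sorted(samples_with_loss, key=lambda item: item[1], reverse=True)
    return [name for name, _ in ranked[:batch_size]]


batch_size = 2
window_size = batch_size * 16

# Fake losses: every sample in the window gets a distinct value between 0 and 1.
window = [(f"face_{i}", ((i * 7) % window_size) / window_size) for i in range(window_size)]

hard = pick_hard_samples(window, batch_size)   # these would be fed to the model again
```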
 
'models_opt_on_gpu' option is now available for multigpus (before only for 1 gpu)
 
fix 'autobackup_hour'
 
== 23.02.2020 ==
 
SAEHD: pretrain option is now available for whole_face type
 
fix sort by abs difference
fix sort by yaw/pitch/best for whole_face's
 
== 21.02.2020 ==
 
Trainer: decreased time of initialization
 
Merger: fixed some color flickering in overlay+rct mode
 
SAEHD:
 
added option Eyes priority (y/n)
 
       Helps to fix eye problems during training like "alien eyes"
       and wrong eyes direction ( especially on HD architectures )
       by forcing the neural network to train eyes with higher priority.
       before/after https://i.imgur.com/YQHOuSR.jpg
 
added experimental face type 'whole_face'
 
       Basic usage instruction: https://i.imgur.com/w7LkId2.jpg
      
       'whole_face' requires skill in Adobe After Effects.
 
       For using whole_face you have to extract whole_face's by using
       4) data_src extract whole_face
       and
       5) data_dst extract whole_face
       Images will be extracted in 512 resolution, so they can be used for regular full_face's and half_face's.
      
       'whole_face' covers whole area of face include forehead in training square,
       but training mask is still 'full_face'
       therefore it requires manual final masking and composing in Adobe After Effects.
 
added option 'masked_training'
       This option is available only for 'whole_face' type.
       Default is ON.
       Masked training clips training area to full_face mask,
       thus network will train the faces properly. 
       When the face is trained enough, disable this option to train all area of the frame.
       Merge with 'raw-rgb' mode, then use Adobe After Effects to manually mask, tune color, and compose whole face include forehead.
 
 
 
== 03.02.2020 ==
 
"Enable autobackup" option is replaced by
"Autobackup every N hour" 0..24 (default 0 disabled), Autobackup model files with preview every N hour
 
Merger:
 
'show alpha mask' now on 'V' button
 
'super resolution mode' is replaced by
'super resolution power' (0..100) which can be modified via 'T' 'G' buttons
 
default erode/blur values are 0.
 
new multiple faces detection log: https://i.imgur.com/0XObjsB.jpg
 
now uses all available CPU cores ( before max 6 )
so the more processors, the faster the process will be.
 
== 01.02.2020 ==
 
Merger:
 
increased speed
 
improved quality
 
SAEHD: default archi is now 'df'
 
== 30.01.2020 ==
 
removed use_float16 option
 
fix MultiGPU training
 
== 29.01.2020 ==
 
MultiGPU training:
fixed CUDNN_STREAM errors.
speed is significantly increased.
 
Trainer: added key 'b' : creates a backup even if the autobackup is disabled.
 
== 28.01.2020 ==
 
optimized face sample generator, CPU load is significantly reduced
 
fix of update preview for history after disabling the pretrain mode
 
 
SAEHD:
 
added new option
GAN power 0.0 .. 10.0
       Train the network in Generative Adversarial manner.
       Forces the neural network to learn small details of the face.
       You can enable/disable this option at any time,
       but better to enable it when the network is trained enough.
       Typical value is 1.0
       GAN power with pretrain mode will not work.
 
Example of enabling GAN on 81k iters +5k iters
https://i.imgur.com/OdXHLhU.jpg
https://i.imgur.com/CYAJmJx.jpg
 
 
dfhd: default Decoder dimensions are now 48
the preview for 256 res is now correctly displayed
 
fixed model naming/renaming/removing
 
 
Improvements for those involved in post-processing in AfterEffects:
 
Codec is reverted back to x264 in order to properly use in AfterEffects and video players.
 
Merger now always outputs the mask to workspace\data_dst\merged_mask
 
removed raw modes except raw-rgb
raw-rgb mode now outputs selected face mask_mode (before square mask)
 
'export alpha mask' button is replaced by 'show alpha mask'.
You can view the alpha mask without recomputing the frames.
 
8) 'merged *.bat' now also outputs 'result_mask.' video file.
8) 'merged lossless' now uses x264 lossless codec (before PNG codec)
result_mask video file is always lossless.
 
Thus you can use result_mask video file as mask layer in the AfterEffects.
 
 
== 25.01.2020 ==
 
Upgraded to TF version 1.13.2
 
Removed the wait at first launch for most graphics cards.
 
Increased speed of training by 10-20%, but you have to retrain all models from scratch.
 
SAEHD:
 
added option 'use float16'
       Experimental option. Reduces the model size by half.
       Increases the speed of training.
       Decreases the accuracy of the model.
       The model may collapse or not train.
       Model may not learn the mask in large resolutions.
       You can enable/disable this option at any time.
 
true_face_training option is replaced by
"True face power". 0.0000 .. 1.0
Experimental option. Discriminates the result face to be more like the src face. Higher value - stronger discrimination.
Comparison - https://i.imgur.com/czScS9q.png
 
== 23.01.2020 ==
 
SAEHD: fixed clipgrad option
 
== 22.01.2020 == BREAKING CHANGES !!!
 
Getting rid of the weakest link - AMD cards support.
All neural network codebase transferred to pure low-level TensorFlow backend, therefore
removed AMD/Intel cards support, now DFL works only on NVIDIA cards or CPU.
 
old DFL marked as 1.0 still available for download, but it will no longer be supported.
 
global code refactoring, fixes and optimizations
 
Extractor:
 
now you can choose on which GPUs (or CPU) to process
 
improved stability for < 4GB GPUs
 
increased speed of multi gpu initializing
 
now works in one pass (except manual mode)
so you won't lose the processed data if something goes wrong before the old 3rd pass
 
Faceset enhancer:
 
now you can choose on which GPUs (or CPU) to process
 
Trainer:
 
now you can choose on which GPUs (or CPU) to train the model.
Multi-gpu training is now supported.
Select identical cards, otherwise the fast GPU will wait for the slow GPU every iteration.
 
now remembers the previous option input as default with the current workspace/model/ folder.
 
the number of sample generators now matches the available number of processors
 
saved models now have names instead of GPU indexes.
Therefore you can switch GPUs for every saved model.
Trainer offers to choose latest saved model by default.
You can rename or delete any model using the dialog.
 
models now save the optimizer weights in the model folder to continue training properly
 
removed all models except SAEHD, Quick96
 
trained model files from DFL 1.0 cannot be reused
 
AVATAR model is also removed.
How to create AVATAR like in this video? https://www.youtube.com/watch?v=4GdWD0yxvqw
1) capture yourself with your own speech repeating same head direction as celeb in target video
2) train regular deepfake model with celeb faces from target video as src, and your face as dst
3) merge celeb face onto your face with raw-predict mode
4) compose masked mouth with target video in AfterEffects
 
 
SAEHD:
 
now has 3 options: Encoder dimensions, Decoder dimensions, Decoder mask dimensions
 
now has 4 archis: dfhd (default), liaehd, df, liae
df and liae are from SAE model, but use features from SAEHD model (such as combined loss and disable random warp)
 
dfhd/liaehd - changed encoder/decoder architectures
 
decoder model is combined with mask decoder model
mask training is combined with face training,
result is reduced time per iteration and decreased vram usage by optimizer
 
"Initialize CA weights" now works faster and integrated to "Initialize models" progress bar
 
removed optimizer_mode option
 
added option 'Place models and optimizer on GPU?'
  When you train on one GPU, by default model and optimizer weights are placed on GPU to accelerate the process.
  You can place them on CPU to free up extra VRAM, thus you can set larger model parameters.
  This option is unavailable in MultiGPU mode.
 
pretraining now does not use rgb channel shuffling
pretraining now can be continued
when pre-training is disabled:
1) iters and loss history are reset to 1
2) in df/dfhd archis, only the inter part of the encoder is reset (before encoder+inter)
   thus the fake will train faster with a pretrained df model
 
Merger ( renamed from Converter ):
 
now you can choose on which GPUs (or CPU) to process
 
new hot key combinations to navigate and override frame's configs
 
super resolution upscaler "RankSRGAN" is replaced by "FaceEnhancer"
 
FAN-x mask mode now works on GPU while merging (before on CPU),
therefore all models (Main face model + FAN-x + FaceEnhancer)
now work on GPU while merging, and work properly even on 2GB GPU.
 
Quick96:
 
now automatically uses pretrained model
 
Sorter:
 
removed all sort by *.bat files except one sort.bat
now you have to choose sort method in the dialog
 
Other:
 
all console dialogs are now more convenient
 
XnViewMP is updated to 0.94.1 version
 
ffmpeg is updated to 4.2.1 version
 
ffmpeg: video codec is changed to x265
 
_internal/vscode.bat starts VSCode IDE where you can view and edit DeepFaceLab source code.
 
removed russian/english manual. Read community manuals and tutorials here
https://mrdeepfakes.com/forums/forum-guides-and-tutorials
 
new github page design
 
== 11.01.2020 ==
 
fix freeze on sample loading
 
== 08.01.2020 ==
 
fixes and optimizations in sample generators
 
fixed Quick96 and removed lr_dropout from SAEHD for OpenCL build.
 
CUDA build now works on lower-end GPU with 2GB VRAM:
GTX 880M GTX 870M GTX 860M GTX 780M GTX 770M
GTX 765M GTX 760M GTX 680MX GTX 680M GTX 675MX GTX 670MX
GTX 660M GT 755M GT 750M GT 650M GT 745M GT 645M GT 740M
GT 730M GT 640M GT 735M GT 730M GTX 770 GTX 760 GTX 750 Ti
GTX 750 GTX 690 GTX 680 GTX 670 GTX 660 Ti GTX 660 GTX 650 Ti GTX 650 GT 740
 
== 29.12.2019 ==
 
fix faceset enhancer for faces that contain edited mask
 
fix long load when using various gpus in the same DFL folder
 
fix extract unaligned faces
 
avatar: avatar_type is now only head by default
 
== 28.12.2019 ==
 
FacesetEnhancer now asks to merge aligned_enhanced/ to aligned/
 
fix 0 faces detected in manual extractor
 
Quick96, SAEHD: optimized architecture. You have to restart training.
 
Now there are only two builds: CUDA (based on 9.2) and OpenCL.
 
== 26.12.2019 ==
 
fixed mask editor
 
added FacesetEnhancer
4.2.other) data_src util faceset enhance best GPU.bat
4.2.other) data_src util faceset enhance multi GPU.bat
 
FacesetEnhancer greatly increases details in your source face set,
same as Gigapixel enhancer, but in fully automatic mode.
In OpenCL build works on CPU only.
 
before/after https://i.imgur.com/TAMoVs6.png
 
== 23.12.2019 ==
 
Extractor: 2nd pass now faster on frames where faces are not found
 
all models: removed options 'src_scale_mod', and 'sort samples by yaw as target'
If you want, you can manually remove unnecessary angles from src faceset after sort by yaw.
 
Optimized sample generators (CPU workers). Now they consume less amount of RAM and work faster.
 
added
4.2.other) data_src/dst util faceset pack.bat
       Packs /aligned/ samples into one /aligned/samples.pak file.
       After that, all faces will be deleted.
 
4.2.other) data_src/dst util faceset unpack.bat
       unpacks faces from /aligned/samples.pak to /aligned/ dir.
       After that, samples.pak will be deleted.
 
Packed facesets load and work faster.
 
 
== 20.12.2019 ==
 
fix 3rd pass of extractor for some systems
 
More stable and precise version of the face transformation matrix
 
SAEHD: lr_dropout now as an option, and disabled by default
When the face is trained enough, you can enable this option to get extra sharpness in fewer iterations.
 
 
added
4.2.other) data_src util faceset metadata save.bat
       saves metadata of data_src\aligned\ faces into data_src\aligned\meta.dat
 
4.2.other) data_src util faceset metadata restore.bat
       restore metadata from 'meta.dat' to images
       if image size different from original, then it will be automatically resized
 
You can greatly enhance face details of src faceset by using Topaz Gigapixel software.
example before/after https://i.imgur.com/Gwee99L.jpg
Download it from torrent https://rutracker.org/forum/viewtopic.php?t=5757118
Example of workflow:
 
1) run 'data_src util faceset metadata save.bat'
2) launch Topaz Gigapixel
3) open 'data_src\aligned\' and select all images
4) set output folder to 'data_src\aligned_topaz' (create folder in save dialog)
5) set settings as on screenshot https://i.imgur.com/kAVWMQG.jpg
       you can choose 2x, 4x, or 6x upscale rate
6) start processing the images and wait for the whole process to finish
7) rename folders:
       data_src\aligned        ->  data_src\aligned_original
       data_src\aligned_topaz  ->  data_src\aligned
8) copy 'data_src\aligned_original\meta.dat' to 'data_src\aligned\'
9) run 'data_src util faceset metadata restore.bat'
       images will be downscaled back to original size (256x256) preserving details
       metadata will be restored
10) now your new enhanced faceset is ready to use !
 
 
 
 
 
== 15.12.2019 ==
 
SAEHD,Quick96:
improved model generalization, overall accuracy and sharpness
by using new 'Learning rate dropout' technique from the paper https://arxiv.org/abs/1912.00144
An example of a loss histogram where this function is enabled after the red arrow:
https://i.imgur.com/3olskOd.jpg
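
A minimal sketch of the idea behind learning rate dropout, not DFL's actual TensorFlow code: on every iteration the optimizer update is applied to each weight only with some probability, so a random subset of weights is left unchanged for that step. The function name, the `keep_prob` parameter, and the use of plain SGD are assumptions made for this illustration.

```python
import random


def sgd_step_with_lr_dropout(params, grads, lr, keep_prob, rng):
    """One plain-SGD step where each weight is updated with probability keep_prob.

    Hypothetical helper for illustration only; DFL's real optimizer differs.
    """
    updated = []
    for p, g in zip(params, grads):
        if rng.random() < keep_prob:
            updated.append(p - lr * g)  # learning rate kept for this weight
        else:
            updated.append(p)           # learning rate dropped: weight unchanged
    return updated


rng = random.Random(0)
weights = sgd_step_with_lr_dropout(
    [1.0, 2.0, 3.0], [0.5, -0.5, 1.0], lr=0.1, keep_prob=0.5, rng=rng
)
print(weights)
```

With `keep_prob=1.0` this reduces to ordinary SGD, and with `keep_prob=0.0` no weight moves at all.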
 
 
== 12.12.2019 ==
 
removed FacesetRelighter due to low quality of the result
 
added sort by absdiff
This is sort method by absolute per pixel difference between all faces.
options:
Sort by similar? ( y/n ?:help skip:y ) :
if you choose 'n', then most dissimilar faces will be placed first.
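
One way to picture sorting by absolute per-pixel difference is a greedy chain: start from one image and repeatedly pick the remaining image with the smallest (or largest) summed absolute difference to the last one picked. This is only a sketch under that assumption. The changelog does not describe the exact ordering algorithm, and the function names and the flat-list image representation are hypothetical.

```python
def abs_diff(a, b):
    """Sum of absolute per-pixel differences between two equally sized images."""
    return sum(abs(x - y) for x, y in zip(a, b))


def sort_by_absdiff(images, similar=True):
    """Return image indexes ordered as a greedy chain.

    similar=True  -> each next image is the closest to the previous one.
    similar=False -> each next image is the most different from the previous one.
    Images are flat lists of pixel values (an assumption for this sketch).
    """
    if not images:
        return []
    remaining = list(range(len(images)))
    order = [remaining.pop(0)]
    pick = min if similar else max
    while remaining:
        last = images[order[-1]]
        nxt = pick(remaining, key=lambda i: abs_diff(last, images[i]))
        remaining.remove(nxt)
        order.append(nxt)
    return order


tiny_images = [[0, 0], [10, 10], [1, 1], [9, 9]]
print(sort_by_absdiff(tiny_images, similar=True))
print(sort_by_absdiff(tiny_images, similar=False))
```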
 
'sort by final' renamed to 'sort by best'
 
OpenCL: fix extractor for some amd cards
 
== 14.11.2019 ==
 
Converter: added new color transfer mode: mix-m
 
== 13.11.2019 ==
 
SAE,SAEHD,Converter:
added sot-m color transfer
 
Converter:
removed seamless2 mode
 
FacesetRelighter:
Added intensity parameter to the manual picker.
'One random direction' and 'predefined 7 directions' use random intensity from 0.3 to 0.6.
 
== 12.11.2019 ==
 
FacesetRelighter fixes and improvements:
 
now you have 3 ways:
1) define light directions manually (not for google colab)
   watch demo https://youtu.be/79xz7yEO5Jw
2) relight faceset with one random direction
3) relight faceset with predefined 7 directions
 
== 11.11.2019 ==
 
added FacesetRelighter:
Synthesize new faces from existing ones by relighting them using DeepPortraitRelighter network.
With the relighted faces neural network will better reproduce face shadows.
 
Therefore you can synthesize shadowed faces from fully lit faceset.
https://i.imgur.com/wxcmQoi.jpg
 
as a result, better fakes on dark faces:
https://i.imgur.com/5xXIbz5.jpg
 
operate via
data_x add relighted faces.bat
data_x delete relighted faces.bat
 
in OpenCL build Relighter runs on CPU
 
== 09.11.2019 ==
 
extractor: removed "increased speed of S3FD" for compatibility reasons
 
converter:
fixed crashes
removed useless 'ebs' color transfer
changed keys for color degrade
 
added image degrade via denoise - same as denoise extracted data_dst.bat ,
but you can control this option directly in the interactive converter
 
added image degrade via bicubic downscale/upscale
 
SAEHD:
default ae_dims for df now 256. It is safe to train SAEHD on 256 ae_dims and higher resolution.
Example of recent fake: https://youtu.be/_lxOGLj-MC8
 
added Quick96 model.
This is the fastest model for low-end 2GB+ NVidia and 4GB+ AMD cards.
Model has zero options and trains a 96pix fullface.
It is good for quick deepfake demo.
Example of the preview trained in 15 minutes on RTX2080Ti:
https://i.imgur.com/oRMvZFP.jpg
 
== 27.10.2019 ==
 
Extractor: fix for AMD cards
 
== 26.10.2019 ==
 
red square of face alignment now contains the arrow that shows the up direction of an image
 
fix alignment of side faces
Before https://i.imgur.com/pEoZ6Mu.mp4
after https://i.imgur.com/wO2Guo7.mp4
 
fix message when no training data provided
 
== 23.10.2019 ==
 
enhanced sort by final: now faces are evenly distributed not only in the direction of yaw,
but also in pitch
 
added 'sort by vggface': sorting by face similarity using VGGFace model.
Requires 4GB+ VRAM and internet connection for the first run.
 
 
== 19.10.2019 ==
 
fix extractor bug for 11GB+ cards
 
== 15.10.2019 ==
 
removed fix "fixed bug when the same face could be detected twice"
 
SAE/SAEHD:
removed option 'apply random ct'
 
added option
   Color transfer mode apply to src faceset. ( none/rct/lct/mkl/idt, ?:help skip: none )
   Change color distribution of src samples close to dst samples. Try all modes to find the best.
Previously lct mode was always used, but it sometimes does not work properly for some facesets.
 
 
== 14.10.2019 ==
 
fixed bug when the same face could be detected twice
 
Extractor now produces a less shaky face, but the second pass is now 25% slower
before/after: https://imgur.com/L77puLH
 
SAE, SAEHD: 'random flip' and 'learn mask' options now can be overridden.
It is recommended to start training for first 20k iters always with 'learn_mask'
 
SAEHD: added option Enable random warp of samples, default is on
Random warp is required to generalize facial expressions of both faces.
When the face is trained enough, you can disable it to get extra sharpness in fewer iterations.
 
== 10.10.2019 ==
 
fixed wrong NVIDIA GPU detection in extraction and training processes
 
increased speed of S3FD 1st pass extraction for GPU with >= 11GB vram.
 
== 09.10.2019 ==
 
fixed wrong NVIDIA GPU indexes in systems with two or more GPUs
fixed wrong NVIDIA GPU detection on the laptops
 
removed TrueFace model.
 
added SAEHD model ( High Definition Styled AutoEncoder )
Compare with SAE: https://i.imgur.com/3QJAHj7.jpg
This is a new heavyweight model for high-end cards to achieve maximum possible deepfake quality in 2020.
 
Differences from SAE:
+ new encoder produces more stable face and less scale jitter
+ new decoder produces subpixel clear result
+ pixel loss and dssim loss are merged together to achieve both training speed and pixel trueness
+ by default networks will be initialized with CA weights, but only after first successful iteration
  therefore you can test network size and batch size before weights initialization process
+ new neural network optimizer consumes less VRAM than before
+ added option <Enable 'true face' training>
  The result face will be more like src and will get extra sharpness.
  Enable it for last 30k iterations before conversion.
+ encoder and decoder dims are merged to one parameter encoder/decoder dims
+ added mid-full face, which covers 30% more area than half face. 
 
example of the preview trained on RTX2080TI, 128 resolution, 512-21 dims, 8 batch size, 700ms per iteration:
without trueface            : https://i.imgur.com/MPPKWil.jpg
with trueface    +23k iters : https://i.imgur.com/dV5Ofo9.jpg
 
== 24.09.2019 ==
 
fix TrueFace model, required retraining
 
== 21.09.2019 ==
 
fix avatar model
 
== 19.09.2019 ==
 
SAE : WARNING, RETRAIN IS REQUIRED !
fixed model sizes from previous update.
worked around a bug in the ML framework (keras) that forces the model to train on random noise.
 
Converter: added blur on the same keys as sharpness
 
Added new model 'TrueFace'. Only for NVIDIA cards.
This is a GAN model ported from https://github.com/NVlabs/FUNIT
Model produces near zero morphing and high detail face.
Model has higher failure rate than other models.
It does not learn the mask, so fan-x mask modes should be used in the converter.
Keep src and dst faceset in same lighting conditions.
 
== 13.09.2019 ==
 
Converter: added new color transfer modes: mkl, mkl-m, idt, idt-m
 
SAE: removed multiscale decoder, because it's not effective
 
== 07.09.2019 ==
 
Extractor: fixed bug with grayscale images.
 
Converter:
 
Session is now saved to the model folder.
 
blur and erode ranges are increased to -400+400
 
hist-match-bw is now replaced with seamless2 mode.
 
Added 'ebs' color transfer mode (works only on Windows).
 
FANSEG model (used in FAN-x mask modes) is retrained with new model configuration
and now produces better precision and less jitter
 
== 30.08.2019 ==
 
interactive converter now saves the session.
if input frames are changed (amount or filenames)
then interactive converter automatically starts a new session.
if the model has been trained further, all frames will be recomputed with their saved configs.
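
The session rules above can be sketched as a small decision function. This is an illustration under stated assumptions, not the converter's real code: the `Session` class, its field names, the returned labels, and the "reuse" case for an unchanged model are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Session:
    frame_names: list   # input frame filenames when the session was saved
    model_iter: int     # model training iteration at save time
    frame_configs: dict = field(default_factory=dict)  # per-frame overrides


def resume_plan(saved, frame_names, model_iter):
    """Decide what to do with a saved session (hypothetical labels)."""
    # Changed amount or names of input frames: start over.
    if saved is None or saved.frame_names != list(frame_names):
        return "new_session"
    # Same frames, but the model was trained further: recompute every frame
    # while keeping the per-frame configs stored in the session.
    if model_iter > saved.model_iter:
        return "recompute_with_saved_configs"
    # Assumption: nothing changed, so cached results can be reused.
    return "reuse_cached_frames"


saved = Session(frame_names=["0001.png", "0002.png"], model_iter=100000)
print(resume_plan(saved, ["0001.png", "0002.png"], 120000))
```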
 
== 28.08.2019 ==
 
removed landmarks of lips which are used in face aligning
result is less scale jittering
before  https://i.imgur.com/gJaW5Y4.gifv  
after   https://i.imgur.com/Vq7gvhY.gifv
 
converter: fixed merged\ filenames, now they are 100% same as input from data_dst\
 
converted to X.bat : now properly eats any filenames from merged\ dir as input
 
== 27.08.2019 ==
 
fixed converter navigation logic and output filenames in merge folder
 
added EbSynth program. It is located in _internal\EbSynth\ folder
Start it via 10) EbSynth.bat
It starts with sample project loaded from _internal\EbSynth\SampleProject
EbSynth is mainly used to create painted video, but with EbSynth you can fix some weird frames produced by deepfake process.
before: https://i.imgur.com/9xnLAL4.gifv  
after:  https://i.imgur.com/f0Lbiwf.gifv
official tutorial for EbSynth : https://www.youtube.com/watch?v=0RLtHuu5jV4
 
== 26.08.2019 ==
 
updated pdf manuals for AVATAR model.
 
Avatar converter: added super resolution option.
 
All converters:
fixes and optimizations
super resolution DCSCN network is now replaced by RankSRGAN
added new option sharpen_mode and sharpen_amount
 
== 25.08.2019 ==
 
Converter: FAN-dst mask mode now works for half face models.
 
AVATAR Model: default avatar_type option on first startup is now HEAD.
Head produces much more stable result than source.
 
updated usage of AVATAR model:
Usage:
1) place data_src.mp4 10-20min square resolution video of news reporter sitting at the table with static background,
   other faces should not appear in frames.
2) process "extract images from video data_src.bat" with FULL fps
3) place data_dst.mp4 square resolution video of face who will control the src face
4) process "extract images from video data_dst FULL FPS.bat"
5) process "data_src mark faces S3FD best GPU.bat"
6) process "data_dst extract unaligned faces S3FD best GPU.bat"
7) train AVATAR.bat stage 1, tune batch size to maximum for your card (32 for 6GB), train to 50k+ iters.
8) train AVATAR.bat stage 2, tune batch size to maximum for your card (4 for 6GB), train to decent sharpness.
9) convert AVATAR.bat
10) converted to mp4.bat
 
== 24.08.2019 ==
 
Added interactive converter.
With interactive converter you can change any parameter of any frame and see the result in real time.
 
Converter: added motion_blur_power param.
Motion blur is applied by precomputed motion vectors.
So the moving face will look more realistic.
 
RecycleGAN model is removed.
 
Added experimental AVATAR model. Minimum required VRAM is 6GB for NVIDIA and 12GB for AMD.
 
 
== 16.08.2019 ==
 
fixed error "Failed to get convolution algorithm" on some systems
fixed error "dll load failed" on some systems
 
model summary is now better formatted
 
Expanded eyebrows line of face masks. It does not affect mask of FAN-x converter mode.
ConverterMasked: added mask gradient of bottom area, same as side gradient
 
== 23.07.2019 ==
 
OpenCL : update versions of internal libraries
 
== 20.06.2019 ==
 
Trainer: added option for all models
Enable autobackup? (y/n ?:help skip:%s) :
Autobackup model files with preview every hour for last 15 hours. Latest backup located in model/<>_autobackups/01
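
A rotating backup of this kind can be sketched as follows: numbered folders are shifted up by one, the oldest is dropped, and the current model files are copied into `01`. This is only an illustration of the scheme described above. The function name, the two-digit folder naming beyond `01`, and the choice to copy every file in the model folder are assumptions, not DFL's actual implementation.

```python
import shutil
import tempfile
from pathlib import Path


def rotate_and_backup(model_dir, backups_dir, max_backups=15):
    """Shift 01 -> 02 -> ... and copy the current model files into 01."""
    backups_dir.mkdir(parents=True, exist_ok=True)
    # Drop the oldest slot so the shift below never overwrites anything.
    oldest = backups_dir / f"{max_backups:02d}"
    if oldest.exists():
        shutil.rmtree(oldest)
    # Shift from the highest index down: 14 -> 15, ..., 01 -> 02.
    for i in range(max_backups - 1, 0, -1):
        src = backups_dir / f"{i:02d}"
        if src.exists():
            src.rename(backups_dir / f"{i + 1:02d}")
    newest = backups_dir / "01"
    newest.mkdir()
    for f in model_dir.iterdir():
        if f.is_file():  # skips sub-folders, including the backups folder itself
            shutil.copy2(f, newest / f.name)


# Small demo in a temporary folder with a fake model file.
with tempfile.TemporaryDirectory() as tmp:
    model_dir = Path(tmp) / "model"
    model_dir.mkdir()
    backups = model_dir / "demo_autobackups"
    for version in ("v1", "v2"):
        (model_dir / "weights.bin").write_text(version)
        rotate_and_backup(model_dir, backups)
    print(sorted(p.name for p in backups.iterdir()))
```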
 
SAE: added option only for CUDA builds:
Enable gradient clipping? (y/n, ?:help skip:%s) :
Gradient clipping reduces chance of model collapse, sacrificing speed of training.
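
For readers unfamiliar with the term, here is a minimal sketch of one common form of gradient clipping, clipping by global norm: if the overall gradient length exceeds a threshold, all gradients are scaled down by the same factor. Which clipping variant and threshold DFL uses is not stated here, so the function below is an assumption for illustration only.

```python
import math


def clip_by_global_norm(grads, max_norm):
    """Scale gradients so their combined L2 norm does not exceed max_norm."""
    norm = math.sqrt(sum(g * g for g in grads))
    if norm == 0.0 or norm <= max_norm:
        return list(grads)  # already within the limit, leave unchanged
    scale = max_norm / norm
    return [g * scale for g in grads]


print(clip_by_global_norm([3.0, 4.0], max_norm=1.0))
print(clip_by_global_norm([0.3, 0.4], max_norm=1.0))
```

Limiting the size of a single update is what reduces the chance of collapse after one unusually large gradient.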
 
== 02.06.2019 ==
 
fix error on typing uppercase values
 
== 24.05.2019 ==
 
OpenCL : fix FAN-x converter
 
== 20.05.2019 ==
 
OpenCL : fixed bug when analysing ops was repeated after each save of the model
 
== 10.05.2019 ==
 
fixed work of model pretraining
 
== 08.05.2019 ==
 
SAE: added new option
Apply random color transfer to src faceset? (y/n, ?:help skip:%s) :
Increases the variety of src samples by applying LCT color transfer from random dst samples.
It is like 'face_style' learning, but with more precise color transfer and without risk of model collapse.
It also does not require additional GPU resources, but the training time may be longer, because the src faceset becomes more diverse.
 
== 05.05.2019 ==
 
OpenCL: SAE model now works properly
 
== 05.03.2019 ==
 
fixes
 
SAE: additional info in help for options:
 
Use pixel loss - Enabling this option too early increases the chance of model collapse.
Face style power - Enabling this option increases the chance of model collapse.
Background style power - Enabling this option increases the chance of model collapse.
 
 
== 05.01.2019 ==
 
SAE: added option 'Pretrain the model?'
 
Pretrain the model with large amount of various faces.
This technique may help to train the fake with overly different face shapes and light conditions of src/dst data.
The face will look more morphed. To reduce the morph effect,
some model files will be initialized but not be updated after pretrain: LIAE: inter_AB.h5 DF: encoder.h5.
The longer you pretrain the model, the more morphed the face will look. After that, save and run the training again.
 
 
== 04.28.2019 ==
 
fix 3rd pass extractor hang on AMD 8+ core processors
 
Converter: fixed error with degrade color after applying 'lct' color transfer
 
added option at first run for all models: Choose image for the preview history? (y/n skip:n)
Controls: [p] - next, [enter] - confirm.
 
fixed error with option sort by yaw. Remember, do not use sort by yaw if the dst face has hair that covers the jaw.
 
== 04.24.2019 ==
 
SAE: finally the collapses were fixed
 
added option 'Use CA weights? (y/n, ?:help skip: %s ) :'
Initialize network with 'Convolution Aware' weights from paper https://arxiv.org/abs/1702.06295.
This may help to achieve a higher accuracy model, but takes extra time at first run.
 
== 04.23.2019 ==
 
SAE: training should be restarted
remove option 'Remove gray border' because it makes the model very resource intensive.
 
== 04.21.2019 ==
 
SAE:
fix multiscale decoder.
training with liae archi should be restarted
 
changed help for 'sort by yaw' option:
NN will not learn src face directions that don't match dst face directions. Do not use if the dst face has hair that covers the jaw.
 
 
== 04.20.2019 ==
 
fixed work with NVIDIA cards in TCC mode
 
Converter: improved FAN-x masking mode.
Now it excludes face obstructions such as hair, fingers, glasses, microphones, etc.
example https://i.imgur.com/x4qroPp.gifv
It works only for full face models, because there were glitches in half face version.
 
Fanseg is trained on >3000 various faces with obstructions, manually refined in MaskEditor.
The accuracy of fanseg on complex obstructions can be improved by adding more samples to the dataset, but I have no time for that :(
Dataset is located in the official mega.nz folder.
If your fake has some complex obstructions that are incorrectly recognized by fanseg,
you can add manually masked samples from your fake to the dataset
and retrain it by using --model DEV_FANSEG argument in bat file. Read more info in dataset archive.
Minimum recommended VRAM is 6GB and batch size 24 to train fanseg.
Result model\FANSeg_256_full_face.h5 should be placed to DeepFacelab\facelib\ folder
 
Google Colab now works on Tesla T4 16GB.
With Google Colaboratory you can freely train your model for 12 hours per session, then reset session and continue with last save.
more info how to work with Colab: https://github.com/chervonij/DFL-Colab
 
== 04.07.2019 ==
 
Extractor: added warning if aligned folder contains files that will be deleted.
 
Converter subprocesses limited to maximum 6
 
== 04.06.2019 ==
 
added experimental mask editor.
It is created to improve FANSeg model, but you can try to use it in fakes.
But remember: it does not guarantee quality improvement.
usage:
run 5.4) data_dst mask editor.bat
edit the mask of dst faces with obstructions
train SAE either with 'learn mask' or with 'style values'
Screenshot of mask editor: https://i.imgur.com/SaVpxVn.jpg
result of training and merging using edited mask: https://i.imgur.com/QJi9Myd.jpg
Complex masks are harder to train.
 
SAE:
previous SAE model will not work with this update.
Greatly decreased chance of model collapse.
Increased model accuracy.
Residual blocks now default and this option has been removed.
Improved 'learn mask'.
Added masked preview (switch by space key)
 
Converter:
fixed rct/lct in seamless mode
added mask mode (6) learned*FAN-prd*FAN-dst
 
changed help message for pixel loss:
Pixel loss may help to enhance fine details and stabilize face color. Use it only if quality does not improve over time.
 
fixed ctrl-c exit in no-preview mode
 
== 03.31.2019 ==
 
Converter: fix blur region of seamless.
 
== 03.30.2019 ==
 
fixed seamless face jitter
removed options Suppress seamless jitter, seamless erode mask modifier.
seamlessed face now properly uses blur modifier
added option 'FAN-prd&dst' - uses the multiplied FAN-prd and FAN-dst masks.
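
Multiplying masks can be sketched as an elementwise product: a pixel stays in the combined mask only to the extent that every input mask keeps it. The helper below is a hypothetical illustration on nested lists with values in the 0..1 range. The real merger works on image arrays and its code is not shown here.

```python
def multiply_masks(*masks):
    """Elementwise product of equally sized 2-D masks with values in [0, 1]."""
    if not masks:
        raise ValueError("at least one mask is required")
    rows, cols = len(masks[0]), len(masks[0][0])
    result = [[1.0] * cols for _ in range(rows)]
    for mask in masks:
        for r in range(rows):
            for c in range(cols):
                result[r][c] *= mask[r][c]
    return result


mask_a = [[1.0, 0.0], [0.5, 1.0]]
mask_b = [[1.0, 1.0], [0.5, 0.0]]
print(multiply_masks(mask_a, mask_b))
```

A pixel that is zero in either mask is zero in the product, so the combined mask is never larger than any of its inputs.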
 
== 03.29.2019 ==
 
Converter: refactorings and optimizations
added new option
Apply super resolution? (y/n skip:n) : Enhance details by applying DCSCN network.
before/after gif - https://i.imgur.com/jJA71Vy.gif
 
== 03.26.2019 ==
 
SAE: removed lightweight encoder.
optimizer mode now can be overridden each run
 
Trainer: the loss line now shows the average loss values after saving
 
Converter: fixed bug with copying files without faces.
 
XNViewMP : updated version
 
fixed cut video.bat for paths with spaces
 
== 03.24.2019 ==
 
old SAE model will not work with this update.
 
Fixed a bug where SAE could collapse over time.
 
SAE: removed CA weights and encoder/decoder dims.
 
added new options:
 
Encoder dims per channel (21-85 ?:help skip:%d)
More encoder dims help to recognize more facial features, but require more VRAM. You can fine-tune model size to fit your GPU.
 
Decoder dims per channel (11-85 ?:help skip:%d)
More decoder dims help to get better details, but require more VRAM. You can fine-tune model size to fit your GPU.
 
Add residual blocks to decoder? (y/n, ?:help skip:n) :
These blocks help to get better details, but require more computing time.
 
Remove gray border? (y/n, ?:help skip:n) :
Removes gray border of predicted face, but requires more computing resources.
 
 
Extract images from video: added option
Output image format? ( jpg png ?:help skip:png ) :
PNG is lossless, but produces images with size x10 larger than JPG.
JPG extraction is faster, especially on HDD instead of SSD.
 
== 03.21.2019 ==
 
OpenCL build: fixed, now works on most video cards again.
 
old SAE model will not work with this update.
Fixed a bug where SAE could collapse over time
 
Added option
Use CA weights? (y/n, ?:help skip: n ) :
Initialize network with 'Convolution Aware' weights.
This may help to achieve a higher accuracy model, but consumes time at first run.
 
Extractor:
removed DLIB extractor
greatly increased accuracy of landmarks extraction, especially with S3FD detector, but speed of 2nd pass now slower.
From this point on, it is recommended to use only the S3FD detector.
before https://i.imgur.com/SPGeJCm.gif
after https://i.imgur.com/VmmAm8p.gif
 
Converter: added new option to choose type of mask for full-face models.
 
Mask mode: (1) learned, (2) dst, (3) FAN-prd, (4) FAN-dst (?) help. Default - 1 :
Learned – Learned mask, if you choose option 'Learn mask' in model. The contours are fairly smooth, but can be wobbly.
Dst – raw mask from dst face, wobbly contours.
FAN-prd – mask from pretrained FAN model from predicted face. Very smooth, not shaky contours.
FAN-dst – mask from pretrained FAN model from dst face. Very smooth, not shaky contours.
Advantage of FAN mask: you get a mask that is not wobbly or shaky without the model having to learn it.
Disadvantage of FAN mask: may produce artifacts on the contours if the face is obstructed.
 
== 03.13.2019 ==
 
SAE: added new option
 
Optimizer mode? ( 1,2,3 ?:help skip:1) :
this option only for NVIDIA cards. Optimizer mode of neural network.
1 - default.
2 - allows you to train x2 bigger network, uses a lot of RAM.
3 - allows you to train x3 bigger network, uses huge amount of RAM and 30% slower.
 
Epoch term renamed to iteration term.
 
added showing timestamp in string of training in console
 
== 03.11.2019 ==
 
CUDA10.1AVX users - update your video drivers from geforce.com site
 
face extractor:
 
added new extractor S3FD - more precise, produces less false-positive faces, accelerated by AMD/IntelHD GPU (while MT is not)
 
speed of 1st pass with DLIB significantly increased
 
decreased amount of false-positive faces for all extractors
 
manual extractor: added 'h' button to hide the help information
 
fix DFL conflict with system python installation
 
removed unwanted tensorflow info from console log
 
updated manual_ru
 
== 03.07.2019 ==
 
fixes
 
upgrade to python 3.6.8
 
Reorganized structure of DFL folder. Removed unnecessary files and other trash.
 
Current available builds now:
 
DeepFaceLabCUDA9.2SSE - for NVIDIA cards up to GTX10x0 series and any 64-bit CPU
DeepFaceLabCUDA10.1AVX - for NVIDIA cards up to RTX and CPU with AVX instructions support
DeepFaceLabOpenCLSSE - for AMD/IntelHD cards and any 64-bit CPU
 
== 03.04.2019 ==
 
added
4.2.other) data_src util recover original filename.bat
5.3.other) data_dst util recover original filename.bat
 
== 03.03.2019 ==
 
Converter: fix seamless
 
== for older changelog see github page ==
 

TMBDF

Moderator | Deepfake Creator | Guide maintainer
Staff member
Moderator
Verified Video Creator
- reserved for future use -
 


iperov

DF Enthusiast
Developer
My advice, translated using deepl.com

SAEHD model options.​

Random_flip​
Randomly flips the image horizontally. Allows for better generalization of faces. Slows down training slightly until a clear face is achieved. If both src and dst face sets are quite diverse, this option is not useful. You can turn it off after some training.

Batch_size​
Improves facial generalization, especially useful at an early stage. But it increases the time until a clear face is achieved. Increases memory usage. In terms of quality of the final fake, the higher the value, the better. It is not worth setting it below 4.

Resolution.​
At first glance, the more the better. However, if the face in the frame is small, there is no point in choosing a large resolution. Increasing the resolution increases the training time. For face_type=wf, more resolution is required, because the coverage of the face is larger, thus the details of the face are reduced. For wf it makes no sense to choose less than 224.

Face_type.​
Face coverage in training. The more facial area is covered, the more plausible the result will be.
The whole_face type covers the area below the chin and the forehead. However, there is no automatic removal of the mask over the forehead, so the merge requires either XSeg or manual masking in Davinci Resolve or Adobe After Effects.

Archi.​
Liae morphs the result more toward the dst face, but the src face will still be recognizable in it.
Df allows you to make the most believable face, but requires more manual work to collect a good variety of src faces and to do the final color matching.
The effectiveness of hd architectures has not been proven at this time. The hd architectures were designed to better smooth the subpixel transition of the face at micro displacements, but the micro-shake can also be eliminated with df, see below.

Ae_dims.​
Dimensions of the main brain of the network, which is responsible for generating facial expressions created in the encoder and for supplying a variety of code to the decoder. ​

E_dims.​
The dimensions of the encoder network, which is responsible for face detection and further recognition. When these dimensions are not enough and the faces are too diverse, the network has to sacrifice non-standard cases, those that differ the most from the general cases, thus reducing their quality.

D_dims.​
The dimensions of the decoder network, which is responsible for generating the image from the code obtained from the brain of the network. When these dimensions are not enough and the output faces are too different in color, lighting, etc., the network has to sacrifice the maximum achievable sharpness.

D_mask_dims.​
Dimensions of the mask decoder network, which are responsible for forming the mask image. ​
16-22 is the normal value for a fake without an edited mask in XSeg editor.​

At the moment there is no experimentally proven data that would indicate which values are better. All we know is that if you put really low values, the error curve will reach the plateau quickly enough and the face will not reach clarity.​

Masked_training. (only for whole_face).​
Enabled (default) - trains only the area inside the face mask, and anything outside that area is ignored. Allows the net to focus on the face only, thus speeding up training of the face and facial expressions.
When the face is sufficiently trained, you can disable this option, then everything outside the face - the forehead, part of the hair, background - will be trained.​

Eyes_prio.​
Sets a higher priority for image reconstruction in the eye area. This improves the generalization and matching of the eyes between the two faces. Increases iteration time.

Lr_dropout.​
Enable only when the face is already sufficiently trained. Enhances facial detail and improves subpixel facial transitions to reduce shake.
Uses more video memory, so account for this option when selecting a network configuration for your graphics card.

Random_warp.​
Turn it off only when the face is already sufficiently trained. Turning it off improves facial detail and subpixel transitions of facial features, reducing shake.

GAN_power. ​
Allows for improved facial detail. Enable only when the face is already sufficiently trained. Requires more memory, greatly increases iteration time.
It works on the generative adversarial principle. At first, you will see artifacts in areas that do not match the clarity of the target image, such as teeth, eye edges, etc. So train long enough.

True_face_power.​
Experimental option. You don't have to turn it on. Adjusts the predicted face to src in the most "hard way". Artifacts and incorrect light transfer from dst may appear.​

Face_style_power.​

Adjusts the color distribution of the predicted face in the area inside the mask to dst. Artefacts may appear. The face may become more like dst. The model may collapse.​
Start at 0.0001 and watch the changes in preview_history, turn on the backup every hour.​

Bg_style_power.​

Trains the area in the predicted face outside the face mask to be equal to the same area in the dst face. As a result, the predicted face morphs toward the dst face and the src facial features become less recognizable.

Face_style_power and Bg_style_power must be used together so that the complexion fits dst and the background is taken from dst. Morphing gets rid of many problems with color and face matching, but at the expense of how recognizable the src face is.

ct_mode.​

It is used to fit the average color distribution of the src face set to dst. Unlike Face_style_power, it is a safer method, but it does not guarantee an identical color transfer. Try each mode, check in the preview history which one is closest to dst, and train with it.

Clipgrad. ​

It reduces the chance of a model collapse to almost zero. Model collapse shows up as artifacts or as preview windows of the predicted faces filled with a single color. Model collapse can occur when using some options or when the dst face set does not have enough variety.
Therefore, it is best to use autobackup every 2-4 hours, and if collapse occurs, roll back and turn on clipgrad.
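As a general illustration of gradient clipping, the sketch below rescales a gradient vector whose norm exceeds a limit. It is plain Python with hypothetical names, and whether DFL's clipgrad option clips by norm in exactly this way is an assumption.

```python
import math

def clip_by_global_norm(grads, max_norm):
    """Scale the gradient list down so its L2 norm does not exceed max_norm."""
    norm = math.sqrt(sum(g * g for g in grads))
    if norm == 0.0 or norm <= max_norm:
        return list(grads)          # already small enough, leave unchanged
    scale = max_norm / norm         # shrink factor, keeps the direction
    return [g * scale for g in grads]

# A gradient with norm 5 is scaled down to norm 1; a small one is untouched.
print(clip_by_global_norm([3.0, 4.0], 1.0))
print(clip_by_global_norm([0.3, 0.4], 1.0))
```

Limiting the step size this way keeps one bad batch from throwing the weights far off, which is why it lowers the chance of collapse.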

Pretrain. ​

Enables model pre-training. It is performed on a set of 24 thousand faces of people prepared in advance. Using the pre-trained model you accelerate the training of any fake.
It is recommended to train as long as possible. 1-2 days is good. 2 weeks is perfect. At the end of the pre-training, save the model files for later use. Switch off the option and train as usual.
You can and should share your pre-trained model with the community.

Size of src and dst face set.​

The problem with a large number of src images is repetitive faces, which contribute little. As a result, faces with rare angles are trained less frequently, which hurts quality. Therefore, 3000-4000 faces are optimal for the src face set. If you have more than 5000 faces, use sort by best to reduce them to a smaller number. The sort selects faces with an optimal ratio of angles and color variety.

The same logic applies to dst. But dst consists of frames from a video, each of which must be trained well enough for the neural network to identify it during the merge. So if you have too many faces in dst, 3000 or more, it is best to back them up, reduce them to 3000 with sort by best, train the network to, say, 100,000 iterations, then restore the full set of dst faces and train further until the optimal result is achieved.

How to get lighting similar to dst face?​

This is about lighting, not color matching. The only way is to collect a more diverse src face set.


How to suppress color flickering in DF model? ​

If the src face set contains a variety of make-up, it can lead to color flickering in the DF model. Option: at the end of your training, leave at least 150 faces with the same make-up and train for several hours.

How else can you adjust the color of the predicted face to dst?​

If nothing fits automatically, use a video editor and composite the faces in it. With a video editor, you get a lot more freedom to adjust colors.

How to make a face look more like src? ​

1. Use DF architecture. ​

2. Use a similar face shape in dst.​

[align=left]3. A large color variety in the src face set is known to reduce facial resemblance, because a neural network essentially interpolates the face from what it has seen.

For example, suppose your src face set comes from 7 scenes with different color and contains only 1500 faces in total. Each dst scene will then effectively draw on 1500 / 7 faces, which is 7 times poorer than using 1500 faces from a single scene. As a result, the predicted face will be very different from the src.
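The arithmetic in the example above can be checked with a couple of lines of Python. The function name is hypothetical, and the even split across scenes is the simplifying assumption already made in the text.

```python
def faces_per_scene(total_faces, scenes):
    """Average number of src faces available per color scene (even split assumed)."""
    return total_faces / scenes

print(round(faces_per_scene(1500, 7)))  # 214 faces per scene
print(round(faces_per_scene(1500, 1)))  # 1500 faces when everything comes from one scene
```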

Micro-shake of the predicted face in the final video.

The higher the resolution of the model, the longer it needs to be trained to suppress the micro-shake.
You should also enable lr_dropout and disable random_warp after 200-300k iterations at batch_size 8.
Micro-shake often appears if the dst video is too sharp. It is difficult for a neural network to extract unambiguous information about a face when it is flooded with pixel-level noise. Therefore, after extracting frames from the dst video and before extracting faces, you can run the frames through the noise filter denoise data_dst images.bat. This filter removes temporal noise.
Increasing ae_dims may also suppress the micro-shake.

Use a quick model to check the generalization of facial features. ​

If you're planning a higher resolution fake, start by training for at least a few hours at resolution 96. This will help identify facial generalization problems and fix the face sets.
Examples of such problems: ​

1. Eyes/mouth that do not close - no closed eyes/mouth in src.

2. Wrong face rotation - not enough faces with different head turns in both src and dst face sets.
[/align]

Training algorithm for achieving high definition.​

1. Use a -ud model.
2. Train, say, up to 300k iterations.
3. Enable learning rate dropout for 100k iterations.
4. Disable random warp for 50k iterations.
5. Enable GAN.
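The steps above can be sketched as a function that says which options are active at a given iteration. This is only one possible reading of the list, in which the stages follow one another (0-300k, then 100k, then 50k); the function and key names are hypothetical and are not DFL settings.

```python
def options_at(iteration):
    """Return which options are on at this iteration, for one reading of the schedule."""
    return {
        "lr_dropout": iteration >= 300_000,   # step 3: on after the first 300k
        "random_warp": iteration < 400_000,   # step 4: off after 100k more
        "gan_enabled": iteration >= 450_000,  # step 5: on after another 50k
    }

for it in (0, 350_000, 420_000, 450_000):
    print(it, options_at(it))
```

The thresholds are starting points, not rules: move to the next stage when the previews stop improving, not just when the counter is reached.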

Do not use training GPU for video output.​

This can reduce performance, reduce the amount of free GPU video memory, and in some cases lead to OOM errors.​
Buy a second, cheap video card such as a GT 730 or similar and use it for video output.
There is also an option to use the built-in GPU in Intel processors. To do this, activate it in BIOS, install drivers, connect the monitor to the motherboard.​

Using Multi-GPU. ​

Multi-GPU can improve the quality of the fake. In some cases, it can also accelerate training. ​
Choose identical GPU models, otherwise the faster GPU will wait for the slower one and you will not get the acceleration.
Working principle: batch_size is divided among the GPUs. Accordingly, you either get the acceleration because less work is allocated to each GPU, or you multiply batch_size by the number of GPUs, increasing the quality of the fake.
In some cases, disabling models_opt_on_gpu can speed up your training when using 4 or more GPUs.
As the number of samples increases, the load on the CPU to generate samples increases. Therefore it is recommended to use a latest generation CPU and memory.
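A small sketch of the two ways to use several GPUs described above. The function name is hypothetical, and requiring batch_size to divide evenly is an assumption made to keep the example simple.

```python
def per_gpu_batch(batch_size, gpu_count):
    """Samples each GPU processes per iteration when the batch is split evenly."""
    if batch_size % gpu_count != 0:
        raise ValueError("batch_size must be divisible by gpu_count in this sketch")
    return batch_size // gpu_count

# Option 1: keep batch_size 8 on 2 GPUs, so each GPU does half the work (faster).
print(per_gpu_batch(8, 2))   # 4
# Option 2: raise batch_size to 16 on 2 GPUs, same load per GPU, larger batch (quality).
print(per_gpu_batch(16, 2))  # 8
```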

NVLink and SLI do not work and are not used. Moreover, having SLI enabled may cause errors.

Factors that reduce the success of a fake.
1. Big face in the frame.

2. Side lighting. Lighting transitions. Colored lighting.

3. A dst face set that is not diverse.

For example, you train a fake where the whole dst face set is a head turned to one side. The generated faces in this case can be bad. The solution: extract additional faces of the same actor, train them well enough, then leave only the target faces in dst.

Factors that increase the success of a fake.

1. Variety of src faces: different angles including side faces. Variety of lighting.​

Other.​

In 2018, when fakes first appeared, people liked fakes of any lousy quality, where the face flickered and barely resembled the target celebrity. Now, even a technically perfect replacement using an impersonator who looks like the target celebrity may have no viral effect at all. Popular YouTube channels specializing in deepfakes are constantly inventing something new to keep their audience. If you have watched a lot of movies and know all the meme videos, you can probably come up with great ideas for a deepfake. A good idea is 50% of success. The technical quality can be increased through practice.

Not every pair of celebrities works well for a deepfake. If the size of the skulls is significantly different, the similarity of the result will be extremely low. With experience, a deepfaker learns to tell which fakes will turn out well and which will not.


Deepfake tutorial XSeg + Whole Face:​

(embedded YouTube video)
 

TMBDF

Moderator | Deepfake Creator | Guide maintainer
Staff member
Moderator
Verified Video Creator
- reserved for future use -
 
Thank you so very much for the updated guide and tutorial - now i understand a lot more and have been able to alter my training accordingly.
 

TMBDF

Moderator | Deepfake Creator | Guide maintainer
Staff member
Moderator
Verified Video Creator
androsk said:
is there any other method for download beside MEGA?

No, mega is the only way right now.
 

Putin_v

DF Pleb
Verified Video Creator
I am having trouble with pretraining on DFL 2.0. I enable pretraining in SAE and let it run. I noticed before that it would stop after a certain amount of iterations, but now it keeps going way past 100,000. I have reinstalled the program multiple times but nothing seems to help. Am I missing something?
 

Groggy4

NotSure
Verified Video Creator
Putin_v said:
I am having trouble with pretraining on DFL 2.0. I enable pretraining in SAE and let it run. I noticed before that it would stop after a certain amount of iterations, but now it keeps going way past 100,000. I have reinstalled the program multiple times but nothing seems to help. Am I missing something?

It won't stop unless you set a targeted iteration limit.
 

Putin_v

DF Pleb
Verified Video Creator
Groggy4 said:
Putin_v said:
I am having trouble with pretraining on DFL 2.0. I enable pretraining in SAE and let it run. I noticed before that it would stop after a certain amount of iterations, but now it keeps going way past 100,000. I have reinstalled the program multiple times but nothing seems to help. Am I missing something?

It won't stop unless you set a targeted iteration limit.

I tried that. I set a limit, and when the limit was reached I turned pretraining off. All of the iterations were gone. It started back from 0.
 

Groggy4

NotSure
Verified Video Creator
Putin_v said:
Groggy4 said:
Putin_v said:
I am having trouble with pretraining on DFL 2.0. I enable pretraining in SAE and let it run. I noticed before that it would stop after a certain amount of iterations, but now it keeps going way past 100,000. I have reinstalled the program multiple times but nothing seems to help. Am I missing something?

It won't stop unless you set a targeted iteration limit.

I tried that. I set a limit, and when the limit was reached I turned pretraining off. All of the iterations were gone. It started back from 0.

It's supposed to work like that. The training data is still there, but to avoid a morphing effect from the previous faces, it resets some aspects.
 

TMBDF

Moderator | Deepfake Creator | Guide maintainer
Staff member
Moderator
Verified Video Creator
You don't download the base DFL .bat files from GitHub; that's only for getting updated files. To get the actual DFL with all the .bat files, download it from the mega.nz link.
 

Weapon2057

DF Vagrant
For the DST, do the aligned and aligned_debug folders work together? Can I delete photos that I don't like from the DST faces (blurry, cut off, etc.), or will it affect my aligned_debug so that I will be missing a face for every frame that blurry face was associated with?
 

TMBDF

Moderator | Deepfake Creator | Guide maintainer
Staff member
Moderator
Verified Video Creator
Weapon2057 said:
For the DST, do the aligned and aligned_debug folders work together? Can I delete photos that I don't like from the DST faces (blurry, cut off, etc.), or will it affect my aligned_debug so that I will be missing a face for every frame that blurry face was associated with?

Read the guide again and then do this one: https://mrdeepfakes.com/forums/thre...set-creation-how-to-create-celebrity-facesets
aligned_debug is for checking landmarks only; it isn't used in training. Keep the blurry ones (in aligned) or else those frames won't be swapped, and the same goes for cut off faces. Use aligned_debug to see if they are correctly aligned; if not, delete that frame from aligned_debug and run 5) data_dst faceset MANUAL RE-EXTRACT DELETED ALIGNED_DEBUG to re-extract it manually.
 

JohnNotStamos

DF Vagrant
For merging, since there are 2 options, interactive and non-interactive, what settings would be best when using non-interactive? I'm asking because I spent 10 hours across 2 days using interactive, so I'd prefer an approach that doesn't involve me doing it manually for multiple hours.
 