MrDeepFakes Forums
  • New and improved dark forum theme!
  • Guests can now comment on videos on the tube.
   
tutsmybarreh[GUIDE] CELEBRITY FACESET/DATASET CREATION - How to create celebrity facesets.
#1
DATASET/FACESET GUIDE FOR REQUESTERS/NON-CREATORS.

Do you want to request a specific deepfake? Do you want it to be actually completed?

Help creators by preparing a dataset/faceset of your celebrity.

NOTE FOR USERS WHO ARE UNABLE TO USE DFL OR ANY OTHER FACE SWAP SOFTWARE!
You can still make a dataset by just collecting good quality photos and videos and then extracting just the frames from video, it will produce higher size zip file to upload but is still more helpful for creators than nothing, you can extract frames from videos by using the same software DFL uses - FFMPEG or any video editing software.

Quote:FFMPEG: You are not allowed to view links. Register or Login to view.

Before you start here is some terminology:

- data_src, src, source, celebrity faceset, celebrity dataset, source dataset, source images - images of the celebrity that are used in training the AI model.
- frame,frames - self explanatory, just individual frames extracted from video, located inside workspace/data_src.
- faces - aligned pictures of faces, located inside workspace/data_src/aligned

To create a source faceset you will need software that is used to create deepfakes, we recommend DFL
Download here: You are not allowed to view links. Register or Login to view.
Choose a version for your hardware:

DeepFaceLabCUDA9.2SSE - for NVIDIA video cards up to GTX 1080 Ti
DeepFaceLabCUDA10.1AVX - for RTX NVIDIA video cards with a CPU that supports AVX
DeepFaceLabOpenCLSSE - for AMD/IntelHD cards plus any 64-bit CPU

After downloading it just unpack it and you are pretty much ready to go.

To create a good quality source dataset you need:

- it should cover all or at least most of possible face/head angles - looking up, down, left, right, straight at camera and everything in between), the best way to achieve it is to use more than one interview or grab clips from movies instead of relying on single interview (which will mostly feature one angle and some small variations).

- it needs to be consistent - you don't want blurry, low resolution and compressed faces next to crisp, sharp and high quality ones.

- it must be high quality - don't bother using blurry faces, try to only use at least 720p (and sharp) videos and high quality pictures.

- lighting should be consistent - some small shadows are OK but you shouldn't include interviews with harsh, directional lighting, if possible try to use only those where shadows are fairly light, light comes from the camera or is diffused.

- it should cover all different facial expressions - that includes open/closed mouths, open/closed eyes, smiles, frowns, eyes looking in different directions - the more variety in expressions you can get the better results will be.

- if you are using only pictures or they are majority of the dataset - make sure they fill all the checks as mentioned above, 20 pictures with bunch of instagram filters is not enough.

After you've collected all videos:

1. Extract individual frames from videos.

Following instructions are for when you have a couple videos/interviews to extract, if you only have a single video and some pictures, skip to step 2. Align faces.
If you have few videos you can either edit them in video editing software into one data_src.mp4 file and then skip to step 2. Align faces.

The other way is to name each one data_src.mp4 and extract them one by one.

Start by going into workspace folder and delete everything inside.
Next copy one video into workspace folder and name it data_src, next run "2) extract images from video data_src" when asked about png or jpg write jpg and press enter - this will create a folder called dat_src and inside there will be all extracted FRAMES from that video in jpg format (save space, not much quality improvement from extracting to png).
Select them all with a shortcut ctrl + a and press F2 to batch rename them to a number for example "src1", that is so when we extract again from other video it doesn't get overwritten.

After doing that delete data_src.mp4 file from the folder and copy over next one, name it data_src, extract, select all files, ctrl + a, F2, rename to something else (can be the same thing or something else like "src2" but it doesn't matter, just name them all to something that isn't the same as extractor output. Repeat until all videos are extracted.

2. Align faces.

After you've extracted all videos or combined them into one in video editing software and did the same you can copy all your pictures into data_src folder and run face alignment "4) data_src extract faces S3FD best GPU" - this will run a python script/program taht will create a new folder called ALIGNED in your DATA_SRC and align faces visible in photos and video frames you extracted in step 1. This process is not perfect and requires the so called "dataset cleanup". This is a process where you go through all the aligned faces and delete all blurry, to dark and to bright faces as well as ones that don't belong to the celebrity of which you are making faceset (script will detect ALL FACES from your extracted frames/photos).

To help you in this DFL comes with a bunch of sorting scripts, start from "4.2.2) data_src sort by similar histogram" - this will sort them based on histogram so it will generally group all similar ones together, this will help most with deleting all the bad aligned frames as well as those that are to dark, to bright, low contrast, as well as other peoples faces. It is the most useful one and you should be able to cleanup your faceset easily just with it.

Faces to delete besides which I mentioned above also include deleting all the partially cut ones (some can be leaved) as well as ones that are rotated. Examples of faces to delete:

Quote:You are not allowed to view links. Register or Login to view.
Green - good faces/alignments.
Red - misaligned, you can see they are rotated slightly, you definitely don't want these in your dataset.

In other colors I included various other types you should get rid of:


Blue - obstructions/overlays/text over face - can be left if it's only a small obstruction on few faces in the whole dataset.
Yellow - blurry faces.
Violet - faces of other people.
Pink - cut off faces, they can be left if it's only a bit of chin or forehead or it doesn't go over the face, assuming it's only few faces in the entire dataset.
Orange - to dark/bright/overexposed/low contrast faces - also any pictures with heavy photo/instagram filters, etc.

After that you will have a clean dataset, all you now need to do is to zip that "aligned" folder and upload it to mega/google drive and post link in a request thread so creators have easier job fulfilling your requests Smile

Happy Deepfaking and good luck making your facesets!

If you still have question how to do it, consult our DFL guide:
You are not allowed to view links. Register or Login to view.

FAQ:

@666VR999 asked:

Q: "Thanks for this helpful summary. I often see overkill on number of frames and have discussed in other threads the optimal minimum, what are the best tools for automating the process of removing duplicates beyond the DFL bat files to shrink to "minimum frames, maximum variation" as you might call it? i.e. you've checked all your frames are good quality, but too many the same? I would still rather feed in too many frames at the start and delete after, than extract at a lower FPS and risk missing useful variation."

A: There are probably some tools that could try to detect similar faces and delete them. I just do this manually. Also If it's under 5000 thousand I'd just leave as it is, unless you have 10k-15k of pictures I wouldn't bother about deleting similar duplicate ones, it won't hurt anything and some of those similar frames may just feature some unique detail that may help achieve better results.
App suggested by @666VR999 for further dataset cleanup: You are not allowed to view links. Register or Login to view. You are not allowed to view links. Register or Login to view.

Q: "What to do if you trained with a celebtity faceset and then decided to add new pictures of the same celeb to it?"

A: "Safest way is to change the name of the entire "data_src" folder to anything else or temporarily moving it somewhere else, then just extract frames from new data_src.mp4 file or if you already have the frames extracted and some pictures ready, create a new folder data_src, copy them inside it and run data_src extraction/aligning process, then just copy aligned images from the old data_src/aligned folder into the new one and upon being asked by windows to replace or skip, select the option to rename files so you keep all of them and not end up replacing old ones with new ones".

@THG2222 asked:

Q: "Thank you for this guide! I've noticed this guide is only about src faceset. Does the dst faceset also need to be sharp and high quality? Or can they be a bit blurry and shadowy?"

A: "Blurry faces in dst will cause a couple issues:

- first is that some of the faces in certain frames will not get detected - this will cause original faces to be shown on these frames when converting/merging.
- second is that other may be incorrectly aligned - this will cause final faces on this frames to be rotated/blurry and just look all wrong.
- third - even with manual aligning in some cases it may not be possible to correctly detect/align faces which again - will cause original faces to be visible on frames that were to blurry or contained motion blur.
- faces that contain motion blur or are blurry (not sharp) that are correctly aligned may still produce bad results because the models that are used in training cannot understand motion blur, certain parts of the face like mouth when blurred out may appear bigger/wider or just different and the model (H128/SAE, really any training model) will interpret this as a change of the shape/look of that part and thus both the predicted and the final faked face will look unnatural.

That's why you want both your SRC datasets and DST datasets to be as sharp looking as possible.
Small amount of blurriness on some frames shouldn't cause many issues. As for shadows, this depends on how much shadow we are talking about, small, light shadows will probably not be visible, you can get good results with shadows on faces but to much will also look bad, you want your faces to be lit as evenly as possible with as little of harsh/sharp and dark shadows as possible."
Raising money for a new GPU, if you enjoy my fakes or my work on forums consider donating via bitcoin, tokens or paypal/patreon, any amount helps!
Paypal/Patreon: You are not allowed to view links. Register or Login to view.
Bitcoin: 1C3dq9zF2DhXKeu969EYmP9UTvHobKKNKF
Want to request a paid deepfake or have any questions reagarding the forums or deepfake creation using DeepFaceLab? Write me a message.
TMB-DF on the main website - You are not allowed to view links. Register or Login to view.
#2
yes I have a question to ask when I go to insert the video to cut in the cut file video closes automatically. what am I doing wrong?
#3
(12-15-2019, 11:41 AM)GhostRounder Wrote: You are not allowed to view links. Register or Login to view.yes I have a question to ask when I go to insert the video to cut in the cut file video closes automatically. what am I doing wrong?

PM me.
Raising money for a new GPU, if you enjoy my fakes or my work on forums consider donating via bitcoin, tokens or paypal/patreon, any amount helps!
Paypal/Patreon: You are not allowed to view links. Register or Login to view.
Bitcoin: 1C3dq9zF2DhXKeu969EYmP9UTvHobKKNKF
Want to request a paid deepfake or have any questions reagarding the forums or deepfake creation using DeepFaceLab? Write me a message.
TMB-DF on the main website - You are not allowed to view links. Register or Login to view.

Forum Jump:

Users browsing this thread: 1 Guest(s)