MrDeepFakes Forums
Pocketspeed
This Ain't Like Photoshop: When you request a deepfake, here's what happens...
With the recent increase in new members, I wanted to explain a bit of the process of deepfake creation.  Helping the average site user understand the process will help everyone.  I won't discuss the hardware requirements.  Now this is a basic explanation, and there are ways to make this process a little easier.  But for the average member, this will be a fair explanation of how involved and time-consuming deepfake creation is.  There are 3 basic steps.

Step 1

When a creator starts, they have an idea about the video they want to make.  The process is the same for a NSFW fake and a SFW fake. Let's start with our target video; we'll call this video A.  Video A is a clip of someone whose face we will be replacing.  Now we need either another video or set of images.  We'll call this video B.  Video B is a video (or imageset, but video is better) of a person we want to put into video A.  Now we extract both videos into individual frames.  This takes some time, but not too much.  A 30fps, 3 minute video will have 5400 frames.  If video A and B are identical in length, we now have 10800 frames.
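The frame counts above are just frame rate times duration.  A minimal sketch of that arithmetic, where the function name `frame_count` is made up for illustration and does not come from any tool:

```python
def frame_count(fps: int, minutes: float) -> int:
    """Frames in a clip: frames per second times length in seconds."""
    return round(fps * minutes * 60)

frames_a = frame_count(30, 3)  # video A: 30 fps, 3 minutes
frames_b = frame_count(30, 3)  # video B: assumed the same length as A

print(frames_a)             # 5400
print(frames_a + frames_b)  # 10800
```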

Now we take the frames from video A, and we detect and extract the faces from them.  We can have more or fewer faces than the number of frames in video A.  If there are 2 actors in video A, we can potentially have 10800 faces just from video A itself.  We can also have false face detections (called false positives), which can increase the number of faces even more.  So for a 3 minute clip with 2 actors, we will probably end up with around 11000 face pictures.

Now we take the frames from video B, and we detect and extract the faces from them.  This time we are lucky, and video B only has one person in it.  This is the person we will be putting into our end result video.  So for a 3 minute 30fps video B, we get 5400 faces.  But we also have some false positives here too, so let's say we end up with 5500 face pictures.

We now have a total of 16500 face pictures.  And we have to go through them all.  Every. Single. One.  We pick out the good ones from the bad ones.  We get rid of the second actor's face pictures from video A.  This takes some time.  This is how we create our data for the training process.  We will probably spend 2-4 hours, or more, on Step 1.  Let's say we spend 3 hours on step 1.  When we are done, this is the end of step 1.
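To put the sorting workload in perspective, here is the same arithmetic as a rough sketch.  The face counts and the 3 hours are the estimates from this post, and the variable names are only illustrative:

```python
faces_a = 11_000   # estimate for video A (2 actors plus false positives)
faces_b = 5_500    # estimate for video B (1 person plus false positives)
review_hours = 3   # time assumed for step 1 in this post

total_faces = faces_a + faces_b
faces_per_second = total_faces / (review_hours * 3600)

print(total_faces)                # 16500
print(f"{faces_per_second:.2f}")  # 1.53
```

That works out to roughly one and a half face pictures per second, sustained for the full 3 hours.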

Step 2

We are now ready to train.  We use our application to begin the training process.  The time spent training depends on the creator's hardware.  More powerful graphics cards with more VRAM take less time.  Less powerful cards take more time.  There is no set time for training.  It could take a few hours, or it could take one day, two days, or even three.  When we are satisfied with our training results, we end training.  We will guess and say total training time is 12 hours.  This is the end of step 2.

Step 3

Now we will swap our faces.  We use a merging or converting process to take our trained model, and swap the faces we want into every single frame from video A.  All 5400 of them.  Our software helps us and this goes quicker than the steps above, but still takes time.  Once we have all 5400 pictures ready, we then put them back into video format.  We have tools to make this happen quicker than the steps above.  But step 3 could take 1-3 hours, or more, or less.  Let's say we've spent 1 hour on this part, because our video is only 3 minutes long.  Now if we are lucky, our audio matches up nicely with our clip.  But sometimes it doesn't.  So we spend another 30 minutes to get the audio to sync correctly.  Post processing can then take another 30 to 120 minutes.  We cut out sections of bad faceswaps, or generally poor results.  So we'll say it takes 30 more minutes for post processing.  We have now spent 2 hours on step 3, and we upload our video.  This is the end of step 3.

So as you can tell, we have spent a total of 17 hours to make a 3 minute, 30 fps deepfake.  And this time, everything went very smoothly.  For some deepfakes, more or less time will be spent on creation.
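The 17 hour total is simply the three step estimates added together.  A small sketch of the tally, using the example numbers from this post (the dictionary layout is just for illustration):

```python
# Hours per step, using the example figures from this post.
step_hours = {
    "step 1": 3,              # extracting and sorting faces
    "step 2": 12,             # training
    "step 3": 1 + 0.5 + 0.5,  # merging, audio sync, post processing
}

total_hours = sum(step_hours.values())
clip_minutes = 3
hours_per_clip_minute = total_hours / clip_minutes

print(total_hours)                     # 17.0
print(f"{hours_per_clip_minute:.1f}")  # 5.7
```

So even when everything goes smoothly, each minute of finished video costs between 5 and 6 hours of work and machine time.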

This is not a thread calling for accolades for deepfake creators.  I mean, we're doing this by choice.  The purpose of this thread is to demonstrate what goes into deepfake creation.  It is fairly apparent that many of the newer site members don't know how much work goes into deepfake creation.  I'm just trying to help people realize what they are asking for, sometimes.
I've tried it and it's definitely a complicated process. Amazing the results some of you guys can get.
I tried this, and it's a bit of a headache for me. Add to that, I don't have a powerful PC to work with. So I agree that deepfakers spend a lot of time doing this.

Kudos and much respect deepfakers!
