Has anyone figured this out yet? I still see people saying different things, so I suspect it's still a bit of an experimental "well, this way seems to work for me, so I'm sticking with it" - without realizing that maybe 50% of the faceset makes no positive contribution to training and just slows things down while it gets processed. I can't find it now, but someone on here said they'd made a deepfake from only 40 or 60 images, yet most people still grab several thousand.
So 40-60 is probably too small, but 5000 is probably too big. It's almost like you need footage from when they filmed The Matrix, with the 360° camera rig around the actor: harvest N images from angle X to angle Y, repeat while the actor holds a different facial expression, and you'd reach the critical ~500 images from which the algorithms can get everything they need - anything beyond that doesn't add value.
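The angle-by-angle harvesting idea above could be sketched in code. This is a minimal, hypothetical illustration (none of it comes from an actual deepfake tool): it assumes you already have a yaw-angle estimate per extracted face, then bins frames by angle and caps each bin, so no single head pose dominates the faceset and the total stays near that critical few-hundred mark.

```python
import random
from collections import defaultdict

def cap_faceset(images, bin_width=15, cap_per_bin=40, seed=0):
    """Group face images into yaw-angle bins of `bin_width` degrees
    and keep at most `cap_per_bin` images per bin, so no single
    angle dominates training. `images` is a list of dicts with a
    'yaw' key (degrees, roughly -90..90)."""
    rng = random.Random(seed)
    bins = defaultdict(list)
    for img in images:
        bins[int(img["yaw"] // bin_width)].append(img)
    kept = []
    for members in bins.values():
        rng.shuffle(members)          # keep a random sample per bin
        kept.extend(members[:cap_per_bin])
    return kept

# Synthetic example: 5000 frames with yaw spread evenly over -90..90
frames = [{"path": f"frame_{i:05d}.png", "yaw": -90 + i * 180 / 5000}
          for i in range(5000)]
subset = cap_faceset(frames, bin_width=15, cap_per_bin=40)
print(len(subset))  # → 480 (12 bins x 40 images)
```

With 15° bins over a 180° yaw range, 5000 raw frames collapse to 480 kept images: roughly the "~500 essential images" ballpark, and everything past the cap is exactly the redundant material that only slows processing down. Extending the same binning to expressions would just mean a second key in the bin index.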
I noticed in another thread that Samsung supposedly did one from a single image, so we must be at the point where we don't need such big facesets. So where is the point of diminishing returns, if we can define the "essential" base set of angles/expressions?