MrDeepFakes Forums


SAE Pro Tips?

mondomonger

DF Admirer
Verified Video Creator
I really enjoyed @tania01's recent Lucy Lawless SAE videos. I've tried SAE in the past, but keep going back to H128. Now I've decided to give SAE another chance, since DFL development has halted and I know my SAE models won't be rendered useless by the whims of a mercurial programmer. ; )

I've made 500+ H128 videos, but only about 4 SAE vids, so I'm a near-total newbie on SAE.


1) Do most people use the defaults when starting SAE training?
2) Does anyone adjust the style power settings during training? Is it worth it, or is it like pixel loss training and mostly worthless?
3) Can a mature 100k or 200k iteration SAE model be recycled completely to a new celeb or pornstar?
4) Can you partially train an SAE model to, say, 50k, then use that as your starting model for all future SAE models?
5) At what point does the SAE model break down? Within a certain range of iterations, or just totally at random?
6) What are the best conversion settings? I made several FAN-DST conversions, but in my experience they took 4x longer than my usual H128 conversions. I've seen advice to use FAN-X. What are the pros/cons, and what do most people use?
7) When using the April 2nd version of DFL with FAN-DST for a gokkun video: a) half the dick sometimes disappears or falls out of frame; b) the replaced face goes insanely cross-eyed... a lot. Was this fixed in later versions of DFL?
8) I'm following tania01's method of training celeb-to-celeb for 200k iterations to build the base SAE model that becomes the master model. I'll then train that master model on a specific pornstar clip for around 10k iterations. Is this method commonly used? Pros/cons?


I appreciate any and all feedback. I can assure you that your advice will be put into hundreds of hours of deepfaking work. Cheers!
 

dpfks

DF Enthusiast
Staff member
Administrator
Verified Video Creator
1) You can see my personal workflow in the guide.
2) I still train new models with 10/10 style power, then reduce to 0.1/0.1 after 40k iterations.
3) Yes, but I have only done it for very similar heads/features.
4) Yes, but the model will likely look less like the celebrity.
5) If you're talking about collapsing, it rarely does, but it can when using higher style power or pixel loss.
6) I use option 6, which is a combined version of learned and FAN-X. Converting with SAE will likely take longer than H128.
7) I think it depends on the angle; you can check some of my videos to see if the issue is still there.
8) I use a base model for each celebrity I have. Sure, you can try what other people do, but you can come up with what works for you.
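
For point 2, DFL sets style power interactively at training start, so there's no real API for it, but the schedule itself is simple. Purely as an illustration in plain Python (the function is mine, not part of DFL):

def style_power_schedule(iteration):
    # dpfks's schedule from answer 2 above: 10/10 while the model is
    # young, then 0.1/0.1 once it passes 40k iterations.
    if iteration < 40_000:
        return 10.0, 10.0    # (face_style_power, bg_style_power)
    return 0.1, 0.1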
 

mondomonger

DF Admirer
Verified Video Creator
dpfks -

Thanks for the advice. I also just took another look at your workflow post, which is also helpful. How many iterations do you generally train a new SAE celebrity model?
 

TMBDF

Moderator | Deepfake Creator | Guide maintainer
Staff member
Moderator
Verified Video Creator
I too was thinking about starting with SAE again, and had similar questions. I was also wondering whether it's better to use DF or LIAEF for full face (at 128, optimizer mode 2/3 due to my GPU), and how adjusting style power affects things, especially since at 100 on both it doesn't use DF/LIAEF at all but something SAE-specific, and how those different models affect each other in training at in-between settings like 10, 20, 50, etc. Also, regarding reusing models: with old FakeApp I was literally reusing the same model constantly, and I'm doing the same now with H128 (it's currently at 448,000 iterations). Is it unwise to reuse a model that much in SAE as in H128? How is it best to pretrain one, and how many iterations/reuses are possible after?
 

VirginBoI

DF Pleb
tutsmybarreh said:
I too was thinking about starting with SAE again, and had similar questions. [...] Is it unwise to reuse a model that much in SAE as in H128? How is it best to pretrain one, and how many iterations/reuses are possible after?

448,000 iterations? Do you train your model for 2-3 weeks? It takes at least 2 days of continuous training in SAE at 176 resolution and batch size 4 to reach 100,000 iterations. And why do you use DF when LIAEF is more stable and advanced? SAE performs better than H128 in every aspect.
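
For what it's worth, those two numbers are consistent; a quick back-of-envelope check in plain Python (nothing DFL-specific):

# "At least 2 days continuous" for 100k iterations implies:
seconds = 2 * 24 * 3600             # 172,800 s
per_iter = seconds / 100_000        # ~1.73 s/iter at 176 res, batch size 4

# At that rate, 448,000 iterations is roughly 9 days of pure training,
# so 2-3 weeks spread across several reuse sessions is entirely plausible:
print(per_iter)                     # 1.728
print(448_000 * per_iter / 86_400)  # ~9.0 days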
 

TMBDF

Moderator | Deepfake Creator | Guide maintainer
Staff member
Moderator
Verified Video Creator
VirginBoI said:
448,000 iterations? Do you train your model for 2-3 weeks? [...] And why do you use DF when LIAEF is more stable and advanced?

448,000 is the iteration count of the H128 model I'm currently using; it's been reused around 4 times since the first training.

I'm asking because so far my SAE results (trained well over 100,000 iterations) were an unrecognisable mess: whenever the face was even slightly tilted it would get really small and distorted, and I wasn't sure if that was due to the style power settings or the chosen architecture.

It could have been the conversion too, but I didn't check all the conversion modes, just the ones that were recommended.

It was a single use (with a DF model), and I concluded it must be something wrong with the settings, because the dataset was good (no misaligned faces, blurry photos, etc.) and H128 did a perfect job. So in future I will only use SAE with the LIAEF architecture, which as you say is more stable.

The question is how and when to adjust those style power settings, and how many times a model can be reused before having to scrap it. (I assume 100k is how much it should be pretrained; then I should always start from that 100k model for similar faces, reuse it only 2-3 times unless the same face is being used, and after 200-300k go back to the 100k checkpoint for similar faces and train a new one for a more different-looking face?)

Also, what kind of resolution is 176? I could understand something between 128 and 256, like 192, but 176? Are you sure it's that resolution?
 

VirginBoI

DF Pleb
tutsmybarreh said:
[...] Also, what kind of resolution is 176? I could understand something between 128 and 256, like 192, but 176? Are you sure it's that resolution?

See my PewDiePie-as-Captain-America video. I trained it with SAE at the configuration below, for around 108,000 iterations.

If you find it satisfactory, set up your SAE accordingly:

== Model options:

== |== batch_size : 4
== |== sort_by_yaw : True
== |== random_flip : True
== |== resolution : 176
== |== face_type : f
== |== learn_mask : False
== |== optimizer_mode : 1
== |== archi : liae
== |== ae_dims : 256
== |== e_ch_dims : 42
== |== d_ch_dims : 21
== |== multiscale_decoder : True
== |== ca_weights : True
== |== pixel_loss : True
== |== face_style_power : 0.0
== |== bg_style_power : 0.0
== |== apply_random_ct : True
== Running on:
== |== [0 : Tesla T4]

[video=vimeo]https://vimeo.com/342538029[/video]
 

dpfks

DF Enthusiast
Staff member
Administrator
Verified Video Creator
mondomonger said:
[...] How many iterations do you generally train a new SAE celebrity model?

From a new model, I train at least 130-140k iterations. When reusing one, you can train fewer and fewer additional iterations each time.

tutsmybarreh said:
I was also wondering whether it's better to use DF or LIAEF for full face, and how adjusting style power affects things [...] how many iterations/reuses are possible after?

I mostly use DF because LIAEF will morph the faces, making the result look less like data_src. Style power will make your model look more like data_dst. No one can really tell you the limits of reusing models.

tutsmybarreh said:
[...] Also, what kind of resolution is 176? I could understand something between 128 and 256, like 192, but 176? Are you sure it's that resolution?

A small jump in resolution may mean a long time of extra training.
 

TMBDF

Moderator | Deepfake Creator | Guide maintainer
Staff member
Moderator
Verified Video Creator
VirginBoI said:
See my PewDiePie-as-Captain-America video. I trained it with SAE at the configuration posted above, for around 108,000 iterations. [...]

I noticed that you have CA weights and pixel loss turned on. Do you keep those on throughout the entire training? Do you experience frequent model collapse, and how do they improve quality for you versus having them off, or turning them on only at the very end as is usually recommended?

BTW, was tania01's Lucy Lawless video done with LIAE or DF? Did he share his settings somewhere? I might also try your settings separately, dpfks, with the same dataset I have now and compare results. (I started with the pretraining feature, so it used the built-in pictures up to 10k, but it doesn't look like it did much: now that I've added my own random faces (which I'll train up to 30-50k before putting in the proper face dataset up to ~100k), it still looked like it started from scratch. So I'm not sure whether it's better to use the actual pretraining feature or just give it ~10k on my own random pictures, 5k in src and a different 5k in dst.) My settings are:

Write preview: true
Batch size: 8
sort yaw: false
random flip: false
res: 128
face type: full
learn mask: true
opti: 3
archi: liae
ae dims 256
e ch dims 42
d ch dims 21
multiscale true
ca weights false
pixel loss false (may turn on for the last 10k, after a backup)
face style: 10
bg style: 10 (will turn both down, as in dpfks's post, to 1 or maybe even 0 for that last 10k)
random ct: false
 

VirginBoI

DF Pleb
tutsmybarreh said:
I noticed that you have CA weights and pixel loss turned on. Do you keep those on throughout the entire training, do you experience frequent model collapse, and how do they improve quality versus having them off, or on only at the very end? [...]

I keep CA weights turned on throughout training. It simply makes your predicted results better by making training more stable.

CA weights don't cause morphing, I think, and neither does pixel loss. I used pixel loss for the last 10k iterations.

Pixel loss is for stability and extra detail, same as CA weights.

Only the LIAEF architecture, and options like pretrain and face/background style power, morph your face toward dst.

Also, increase your resolution according to your dst. I went with 176 because it's the best setting for my deepfakes, as you can see.


dpfks said:
[...] A small jump in resolution may mean a long time of extra training.

Yes... a long time, but better-resolution results.
 

Endalus

DF Pleb
VirginBoI said:
[...] Yes... a long time, but better-resolution results.



Not necessarily. Increased resolution only has a visible effect in conversion if the face in the destination video has enough pixel density that the higher resolution model doesn't have to downscale. For 720p destination videos, which seem to be the most popular, if the height of the masked face is about 18% of the vertical height of the screen (as it would be for pretty much any full-body or waist-up shot), you would see literally no difference in definition. For a lower quality destination video like 480p, the face would need to take up over a quarter of the screen before there'd technically be a difference, and it would probably need to be significantly larger before the naked eye would start noticing one.

Also, there are limits on how much the model can learn based on the quality of your SRC images. If the source faces are only crisp enough to produce good results at 128 resolution, increasing it further won't do anything.
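
The arithmetic behind those percentages is easy to check yourself; a minimal Python sketch with the figures above plugged in:

# On-screen face height in pixels: if it's at or below the model
# resolution, any extra model resolution is downscaled away (illustrative).
def face_px(video_height, face_fraction):
    return video_height * face_fraction

print(face_px(720, 0.18))   # ~129.6 px: a 128 model already saturates this
print(face_px(480, 0.27))   # ~129.6 px: just over a quarter of a 480p frame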
 

VirginBoI

DF Pleb
Endalus said:
Not necessarily. Increased resolution only has a visible effect in conversion if the face in the destination video has enough pixel density that the higher resolution model doesn't have to downscale. [...] If the source faces are only crisp enough to produce good results at 128 resolution, increasing it further won't do anything.



Who chooses src and dst footage below 720p anyway? Most of the time people don't even realize they're using the downscaling options, even though they've supplied better quality facesets.

And there isn't a big time difference either. If you can train your model for 48 hours, there's no problem in training it a little further.
 

TMBDF

Moderator | Deepfake Creator | Guide maintainer
Staff member
Moderator
Verified Video Creator
VirginBoI said:
I keep CA weights turned on throughout training. [...] Also, increase your resolution according to your dst. [...] Yes... a long time, but better-resolution results.

Well, my PC is already taking around 2,200 ms per iteration at 128 (batch size 8; optimizer mode 2 or 3 doesn't matter), so increasing the resolution further will make it even slower. At this rate, getting from 15k to 50k will take around 22 hours, which is way too long. I'll see how fast Colab does it, and maybe train at a higher resolution there, like 192, which looks better to me (number-wise :p).

Too bad my internet sucks and it takes almost 2 hours to upload my datasets to Google Drive for Colab...

I'm a bit scared of pixel loss and CA weights because of collapse, and style powers supposedly can cause it too... Some say CA weights don't do much, some say otherwise; I guess I'll have to test it.

Also, you're saying it only morphs with LIAEF and when using style powers, but isn't it that LIAEF is 100% "active" only when those styles are at 0, and closer to 100 it actually uses the SAE-specific part more, which also morphs, right? And pretraining doesn't make a difference for that, does it? It's supposed to only improve lighting, etc., by training on varied faces? Man... that's really complicated...

As for resolution, again I think it's best to actually measure the size of the face in the dst file when it's closest to the camera (taking up the most space), see how big a box around it would be, and then take off about 20%, so as not to waste a 256x256 model when the rest of the scene has the face far away, where even 128 or 64 could resolve all the detail...
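
That measuring step can be automated. A rough sketch using OpenCV's stock Haar cascade (the detector choice and the file path are my own assumptions; DFL's S3FD extractor would be more accurate, but this gives a quick upper bound on face height across the dst frames):

import cv2

# Scan the dst video and report the tallest detected face, as a rough
# guide for picking the model resolution (sketch only, not DFL code).
cap = cv2.VideoCapture("data_dst.mp4")   # hypothetical path
det = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

max_h = 0
while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    for (x, y, w, h) in det.detectMultiScale(gray, 1.3, 5):
        max_h = max(max_h, h)
cap.release()

# Apply the ~20% haircut suggested above when choosing the resolution.
print(f"tallest face: {max_h}px -> try a model resolution near {int(max_h * 0.8)}")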
 

dpfks

DF Enthusiast
Staff member
Administrator
Verified Video Creator
tutsmybarreh said:
Well, my PC is already taking around 2,200 ms per iteration at 128, so increasing the resolution further will make it even slower. [...] As for resolution, again I think it's best to actually measure the size of the face in the dst file when it's closest to the camera [...]

The newest version now includes autobackup, so you don't have to worry about model collapse as much.

LIAEF is a different model which will morph faces regardless. DF is more like a face swap; LIAEF is more like morphing data_dst to look like data_src.

Pre-training improves lighting, but can make your final video look less like data_src, since it trains on multiple faces.
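
A heavily simplified sketch of why that is (stub Python of my own, not DFL's actual code): DF shares one encoder but gives each identity its own decoder, while LIAE pushes both identities through one shared decoder via mixed "inter" layers, which is what lets dst features bleed into the swap.

class Block:                 # stand-in for a stack of conv layers
    def __call__(self, x):
        return x             # identity stub; real blocks transform x

# DF: shared encoder, per-identity decoders -> cleaner face swap
enc, dec_src, dec_dst = Block(), Block(), Block()
def df_swap(face_dst):
    return dec_src(enc(face_dst))    # decode dst's face "as src"

# LIAE: shared decoder; identity is mixed in the inter layers, so some
# dst character survives in the output (the morphing described above)
enc2, inter_AB, inter_B, dec = Block(), Block(), Block(), Block()
def liae_swap(face_dst):
    code = enc2(face_dst)
    # roughly how DFL routes the swap: the shared inter_AB path is fed
    # twice into the single shared decoder
    return dec((inter_AB(code), inter_AB(code)))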
 
Here are the settings I use. I have used the H128 model and SAE, and I find SAE much better. I'm using a Shadow PC with a P5000. If I use Google Colab I have to go to a batch size of 24, or if I'm stuck with the K80 then it has to be 16.


Write preview: false
Batch size: 32
sort yaw: true (as long as I have more source images than dst, which is 90% of the time)
random flip: false
res: 128
face type: full
learn mask: true
opti: 1
archi: df
ae dims 512
e ch dims 42
d ch dims 21
multiscale true
ca weights false
pixel loss false 
face style: 0
bg style: 0
random ct: false
pre-train: false

For me, I had found that even with pixel loss off and face and background style at 0, my models would collapse around 170,000 iterations, and when I used the backup, it just collapsed again. With the last June update, keeping pixel loss off and not touching random ct or face/bg style, I don't have any problems with model collapse before the 150k mark.

I have found no difference between 176 and 128 resolution in the videos I've tested. What helps for me is a higher batch size, and at 128 resolution I can get a batch size of 32. I used to go with 176 resolution and a batch size of 16, but the results were not as good. I also sharpen my SRC faceset with the high-pass filter in Photoshop, and that makes a big difference in quality. Here are a couple of screenshots of a video I did, to give an idea.


[Screenshots: Chantel Zales]
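
For anyone who'd rather batch that high-pass sharpening than do it in Photoshop, a close approximation in Python/OpenCV (my own sketch, not this poster's workflow; the radius, amount, and faceset path are guesses to tune):

import cv2
import glob

AMOUNT = 0.5   # blend strength (assumed; tune by eye)
RADIUS = 3     # Gaussian sigma, analogous to the high-pass filter radius

# high-pass = image - blur; adding a fraction of it back sharpens
# (this is effectively an unsharp mask)
for path in glob.glob("data_src/aligned/*.jpg"):   # hypothetical faceset path
    img = cv2.imread(path)
    blur = cv2.GaussianBlur(img, (0, 0), RADIUS)
    cv2.imwrite(path, cv2.addWeighted(img, 1 + AMOUNT, blur, -AMOUNT, 0))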
I have tried the H128 model and SAE, and I find SAE much better. It's a little slower, but the quality makes up for it. The only settings I change are turning on the multiscale decoder and sort/feed by yaw; everything else I leave at the defaults.

With side angles, I've had success by making sure I include plenty of SRC pictures matching that angle, and I'll help the AI by training the different face pitches (angles) separately. Any time the angle changes, it blurs and morphs the face into a glob for those frames, but I just finished a Wendy Fiore video and it turned out pretty well using this method. Any bad scenes from the first convert give me an idea of what to train.

If I reuse a model, it's with a different DST, keeping the same SRC. I tried keeping the DST the same and using a different SRC, but the results were not nearly as good. I haven't learned how to make a good BJ vid yet, so I can't help there. I have tried the masking and it works really well, but it is tedious: 30 frames per second for a standard video means a long time editing for a very short clip. The biggest thing for me is scene selection and finding the best angle. I'll use Premiere Pro or FFmpeg to customize my videos, and I keep them pretty short, usually 2 to 5 minutes at the most. I have made a couple of decent facials with the masking, but I have a great template in After Effects that I use for that.


Hope this helps.
 