MrDeepFakes Forums


Getting a SAE model started well and other clarifications

dpfks

DF Enthusiast
Staff member
Administrator
Verified Video Creator
Question 1: Side angles/profile shots are bad

There can be a number of reasons why side angles/profile shots come out poorly.
  1. Training model - So far the best models that deal with these types of shots are DF and LIAEF (possibly VG? - I haven't been able to test this)
  2. Face extraction/alignment - The extraction process is highly important. Currently the DLIB and MT extraction processes both struggle with profile shots. This is why MANUAL extract for certain scenes is so important. Making sure the landmarks on your facesets are spot on = quality results.
  3. Limited training data - Yes, if you don't have enough high-quality side shots to train on, they'll come out badly.
Question 2: Reusing models
  • Yes, I generally recommend reusing models but only swapping out one of the models (celebrity or pornstar, not both)
  • Yes, you can use a model for other actresses, but see the point above. This is recommended because if you swap both data_src and data_dst, the end result of the NEW trained model may not look like the NEW celebrity. Yes, it will save time to reuse a model, but if you absolutely want to swap both data_src and data_dst out, I would use a NEW celebrity with similar facial structure/features to the one in your old model (the one you're reusing). Also, always back up your model in case you don't like the results.
  • Yes, this is what most people do, but you will still need to retrain the model with the new scene for a few hours (until results look decent). This is because the first scene you did will have different angles from the new scene you want to use.
Question 3: What does erode do

This feature "erodes" the masked area. Basically if you add erosion, the area of the mask (face area) that is swapping gets smaller. Similarly adding negative value will expand that area. This used to be used more with the older models when you convert, but less used now, especially in SAE. It doesn't do anything to color.

Question 4: Learn mask

This feature makes training longer and is more resource intensive, but it saves you time when converting. Before this feature, for each scene we would have to play around with erosion and blur and tinker with them until we could make it as seamless as possible. With this feature, our lives got a lot easier. Imagine taking your scene and converting a small clip of it literally 10x or more, playing with erosion + blur combinations until there's no longer a box around the face...

Of course you can turn it off and it'll probably reduce training time, but you'll likely have to add erosion and blur yourself.
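
For anyone curious, doing it yourself at convert time roughly amounts to: shrink the mask a bit, feather the edge, then blend. A rough numpy/OpenCV sketch of that idea (not the actual merger code; the function and parameter names are mine):

```python
import cv2
import numpy as np

def blend_face(frame, swapped_face, mask, erode_px=10, blur_px=20):
    """Shrink the mask a little, feather its edge, then alpha-blend the swapped
    face into the original frame so there is no hard box/seam around it."""
    if erode_px > 0:
        kernel = np.ones((erode_px, erode_px), dtype=np.uint8)
        mask = cv2.erode(mask, kernel)
    if blur_px > 0:
        k = blur_px | 1                       # Gaussian kernel size must be odd
        mask = cv2.GaussianBlur(mask, (k, k), 0)
    alpha = mask.astype(np.float32)[..., None] / 255.0   # HxWx1, range 0..1
    out = swapped_face.astype(np.float32) * alpha + frame.astype(np.float32) * (1.0 - alpha)
    return out.astype(np.uint8)
```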

Resolution: 64, 128, 256 (and you can scale up by increments of 16)

Everything is tested based on 128, and I would recommend sticking to that. Theoretically, up-close shots and higher-resolution videos will benefit from 256... but if you don't have a good enough GPU to train higher dims, the higher resolution likely won't make a difference to the naked eye. I cannot comment on how much VRAM is enough, but I know a 1080 Ti can train it, though again I'm not sure of the dims used.

Half-face vs Full-face: Good for eating sausages?

No, that's not what it's for, and that's not true either. So traditionally, half-face = the H128 model and full-face = the DF model. Basically, half face swaps a smaller area than full face.

Vte6dfth.jpg


Left = DF (full face), Right = H128 (half face)

Now in the example image above it's harder to see since there is erosion and blur, but if you look closely the DF model on the left fills in more of the face, including the cheekbones and more of the forehead. In most scenes this looks more like data_src, but DF is bad with obstructions... H128 on the right keeps the original data_dst structure more, and sometimes it cuts off eyebrows causing the "double eyebrow" effect, but I think H128 handles obstructions better.
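
One way to picture the half-face vs full-face difference is simply how large a region gets cut around the landmarks before training. A toy sketch (the coverage numbers are made up for illustration; the real extractors do proper alignment, not just a box):

```python
import numpy as np

def face_crop_box(landmarks, coverage=1.0):
    """Square crop centred on the landmark centroid. A larger coverage behaves
    like 'full face' (forehead/cheekbones included); a smaller coverage behaves
    like 'half face' (tighter, eyebrows can get clipped)."""
    landmarks = np.asarray(landmarks, dtype=np.float32)        # expected shape (68, 2)
    center = landmarks.mean(axis=0)
    size = (landmarks.max(axis=0) - landmarks.min(axis=0)).max() * coverage
    x0, y0 = (center - size / 2).astype(int)
    x1, y1 = (center + size / 2).astype(int)
    return x0, y0, x1, y1

# half_box = face_crop_box(landmarks, coverage=0.8)   # ~H128-style, tighter crop
# full_box = face_crop_box(landmarks, coverage=1.2)   # ~DF-style, more forehead/cheek area
```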

Batch size: What should you use?

Nope, it's all trial and error. But hey, we have the same GPU; I use 21, stable running for weeks. The proper way to trial-and-error it is to set the dims you want first, then raise your batch size... More on that later in the tutorial, I guess.

TIP: Starting a new model? Start with a low batch number first for quick training, then switch to your max (21 in our case). I usually do a batch of 8 when training to epoch 25k+ and then raise it to 21 and turn on pixel loss.
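
If you want to make the trial and error a bit more systematic, the idea is just: lock in your dims first, then walk the batch size down until one training step fits in VRAM. A generic sketch of that loop (train_one_iteration is a hypothetical stand-in for starting the trainer, not a DFL function):

```python
def find_max_batch_size(train_one_iteration, start=32, minimum=2):
    """Try a single training step at decreasing batch sizes until one fits in VRAM."""
    batch = start
    while batch >= minimum:
        try:
            train_one_iteration(batch_size=batch)   # hypothetical stand-in for one trainer step
            return batch                            # the first size that runs without OOM
        except (RuntimeError, MemoryError):         # OOM usually surfaces as one of these
            batch -= 2
    raise RuntimeError("Even the minimum batch size does not fit in VRAM")
```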

Feed faces sorted by yaw: use or not?

If you have more images in your data_src faceset compared to your data_dst faceset, yes, use this (or else I think you get an error). No, it's not better; most of the time you should say no to this.
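
As I understand it, this option groups faces by yaw so both sets get fed in matching pose buckets. Something like the toy sketch below (the yaw values are invented just to show the bucketing):

```python
from collections import defaultdict

def bucket_by_yaw(faces, bin_deg=15):
    """faces: list of (filename, yaw_in_degrees) pairs. Group them into pose bins."""
    buckets = defaultdict(list)
    for name, yaw in faces:
        buckets[int(yaw // bin_deg)].append(name)
    return buckets

# Hypothetical yaw values, just for illustration:
src = bucket_by_yaw([("src_0001.jpg", -40.0), ("src_0002.jpg", 5.0), ("src_0003.jpg", 3.0)])
dst = bucket_by_yaw([("dst_0001.jpg", 4.0), ("dst_0002.jpg", 2.0)])

# Only pose bins present in both sets can be paired; extreme src-only angles sit unused.
print(set(src) & set(dst))
```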

Auto encoder dims and encode/decoder dims per channel: What is the best for my GPU?

Yah, I don't know. Trial and error. The higher these values are, the higher quality your deepfake is. Think of this as how much detail your deepfake will have. For example: want those freckles? Raise these dims (I don't know what value will actually obtain this). Keep in mind that raising these values will significantly increase the training time needed to reach a desired outcome. Also, you should probably keep the same proportions.
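
"Keep the same proportions" just means scaling the dims together rather than cranking one in isolation. A quick sketch using the SAE defaults (512 / 42 / 21) as the baseline; the scaling rule itself is only my illustration:

```python
def scale_dims(factor, ae_dims=512, e_ch_dims=42, d_ch_dims=21):
    """Scale the autoencoder dims and the per-channel encoder/decoder dims together,
    keeping roughly the same proportions as the defaults (512 / 42 / 21)."""
    return {
        "ae_dims":   int(ae_dims * factor),
        "e_ch_dims": int(e_ch_dims * factor),
        "d_ch_dims": int(d_ch_dims * factor),
    }

print(scale_dims(0.75))   # a smaller model for a weaker GPU
print(scale_dims(1.25))   # more detail capacity, but noticeably slower to train
```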

Pixel loss: What's this do?

Correct, it is recommended to be turned on only after 25k epochs. It will bring out more detail like separating teeth instead of a big blur. Also it's said to fix/reduce jitter in the faceswap. It may fix skin tones? - not totally sure about that one.

Al3MHZDh.png
enOgiowh.png


For both I think pixel loss is on the right... Minor details I guess
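
Conceptually, pixel loss is just a plain per-pixel error term on the reconstruction, as opposed to the default, more structural loss. A minimal numpy illustration of the term itself, not DFL's training code:

```python
import numpy as np

def pixel_loss(predicted, target):
    """Plain per-pixel mean squared error between the predicted swap and the target face.
    Fine detail (teeth edges, skin texture) feeds directly into this term, which is
    why enabling it late in training can sharpen those areas."""
    predicted = predicted.astype(np.float32) / 255.0
    target = target.astype(np.float32) / 255.0
    return float(np.mean((predicted - target) ** 2))
```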

Face style power and BG style power: When to use and what does it do?

I don't know where the recommendation of turning this on after 10k came from so I can't comment on that. I don't know if this is exactly right, but this is how I think of it. The higher the value of each, the more the model will try to morph features of data_dst into the final product. For example:

lower the face style power = more like data_src
lower bg style power = the area around the masked face will look more like data_src

For me I train with 10/10 for a bit, then decrease to 0.1/whatever I feel
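
Reading the two sliders as loss weights matches that mental model. A rough sketch of how they could enter the total loss (this is only how I picture it, not the actual SAE loss code):

```python
def total_loss(recon_err, face_style_err, bg_style_err,
               face_style_power=10.0, bg_style_power=10.0):
    """recon_err:      how far the output is from the data_src face you want to see
    face_style_err:    how far the face region is from data_dst's look/lighting
    bg_style_err:      how far the area around the mask is from data_dst
    Higher powers pull the result toward data_dst; lowering them (e.g. to 0.1)
    lets the reconstruction term dominate, so the result looks more like data_src."""
    return recon_err + face_style_power * face_style_err + bg_style_power * bg_style_err

print(total_loss(0.02, 0.05, 0.05))                                             # 10/10, dst-leaning
print(total_loss(0.02, 0.05, 0.05, face_style_power=0.1, bg_style_power=0.1))   # src-leaning
```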
 

If nobody took the time to tell you that you rock today, well, you sir totally rock. Thanks for the in-depth reply and time!
 

apb1m

DF Vagrant
Thanks for this post. These were some questions I had as well, and I found this all to be useful.
 

Venatos

DF Vagrant
Thank you so much for this awesome post. Many of these questions I had myself.
I'm around 250k epochs on my first SAE model with pretty much all default settings, and I'm struggling to get below 0.1 for the last 100k or so... Is it even possible, or have I reached a plateau?

Also, I'm a little baffled by the high batch numbers I see; I run OOM with 10 and I have 16 GB of VRAM...
 

SPT

Moderator
Staff member
Moderator
Verified Video Creator
Hello, I'm posting in this thread because I have the same graphics card, I hope you won't mind.

A few questions:

1) Considering my GPU (1070), which version of DFL should I use? The latest DeepFaceLabCUDA10.1AVX, 10.1SSE? Other?

2) Batch size: I use 21; most of the time I get a message saying something like "not enough memory but it will work anyway". Should I reduce it progressively?

3) I want to try the new feature dealing with face obstruction; is there a specific tutorial for it? What are the settings?
As I understand it, you train with SAE and say yes to the create mask option, but then? Which other options are best for this? Is that all you need to do during training, with the real "face obstruction" masking happening in the merging phase? If so, what settings are necessary? Can I do it with the .bat files already included in DFL, or do I have to create a specific .bat file to use the face obstruction tool?

Edit: I downloaded the Fanseg faceset and the latest CUDA10.1SSE. After reading the Fanseg readme, I'm not completely sure what to do: I have a SAE 128 model trained to 60k epochs on the CUDA9.0SSE version of DFL. The goal was to train to 80-100k epochs. When it's done, what should I do with the Fanseg faceset? Do I add the pictures from Fanseg into the _dst or _src folder? There are different sets for glasses, hand obstructions, and porn; which one do I need? Do I just copy the Fanseg porn src into my model's src folder and go on training the model? Can't be that simple. Please give more details or some relevant forum links explaining more. Also, the Fanseg readme says "Minimum required VRAM to train FANSEG is 6GB with batch size 24" - not sure if I can even do this with a 1070? If not, I guess it's possible to use Google Colab?
 

dpfks

DF Enthusiast
Staff member
Administrator
Verified Video Creator

1) I use the CUDA 9.2 version.
2) On my 1070 I use batch 11, optimizer 2 (for SAE). No way in hell you'll get a batch of 21.
3) You don't need to do anything different with obstructions. When converting, choose any FAN-X option.
 

SPT

Moderator
Staff member
Moderator
Verified Video Creator
Thanks for your answer.

1)I did it
2)I'll try it
3)ok

PS: I will retrain with what I know; the previous model gets the error "you are trying to load a weight file containing 4 layers into a model with 16 layers". As I said, probably just noob settings in my models, or maybe just a normal incompatibility between 9.1 and 9.2? I would be interested to know what this means if someone has an idea.
 

dpfks

DF Enthusiast
Staff member
Administrator
Verified Video Creator

That means the model you are trying to load is incompatible with the version of DFL you are using.

The developer has made changes to DFL models over time, and older models may not work; you have to retrain them. Of course, all the new developments are meant to improve the accuracy and quality of the models.
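
In practice the only robust way to handle that is to try loading the old weights and fall back to a fresh model when the file no longer matches the current architecture. A generic Keras-style sketch (build_model is a placeholder for however you construct the network; this is not DFL's model loader):

```python
def load_or_restart(build_model, weights_path):
    """Try to reuse old weights; if the saved file was produced by an older,
    structurally different model, start training from scratch instead.
    Assumes build_model() returns a Keras-style model with .load_weights()."""
    model = build_model()
    try:
        model.load_weights(weights_path)
        print("Old weights loaded, continuing training.")
    except (ValueError, OSError):
        # e.g. "You are trying to load a weight file containing 4 layers
        # into a model with 16 layers" -> the architectures no longer match.
        print("Saved model is incompatible with this DFL version; retraining from scratch.")
    return model
```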
 

SPT

Moderator
Staff member
Moderator
Verified Video Creator

I retrained from the start with the 9.2 version and cleaned both face extracts as I did with the previous version. I tried to train in SAE with different settings: yours, some with batch 10 or 12, some at 256 and 192 resolution, and always had memory errors. I just retried with 128 res and it works. Is it impossible to use 192 or 256 res with a 1070? Or can I play around with other settings to make it work?

Also, I used S3FD instead of MT for extracting faces; it seems to get fewer false positives? Am I right in supposing it's better than MT?


Settings used and more questions :

batch : 11
sort by yaw : False
random flip : false
resolution : 128 (I'd like more)
face type : full
optimizer mode : 1 (I left it at default; is there really an improvement with 2? I used 2 when I had memory errors, so I tried 1 this time, even if it seems to be a problem of resolution and not the optimizer)
archi : df (left at default, but no idea what it does; I'd like some explanation if you can, otherwise I will re-read the tutorial)
ae dims : 512
e ch dims : 42
d ch dims : 21 (all default, don't know what this does either)
multiscale decoder : False - default (no idea)
ca weights : False - default (no idea)
pixel loss : no (I read that it's better not to use it, and if you must, to at least wait till 30k epochs)
face style : 0 (I understand more or less what it does)
bg style power : 0 (I think I understand)
apply random ct : False (no idea)


Also, can you link to or explain what pretrain does, and what's your opinion on it? Always useful or not? If so, in which situations should we use it, if not all the time?
 

dpfks

DF Enthusiast
Staff member
Administrator
Verified Video Creator
@"SPT"

I don't think a 1070 can go that high. If you want to try, just decrease the batch size and use optimizer 2 or 3. Just trial and error.

I can't comment much on pretrain. I don't use it, as I think it makes the data_src less accurate. The developer implemented it because he states it improves lighting conditions.
 

SPT

Moderator
Staff member
Moderator
Verified Video Creator

Thanks, I just made my first test with FAN-dst (only 40k epochs; I plan the final one to be 80k-100k). It works fine, but it generates a lot of contour lines (I left most settings at default and didn't change mask blur and related things, as it was just a test). Can you advise some conversion settings to get nicely blurred contours with FAN-dst?

Also, I tried the super resolution option; I didn't notice much improvement. What do you think about it?

Lastly: is overlay mode mandatory to use FAN-dst, or is it unrelated?
 