MrDeepFakes Forums


Optimizer 1 and lower batch VS. Optimizer 2 and higher batch

Status
Not open for further replies.

sakajo

DF Admirer
Which is better: using Optimizer 1 with a lower batch size, or Optimizer 2 with a higher batch size?

Still trying to grasp the idea of batch sizes. Is larger always better?
 

avalentino93

DF Admirer
sakajo said:
Which is better: using Optimizer 1 with a lower batch size, or Optimizer 2 with a higher batch size?
Still trying to grasp the idea of batch sizes. Is larger always better?

Hard to answer. It's not really a direct comparison; it depends on your batch size.
Obviously the lower optimizer mode is faster since there's no offloading to RAM. However, if you have to lower your batch size from, say, 10 to 4, it's probably not worth it in the later stages of training. It would be fine in the beginning stages.

tl;dr - a lower optimizer mode is technically faster, but not necessarily better. If you don't have a large enough batch size, you'll never get your loss counts down.
 

sakajo

DF Admirer
avalentino93 said:
sakajo said:
Which is better: using Optimizer 1 with a lower batch size, or Optimizer 2 with a higher batch size?
Still trying to grasp the idea of batch sizes. Is larger always better?

Hard to answer. It's not really a direct comparison; it depends on your batch size.
Obviously the lower optimizer mode is faster since there's no offloading to RAM. However, if you have to lower your batch size from, say, 10 to 4, it's probably not worth it in the later stages of training. It would be fine in the beginning stages.

tl;dr - a lower optimizer mode is technically faster, but not necessarily better. If you don't have a large enough batch size, you'll never get your loss counts down.

I'm looking at Opt 2 = BS 10 and Opt 1 = BS 7.

I get the feeling that the preview looks worse after switching to Opt 1, but I'm not entirely sure. The blue/yellow graphs seem to be longer (or wider) since I switched. What does that mean?
 

Fappenberg

Guest
Assuming

Opt 1, BS 7 = 2000 ms, then 2000/7 = 285.71 ms per sample
and
Opt 2, BS 10 = 2500 ms, then 2500/10 = 250 ms per sample

therefore Opt 2 is effectively faster, since more data is being fed through the network per unit time.
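
To make the comparison above easy to rerun with your own numbers, here is a minimal sketch of the same per-sample arithmetic. The iteration times are the example figures from this post, not measurements, and the helper name is just for illustration:

```python
def ms_per_sample(iter_time_ms: float, batch_size: int) -> float:
    """Average time spent on one sample during one iteration."""
    return iter_time_ms / batch_size

# Example numbers from this post (hypothetical timings, not benchmarks).
opt1 = ms_per_sample(2000, 7)    # ~285.7 ms per sample
opt2 = ms_per_sample(2500, 10)   # 250.0 ms per sample

print(f"Opt 1, BS 7 : {opt1:.1f} ms/sample")
print(f"Opt 2, BS 10: {opt2:.1f} ms/sample")
# Lower ms/sample means more data pushed through the network per unit time.
```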
 

sakajo

DF Admirer
Fappenberg said:
Assuming

Opt 1, BS 7 = 2000 ms, then 2000/7 = 285.71 ms per sample
and
Opt 2, BS 10 = 2500 ms, then 2500/10 = 250 ms per sample

therefore Opt 2 is effectively faster, since more data is being fed through the network per unit time.

For me it's roughly the same:

Opt 1, BS 7 = 1090 ms, then 1090/7 = 155.7 ms per sample
and
Opt 2, BS 10 = 1600 ms, then 1600/10 = 160 ms per sample

My issue is not with speed, it's with quality. If I'm not mistaken, a higher batch size means better final quality in the result?
 

Fappenberg

Guest
Generally yes, higher batch will give a more accurate result.
 

VladShev

DF Vagrant
Verified Video Creator
Training with a big batch will give you a more accurate learning vector.

That is, if you set the maximum possible batch, the result will be better in the end, but you will also get a very long training time. Therefore, I agree with @"Fappenberg": we need a middle ground between the batch size and the speed of training.

If speed is important to you, train with batch 4 or even lower.
If you need quality and you absolutely do not care how long it takes, set the maximum possible batch.
If you want both speed and efficiency, use the formula provided by Fappenberg.
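
A rough way to see why a bigger batch gives a "more accurate learning vector": the update direction is an average over the batch, and averaging more samples reduces its noise. Here is a toy numpy sketch of that idea; it is not anything from DFL, and the numbers are made up:

```python
import numpy as np

rng = np.random.default_rng(0)
true_grad = np.array([1.0, -2.0, 0.5])   # the "ideal" update direction (made up)

def mean_gradient_error(batch_size: int, trials: int = 2000) -> float:
    """Average distance between the batch-averaged gradient and the true one."""
    errs = []
    for _ in range(trials):
        noise = rng.normal(scale=1.0, size=(batch_size, true_grad.size))
        batch_grad = (true_grad + noise).mean(axis=0)   # average over the batch
        errs.append(np.linalg.norm(batch_grad - true_grad))
    return float(np.mean(errs))

for bs in (2, 4, 8, 16, 32):
    print(f"batch {bs:2d}: mean gradient error ~ {mean_gradient_error(bs):.3f}")
# The error shrinks roughly like 1/sqrt(batch size): steadier steps per iteration,
# but each iteration costs more time, hence the speed/quality trade-off above.
```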
 

sakajo

DF Admirer
VladShev said:
Training with a big batch will give you a more accurate learning vector.

That is, if you set the maximum possible batch, the result will be better in the end, but you will also get a very long training time. Therefore, I agree with @"Fappenberg": we need a middle ground between the batch size and the speed of training.

If speed is important to you, train with batch 4 or even lower.
If you need quality and you absolutely do not care how long it takes, set the maximum possible batch.
If you want both speed and efficiency, use the formula provided by Fappenberg.

And if I'm opting for quality, should I train with the highest batch size possible from the start?
 

VladShev

DF Vagrant
Verified Video Creator
sakajo said:
VladShev said:
Training with a big batch will give you a more accurate learning vector.

That is, if you set the maximum possible batch, the result will be better in the end, but you will also get a very long training time. Therefore, I agree with @"Fappenberg": we need a middle ground between the batch size and the speed of training.

If speed is important to you, train with batch 4 or even lower.
If you need quality and you absolutely do not care how long it takes, set the maximum possible batch.
If you want both speed and efficiency, use the formula provided by Fappenberg.

And if I'm opting for quality, should I train with the highest batch size possible from the start?

Opinions differ: some believe it's better to set the batch low at the beginning and then increase it; others set the maximum possible batch right away and keep it there the whole time.

My personal opinion is that if the goal is maximum quality, then you should set the maximum batch from the beginning. A more accurate learning direction means a more accurate model. I haven't worked with machine learning myself, I've only read articles on the topic, so I may be wrong about this. Ideally, of course, we'd hear the opinion of a real data scientist.
 

sakajo

DF Admirer
VladShev said:
sakajo said:
VladShev said:
Training with a big batch will give you a more accurate learning vector.

That is, if you set the maximum possible batch, the result will be better in the end, but you will also get a very long training time. Therefore, I agree with @"Fappenberg": we need a middle ground between the batch size and the speed of training.

If speed is important to you, train with batch 4 or even lower.
If you need quality and you absolutely do not care how long it takes, set the maximum possible batch.
If you want both speed and efficiency, use the formula provided by Fappenberg.

And if I'm opting for quality, should I train with the highest batch size possible from the start?

Opinions differ: some believe it's better to set the batch low at the beginning and then increase it; others set the maximum possible batch right away and keep it there the whole time.

My personal opinion is that if the goal is maximum quality, then you should set the maximum batch from the beginning. A more accurate learning direction means a more accurate model. I haven't worked with machine learning myself, I've only read articles on the topic, so I may be wrong about this. Ideally, of course, we'd hear the opinion of a real data scientist.

Awesome. Lazy people like me are lucky to have people like you relay the info :D
 

avalentino93

DF Admirer
VladShev said:
My personal opinion is that if the goal is maximum quality, then you should set the maximum batch from the beginning. A more accurate learning direction means a more accurate model. I haven't worked with machine learning myself, I've only read articles on the topic, so I may be wrong about this. Ideally, of course, we'd hear the opinion of a real data scientist.

Frankly, I totally disagree with this. There is no reason to set a large batch size in the beginning; all you are comparing then is really shitty, undefined data. With a lower batch size in the beginning, the speed of your iterations is far more important than comparing across a big batch.
I typically set my batch size to 4 (192 res, default dims, opt mode 1, no mask learning). Once I get to around a 1.0 to 0.8 loss rate (or the preview is starting to look like an actual detailed face), I move to batch size 8 (same settings). Once I'm consistently down in the 0.8s, I move the batch size to 10 (192 res, default dims, opt mode 2, mask learning on). Finally, if I care enough about getting more detail, I'll opt for a batch size of 12 or 14 at opt mode 3. However, then you're really waiting around, so I often do the final part on weekends where it's going to run for 48-60 hours straight.
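
For reference, the staged schedule described above written out as a plain lookup table. The loss thresholds and settings are the ones from this post (and approximate); the table layout and the stage_for helper are purely illustrative, since in DFL you change these options interactively in the trainer:

```python
# avalentino93's staged schedule, restated as data. Thresholds are approximate.
TRAINING_STAGES = [
    # (train with these settings while the loss is still above "until_loss")
    {"until_loss": 1.0,  "batch": 4,  "res": 192, "opt_mode": 1, "learn_mask": False},
    {"until_loss": 0.85, "batch": 8,  "res": 192, "opt_mode": 1, "learn_mask": False},
    {"until_loss": None, "batch": 10, "res": 192, "opt_mode": 2, "learn_mask": True},
    # Optional final polish: batch 12-14 at opt mode 3, if you can wait 48-60 hours.
]

def stage_for(current_loss: float) -> dict:
    """Pick the first stage whose loss threshold the model hasn't dropped below yet."""
    for stage in TRAINING_STAGES:
        if stage["until_loss"] is None or current_loss > stage["until_loss"]:
            return stage
    return TRAINING_STAGES[-1]

print(stage_for(1.3))   # early: small batch, opt mode 1
print(stage_for(0.7))   # later: bigger batch, opt mode 2, mask learning on
```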
 

sakajo

DF Admirer
avalentino93 said:
VladShev said:
My personal opinion is that if the goal is maximum quality, then you should set the maximum batch from the beginning. A more accurate learning direction means a more accurate model. I haven't worked with machine learning myself, I've only read articles on the topic, so I may be wrong about this. Ideally, of course, we'd hear the opinion of a real data scientist.

Frankly, I totally disagree with this. There is no reason to set a large batch size in the beginning; all you are comparing then is really shitty, undefined data. With a lower batch size in the beginning, the speed of your iterations is far more important than comparing across a big batch.
I typically set my batch size to 4 (192 res, default dims, opt mode 1, no mask learning). Once I get to around a 1.0 to 0.8 loss rate (or the preview is starting to look like an actual detailed face), I move to batch size 8 (same settings). Once I'm consistently down in the 0.8s, I move the batch size to 10 (192 res, default dims, opt mode 2, mask learning on). Finally, if I care enough about getting more detail, I'll opt for a batch size of 12 or 14 at opt mode 3. However, then you're really waiting around, so I often do the final part on weekends where it's going to run for 48-60 hours straight.

Do you employ random warp / trueface / lr_dropout at any point in training? If so, at which point?
 

avalentino93

DF Admirer
Do you employ random warp / trueface / lr_dropout at any point in training? If so, at which point?


Random warp only on a new model (not a new dst). I run random warp for probably 100k iterations.
Then I turn it off and turn on lr_dropout from 100k until I get to around a 0.4 to 0.5 loss rate, or until I start seeing eyelashes.
After that I turn on trueface.

One thing I've noticed: if you turn on lr_dropout AND trueface too early, your loss rates will never drop. I had that issue; I then disabled trueface and my loss rate started dropping quite a bit again.
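
The same reply condensed into a rough timeline. The iteration count and loss thresholds are the ones quoted above; the function and option names are made up for readability and are not DFL's API:

```python
def options_for(iteration: int, loss: float) -> dict:
    """Which options are on at a given point, per the schedule described above."""
    opts = {"random_warp": False, "lr_dropout": False, "true_face": False}
    if iteration < 100_000:
        # New model (not a new dst): random warp only for roughly the first 100k iterations.
        opts["random_warp"] = True
    elif loss > 0.45:
        # After ~100k: warp off, lr_dropout on until loss reaches ~0.4-0.5
        # (or eyelashes start showing in the preview).
        opts["lr_dropout"] = True
    else:
        # Only then add trueface; enabling it together with lr_dropout too early
        # tends to stall the loss, per the note above.
        opts["lr_dropout"] = True
        opts["true_face"] = True
    return opts

print(options_for(50_000, 1.2))    # random warp phase
print(options_for(150_000, 0.6))   # lr_dropout phase
print(options_for(300_000, 0.4))   # lr_dropout + trueface
```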
 

sakajo

DF Admirer
avalentino93 said:
Do you employ random warp / trueface / lr_dropout at any point in training? If so, at which point?


Random warp only on a new model (not a new dst). I run random warp for probably 100k iterations.
Then I turn it off and turn on lr_dropout from 100k until I get to around a 0.4 to 0.5 loss rate, or until I start seeing eyelashes.
After that I turn on trueface.

One thing I've noticed: if you turn on lr_dropout AND trueface too early, your loss rates will never drop. I had that issue; I then disabled trueface and my loss rate started dropping quite a bit again.

I think I'm experiencing that right now. Will try turning trueface off. Thanks for the info.
 

aymanalz

DF Pleb
avalentino93 said:
Random warp only on a new model (not a new dst). I run random warp for probably 100k iterations.
Then I turn it off and turn on lr_dropout from 100k until I get to around a 0.4 to 0.5 loss rate, or until I start seeing eyelashes.
After that I turn on trueface.

One thing I've noticed: if you turn on lr_dropout AND trueface too early, your loss rates will never drop. I had that issue; I then disabled trueface and my loss rate started dropping quite a bit again.

What about color transfer? At what point do you turn it on, if at all? And which one, and for how long?

And how long or how many iterations do you run trueface for?

I think there should be a general thread where everybody posts their deepfaking practices, and their experiences. @"dpfks" @"tutsmybarreh"
 

Patafix

DF Vagrant
Hey guys,

What about Face/Background Style Power? What's the difference compared to random warp? Are you using them in the earlier training stages and disabling them when details start to appear?
 

TMBDF

Moderator | Deepfake Creator | Guide maintainer
Staff member
Moderator
Verified Video Creator
aymanalz said:
What about color transfer? At what point do you turn it on, if at all? And which one, and for how long?

And how long or how many iterations do you run trueface for?

I think there should be a general thread where everybody posts their deepfaking practices, and their experiences. @"dpfks" @"tutsmybarreh"

Well, users are encouraged to discuss and share their model settings and techniques/workflows when they post their fakes in the celebrity fakes (NSFW/SFW) sections:
https://mrdeepfakes.com/forums/forum-celebrity-deepfakes
https://mrdeepfakes.com/forums/forum-sfw-deepfake-videos
But that's only when actually posting a deepfake. For discussion regarding workflows we could technically use this one: https://mrdeepfakes.com/forums/thread-saehd-thread
It's for SAEHD specifically, but it's close to SAE, and usually these massive threads turn into a mess of spam, repetitive questions and so on. The same will probably happen with the new general issues thread that replaced the mess/spam that was under the main DFL guide: https://mrdeepfakes.com/forums/thread-dfl-general-issues-thread

As for your question about color transfer, true face, etc. - at the end, never from the beginning. You start with random warp on to generalize the face faster (that's also why you can start with a lower batch size, though if you're using a pretrained model there isn't much point because it already knows how a face should look). After the faces are generalized but not quite sharp, you can increase the batch size to the max you can run. Then, once the faces are fairly sharp, you turn off random warp and train more. After that you can (but don't have to) enable lr_dropout and keep training. At the end you can turn on rct color transfer (but I recommend making sure your dataset is fairly close in color to dst; color transfers can easily ruin the end result, though sometimes they can actually save it) and true face.
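
For anyone skimming, here is that ordering condensed into per-phase settings. The phases and their order restate the paragraph above; the dict layout is only illustrative and nothing here is DFL code:

```python
# TMBDF's workflow above, phase by phase. "batch" values are guidance, not numbers.
PHASES = [
    {"name": "generalize", "random_warp": True,  "batch": "low is fine",
     "lr_dropout": False, "color_transfer": None,  "true_face": False},
    {"name": "scale up",   "random_warp": True,  "batch": "max you can run",
     "lr_dropout": False, "color_transfer": None,  "true_face": False},
    {"name": "sharpen",    "random_warp": False, "batch": "max you can run",
     "lr_dropout": False, "color_transfer": None,  "true_face": False},
    {"name": "refine",     "random_warp": False, "batch": "max you can run",
     "lr_dropout": True,  "color_transfer": None,  "true_face": False},
    # rct only if the src dataset's colors are already close to dst.
    {"name": "finish",     "random_warp": False, "batch": "max you can run",
     "lr_dropout": True,  "color_transfer": "rct", "true_face": True},
]
for phase in PHASES:
    print(phase)
```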
 

sakajo

DF Admirer
tutsmybarreh said:
aymanalz said:
What about color transfer? At what point do you turn it on, if at all? And which one, and for how long?

And how long or how many iterations do you run trueface for?

I think there should be a general thread where everybody posts their deepfaking practices, and their experiences. @"dpfks" @"tutsmybarreh"

Well, users are encouraged to discuss and share their model settings and techniques/workflows when they post their fakes in the celebrity fakes (NSFW/SFW) sections:
https://mrdeepfakes.com/forums/forum-celebrity-deepfakes
https://mrdeepfakes.com/forums/forum-sfw-deepfake-videos
But that's only when actually posting a deepfake. For discussion regarding workflows we could technically use this one: https://mrdeepfakes.com/forums/thread-saehd-thread
It's for SAEHD specifically, but it's close to SAE, and usually these massive threads turn into a mess of spam, repetitive questions and so on. The same will probably happen with the new general issues thread that replaced the mess/spam that was under the main DFL guide: https://mrdeepfakes.com/forums/thread-dfl-general-issues-thread

As for your question about color transfer, true face, etc. - at the end, never from the beginning. You start with random warp on to generalize the face faster (that's also why you can start with a lower batch size, though if you're using a pretrained model there isn't much point because it already knows how a face should look). After the faces are generalized but not quite sharp, you can increase the batch size to the max you can run. Then, once the faces are fairly sharp, you turn off random warp and train more. After that you can (but don't have to) enable lr_dropout and keep training. At the end you can turn on rct color transfer (but I recommend making sure your dataset is fairly close in color to dst; color transfers can easily ruin the end result, though sometimes they can actually save it) and true face.

Yeah, I just had a result ruined by rct. I thought it was trueface that caused the result to have less detail than a backup, but I guess it was the rct.
 

TMBDF

Moderator | Deepfake Creator | Guide maintainer
Staff member
Moderator
Verified Video Creator
I think the initial question was already answered so I'm closing this thread.
 