sakajo said:
Which is better: using Optimizer 1 with a lower batch size, or Optimizer 2 with a higher batch size?
Still trying to grasp the idea of batch sizes. Is larger always better?
avalentino93 said:
sakajo said: Which is better: using Optimizer 1 with a lower batch size, or Optimizer 2 with a higher batch size? Is larger always better?
Hard to answer; it's not really a direct comparison - it depends on the batch sizes involved.
Obviously the lower optimizer mode is faster, since there is no offloading to RAM. However, if you have to lower your batch size from, say, 10 to 4, it's probably not worth it in the later stages of training. It would be fine in the beginning stages, though.
tl;dr - the lower optimizer mode is technically faster, but not necessarily better. If you don't have a large enough batch size, you'll never get your loss values down.
Fappenberg said:
Assuming
Opt 1, BS 7 = 2000 ms per iteration, then 2000/7 ≈ 285.71 ms per sample,
and
Opt 2, BS 10 = 2500 ms per iteration, then 2500/10 = 250 ms per sample,
therefore Opt 2 will be the faster choice, since more data is fed through the network per unit of time.
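Fappenberg's comparison can be checked with a few lines of Python. The timings are the hypothetical numbers from the post above, not real measurements:

```python
def time_per_sample(batch_time_ms: float, batch_size: int) -> float:
    """Training time spent on each individual sample, in milliseconds."""
    return batch_time_ms / batch_size

# Hypothetical timings from the post.
opt1 = time_per_sample(2000, 7)   # Optimizer 1, batch size 7
opt2 = time_per_sample(2500, 10)  # Optimizer 2, batch size 10

print(f"Opt1: {opt1:.2f} ms/sample")  # Opt1: 285.71 ms/sample
print(f"Opt2: {opt2:.2f} ms/sample")  # Opt2: 250.00 ms/sample
```

The point is that per-iteration time alone is misleading: divide by the batch size first, then compare.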
VladShev said:
Training with a big batch will give you a more accurate learning vector (gradient estimate). That is, if you set the maximum possible batch, the end result will be better, but you will also get a very long training time. Therefore, I agree with @"Fappenberg": we need a middle ground between batch size and training speed.
If speed is important to you, train with batch size 4 or even lower.
If you need quality and you absolutely do not care how long it takes, set the maximum possible batch.
If you want both speed and efficiency, use the formula Fappenberg provided.
sakajo said:
VladShev said: Learning with a big batch will give you a more accurate learning vector. That is, if you set the maximum possible, the result will be better in the end. [...]
And if I'm opting for quality, should I train with the highest batch size possible from the start?
VladShev said:
sakajo said: And if I'm opting for quality, should I train with the highest batch size possible from the start?
Opinions differ: some believe it's better to set the batch low at the beginning and then increase it, others that you should set the maximum possible batch immediately and keep it there the whole time.
My personal opinion is that if the goal is maximum quality, you should set the maximum batch from the beginning: a more accurate learning direction, and therefore a more accurate model. I haven't worked with machine learning myself, I've only read articles on the topic, so I may be wrong here. Ideally, of course, you'd hear the opinion of a real data scientist.
avalentino93 said:
VladShev said: My personal opinion is that if the goal is to achieve maximum quality, then you need to set the maximum batch at the beginning. [...]
Frankly, I totally disagree with this. There is no reason to set a large batch size at the beginning. All you are comparing then is really shitty, undefined data. With a lower batch size in the beginning, the speed of your iterations is far more important than comparing multiple batches.
I typically set my batch size to 4 (192 res, default dims, optimizer mode 1, no mask learning). Once I get to around 1.0 to 0.8 loss (or the preview is starting to look like an actual detailed face), I move to batch size 8 (same settings). Once I'm consistently in the 0.8s, I move to batch size 10 (192 res, default dims, optimizer mode 2, mask learning on). Finally, if I care enough about getting more detail, I'll opt for a batch size of 12 or 14 at optimizer mode 3. However, then you're really waiting around, so I often do that final part on weekends, where it's going to run for 48-60 hours straight.
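That staged schedule can be sketched as a simple lookup from the current loss to the settings. The thresholds, key names, and the optional final pass below are an illustrative paraphrase of the post, not actual DFL options:

```python
def stage_settings(loss: float, final_detail_pass: bool = False) -> dict:
    """Pick batch size / optimizer mode from the current loss value
    (a sketch of the staged schedule described above; thresholds are illustrative)."""
    if final_detail_pass:  # optional long weekend run for extra detail
        return {"batch_size": 14, "optimizer_mode": 3, "mask_learning": True}
    if loss > 1.0:         # early training: iteration speed matters most
        return {"batch_size": 4, "optimizer_mode": 1, "mask_learning": False}
    if loss > 0.8:         # preview starts to look like a detailed face
        return {"batch_size": 8, "optimizer_mode": 1, "mask_learning": False}
    # consistently in the 0.8s and below: larger batch, heavier optimizer mode
    return {"batch_size": 10, "optimizer_mode": 2, "mask_learning": True}

print(stage_settings(1.2))  # early-stage settings
print(stage_settings(0.7))  # later-stage settings
```

The idea is just that batch size grows as the loss shrinks; the exact cutover points are whatever your hardware and patience allow.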
Do you employ random warp/trueface/learning rate dropout at any point in training? If so, at which point?
avalentino93 said:
Do you employ random warp/trueface/learning rate dropout at any point in training? If so, at which point?
Random warp only on a new model (not a new dst). I run random warp for probably 100k iterations.
Then I turn it off and turn on lr_dropout from 100k until I get to around 0.4 to 0.5 loss, or after I start seeing eyelashes.
After that I turn on trueface.
One thing I've noticed: if you turn on lr_dropout AND trueface too early, your loss rates will never drop. I had that issue; after disabling trueface, my loss rate started dropping quite a bit again.
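That timeline can be summarized as a function from training progress to feature toggles. The iteration count and loss thresholds come from the post; the function and flag names are illustrative, not DFL options verbatim:

```python
def feature_flags(iteration: int, loss: float) -> dict:
    """Which training features to enable at a given point
    (a sketch of the timeline described above; thresholds are illustrative)."""
    flags = {"random_warp": False, "lr_dropout": False, "true_face": False}
    if iteration < 100_000:
        flags["random_warp"] = True   # new model: generalize the face first
    elif loss > 0.45:
        flags["lr_dropout"] = True    # from ~100k iterations until ~0.4-0.5 loss
    else:
        flags["lr_dropout"] = True    # keep refining
        flags["true_face"] = True     # only once loss has already dropped
    return flags

print(feature_flags(50_000, 1.0))   # warp-only phase
print(feature_flags(150_000, 0.6))  # lr_dropout phase
```

The key constraint encoded here is the one from the post: lr_dropout and trueface should not both be on early, or the loss stalls.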
avalentino93 said: Random warp only on a new model (not new dst). I run random warp for probably 100k iterations. [...]
aymanalz said:What about color transfer? At what point do you turn it on, if at all? And which one, and for how long?
And how long or how many iterations do you run trueface for?
I think there should be a general thread where everybody posts their deepfaking practices, and their experiences. @"dpfks" @"tutsmybarreh"
tutsmybarreh said:
aymanalz said: What about color transfer? At what point do you turn it on, if at all? [...]
Well, users are encouraged to discuss and share their model settings and techniques/workflows when they post their fakes in the celebrity fakes (NSFW/SFW) sections:
https://mrdeepfakes.com/forums/forum-celebrity-deepfakes
https://mrdeepfakes.com/forums/forum-sfw-deepfake-videos
But that's only when actually posting a deepfake. For discussion regarding workflows we could technically use this one: https://mrdeepfakes.com/forums/thread-saehd-thread
It's for SAEHD specifically, but it's close to SAE, and usually these massive threads turn into a mess of spam, repetitive questions, and so on. The same will probably happen with the new general issues thread that replaced the mess/spam that was under the main DFL guide: https://mrdeepfakes.com/forums/thread-dfl-general-issues-thread
As for your question about color transfer, true face, etc.: at the end, never from the beginning. You start with random warp on to generalize the face faster (that's also why you can start with a lower batch size, though if you're using a pretrained model there isn't much point, because it already knows how a face should look). After faces are generalized but not quite sharp, you can increase the batch size to the maximum you can run. Once faces are fairly sharp, turn off random warp and train more; then you can (but don't have to) enable lr_dropout and keep training. At the end you can turn on rct color transfer (but I recommend making sure your dataset is fairly close in color to dst - color transfers can easily ruin the end result, though sometimes they can actually save it) and true face.
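tutsmybarreh's workflow can be laid out as ordered phases. The phase names, keys, and notes below are a paraphrase of the post, not DFL config options:

```python
# Ordered training phases as described above; each entry is a paraphrase of the post.
TRAINING_PHASES = [
    {"phase": "generalize", "random_warp": True,
     "note": "low batch OK; little point if using a pretrained model"},
    {"phase": "grow batch", "random_warp": True,
     "note": "raise batch size to the max you can run once faces form"},
    {"phase": "sharpen", "random_warp": False,
     "note": "turn off random warp once faces are fairly sharp, keep training"},
    {"phase": "refine", "random_warp": False, "lr_dropout": True,
     "note": "optional lr_dropout pass"},
    {"phase": "finish", "color_transfer": "rct", "true_face": True,
     "note": "end only; check dataset colors are close to dst first"},
]

for p in TRAINING_PHASES:
    print(p["phase"], "-", p["note"])
```

Note the strict ordering: color transfer and true face come last in every variant of this workflow discussed in the thread.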