TalosOfCrete said:
-You don't need learn_mask enabled the entire time. It is generally recommended to only have it on for a couple thousand iterations, as it learns the mask fast and consumes a considerable chunk of VRAM.
-In general, I never recommend 256x256. The sweet spot is usually 160x160 or 192x192. For example, Ctrl+Shift+Face trains at 160x160. Cranking it up to 256x256 is extremely computationally intensive and forces you to use Optimizer 3 and BS 4 (256x256 is only really beneficial in high-resolution, very close-up shots). For your hardware (and in general) you should definitely turn it down.
-Last tips:
-Use clipgrad. It can save your model's ass quite often. A collapsed model will mean you'll have to start over from your most recent backup. Not fun. The performance penalty is quite small anyway.
-On your hardware, if you drop your resolution, you may be able to add a few (~3) more e_ch/d_ch dims (encoder/decoder dimensions per channel) as well as a ~32 jump in AE dims. This will make smaller details like eye movements and teeth easier and faster to capture. (IF YOU BUMP THEM, MAKE SURE TO INCREASE BOTH: simply increasing one or the other may waste computational resources, with the model either failing to deliver the information from one end to the other (AE dims) or failing to capture the detail (e_ch/d_ch dims).)
Nice to see people reading some of my stuff (unless you knew all of that, then good job anyway on just not being another n00b ;P ).
Exactly, learn mask only needs to be enabled for a while (I'd say it's best to turn it off once the faces are well trained). Learn mask is surprisingly heavy on VRAM and overall speed. I recommend just using FAN-DST; it gets the job done 99% of the time.
256 is fine for closeups and really only that. If you're doing SFW fakes it might be worth it, but only if you have a really good dataset (super sharp, consistent lighting); otherwise all the flaws of your dataset will show up even more, especially if the person has facial hair, which changes look constantly, and at higher res the model can have difficulty properly training on individual strands/clumps of hair, etc.
256 I'd also recommend for stuff like 4K porn where you just need some more detail, but a lot of that can be faked with upscaling. Built-in sharpening along with upscaling (RankSRGAN) can really make the faces look high res. Anything above 128 is fine for 60% of scenes; then 160/176/192 will help with some closeups, or just make learning of small details (stuff like freckles, beauty marks, etc.) more effective.
Remember, 256x256 is 4 times the pixel count of 128x128, so in a perfectly "scalable" world it would mean 4 times the data/VRAM usage, 4 times slower iterations, a 4 times smaller batch size, etc.
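The scaling above is just pixel-count arithmetic; a quick sketch to sanity-check the ratios for the common resolutions mentioned here:

```python
# Back-of-envelope scaling: a square face resolution's cost (data, VRAM,
# iteration time) scales roughly with its pixel count in a perfectly
# "scalable" world.
def pixel_ratio(res_a: int, res_b: int) -> float:
    """Ratio of pixel counts between two square resolutions."""
    return (res_a * res_a) / (res_b * res_b)

print(pixel_ratio(256, 128))  # 4.0    -> ~4x VRAM/time, ~1/4 batch size
print(pixel_ratio(192, 128))  # 2.25
print(pixel_ratio(160, 128))  # 1.5625
```

So 192x192 already costs more than double 128x128, which is why the "sweet spot" advice lands at 160/192 rather than 256.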
Still don't know why clipgrad isn't just on by default with a toggle; the performance hit is within a margin of error. I measured a ~50 ms increase in iteration time. I didn't check VRAM, but it didn't cause an OOM at a maxed-out batch size, so it's probably not much.
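For anyone wondering what clipgrad actually does: it's gradient clipping, which rescales a runaway gradient spike before it gets applied to the weights (the kind of spike that collapses a model). A minimal NumPy sketch of global-norm clipping, purely illustrative and not DFL's actual implementation:

```python
import numpy as np

def clip_by_global_norm(grads, clip_norm):
    """Scale all gradients down so their combined L2 norm <= clip_norm.

    A single huge gradient spike gets squashed to a safe magnitude
    instead of being applied at full strength; normal-sized gradients
    pass through unchanged.
    """
    global_norm = float(np.sqrt(sum(np.sum(g * g) for g in grads)))
    if global_norm > clip_norm:
        scale = clip_norm / global_norm
        grads = [g * scale for g in grads]
    return grads, global_norm

# A "spiked" gradient that would wreck the weights if applied as-is:
grads = [np.array([3.0, 4.0]), np.array([0.0, 1200.0])]
clipped, norm = clip_by_global_norm(grads, clip_norm=1.0)
```

The extra work per iteration is one norm computation and a multiply, which is consistent with the ~50 ms overhead measured above.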
I'd probably focus more on resolution than on increasing ae/e/d_ch dims, though, especially since at lower resolution the dims change might not be noticeable, while higher resolution definitely would be.
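The "increase both" advice from the quoted post can be illustrated with a toy shape calculation. This is not DFL's actual layer layout; it just assumes a generic conv encoder that halves the resolution four times before a dense AE bottleneck, to show that both e_ch and AE dims sit on opposite ends of the same link:

```python
# Toy capacity check: the dense bottleneck layer connects the flattened
# conv features (size depends on e_ch) to the latent code (size ae_dims).
# Bumping only one side leaves the other a bottleneck or a waste.
def bottleneck_shapes(res: int, e_ch: int, ae_dims: int):
    spatial = res // (2 ** 4)          # feature map side after 4 downsamples (assumed)
    flat = spatial * spatial * e_ch    # flattened conv features entering the bottleneck
    dense_params = flat * ae_dims      # weights in the dense link between them
    return flat, dense_params

flat, params = bottleneck_shapes(160, 64, 256)
```

Since `dense_params` is a product of both sides, raising e_ch without AE dims means detail gets captured but can't fit through the latent code, and vice versa, which is the failure mode the quoted post warns about.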