Mr DeepFakes Forums
avalentino93 - Questions about new optimizer
#1
The optimizer now exists, and I've been messing around with it trying to figure it out. I don't understand whether it actually helps speed up the process, or whether it's meant to do more but slow the process down.

Specs:
1080SC - 8GB
8790k
32GB RAM

Current tests:
SAE
Batch - 4
Resolution - 256
Dims - Standard
Light Encoder - No
Multi - Yes

Optimizer Settings:
Mode 1 ~ fails with OOM
Mode 2 ~ 11-14GB RAM, ~20% CPU, 50% GPU - around 1500ms/iter
Mode 3 ~ 17-20GB RAM, ~80% CPU, 50% GPU

So to me it seems like the optimizer isn't there to speed things up in the sense that many of us think of the word "optimize". It appears to be for those of us who want to push the limits and run 256 resolution on an 8GB card, since if I choose no optimizer with the above settings, training fails.

Anyone else understanding this thing yet?
#2
The new optimizer brings deepfakes to a new era. It allows training a bigger network on the same VRAM.

For example, 256 res with non-reduced dims (512-42) was impossible to train on 6GB before.
#3
(03-15-2019, 05:57 PM)iperov Wrote: The new optimizer brings deepfakes to a new era. It allows training a bigger network on the same VRAM.

For example, 256 res with non-reduced dims (512-42) was impossible to train on 6GB before.

Right, I get that part. But my question is: at what cost?
For example, if I have a setup that can run 256 resolution with higher dims and batch sizes of 16-24 without the optimizer, should I use that instead? Or would the optimizer make it faster? Is the optimizer only there to compensate for weaker specs?

If you used all the same settings and your hardware isn't an issue, which is fastest: 1, 2, or 3?
#4
User @titan_rw did a quick test, and came out with these results:

All done at BS=16. Test model was 160 res 640x48 dims, multiscale=y

mode 1 = 1258ms/iter, 3GB RAM, small CPU usage
mode 2 = 1400ms/iter, 12GB RAM, small CPU usage
mode 3 = 1840ms/iter, 12GB RAM, 80% CPU usage!

So from this small test, we can conclude that if you're happy with your current settings, mode 1 is the fastest. Of course, this should be confirmed on your exact setup. So in a sense, "use the optimizer only if you're limited by your hardware" is correct.
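The slowdown from the higher modes in titan_rw's test can be put in relative terms. A minimal sketch (the numbers are taken from the test above; the dictionary and loop are just an illustration, not part of any tool):

```python
# Relative cost of each optimizer mode, using the iteration times
# quoted above (160 res, 640x48 dims, batch size 16).
iter_ms = {1: 1258, 2: 1400, 3: 1840}

baseline = iter_ms[1]
for mode, ms in iter_ms.items():
    slowdown = ms / baseline
    print(f"mode {mode}: {ms} ms/iter ({slowdown:.2f}x mode 1)")
```

At the same batch size, mode 2 costs roughly 11% extra time per iteration and mode 3 roughly 46%, which is the price paid for the saved VRAM.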
#5
(03-18-2019, 05:56 PM)dpfks Wrote: So in a sense, "use the optimizer only if you're limited by your hardware" is correct.

I am not limited by hardware with my 6GB card.
But I cannot train 256 res with non-reduced dims and a decent batch size.
If you want to spend more time to achieve better resolution, the optimizer modes are the solution.
#6
This is what I would call a memory optimizer.  Every case I've tested is slower with higher modes.  The advantage is that you can run a more complicated model on the same hardware.

I did some more testing at 256 res, and default NN size.

My 12 gig Titan can run native 256 res (mode 1):

mode 1 - bs6 - 1100ms/iter


I tried mode 2 on the Titan, but it only gained me bs 7 or something.  Not really worth it.  Mode 3 would definitely let me up the bs, but it would be so much slower I don't think the higher bs is worth the extra time per iteration.

For this card, at 256 res, and default NN size, I don't need the optimization modes.  Theoretically they'd let me run a bigger NN at the same bs.  I think it's still unknown if this is needed.


Where you need the optimization modes is lesser memory cards.  A comparison with my 980ti (6 gigs):

mode 1 - OOM (won't even start at bs 1)
mode 2 - bs2 - 1400ms
mode 2 - bs4 - 1700ms
mode 2 - bs6 - 2200ms (ran 10 iters, then OOM'd)
mode 3 - bs6 - 2500ms
mode 3 - bs8 - 2800ms
mode 3 - bs10 - 3100ms
mode 3 - bs12 - OOM (won't start)

Here is my 6 gig card actually managing the same work as my 12 gig card, but it needs mode 3, which slows iterations from the Titan's 1100ms to 2500ms. It can do it, though, and the final quality will be the same; it will just take longer.
#7
Quote:mode 3 - bs6 - 2500ms
mode 3 - bs8 - 2800ms
mode 3 - bs10 - 3100ms


bs6 - 416ms per sample
bs10 - 310ms per sample

bs10 is actually faster, because more samples per second are fed to the network.
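The comparison above is iteration time divided by batch size. A minimal sketch (the helper name `ms_per_sample` is my own, just for illustration):

```python
# Per-sample time: iteration time divided by batch size. A larger
# batch can be "slower" per iteration yet faster per sample, which
# is the point being made above.
def ms_per_sample(iter_ms, batch_size):
    return iter_ms / batch_size

print(ms_per_sample(2500, 6))   # mode 3, bs 6  -> ~416.7 ms/sample
print(ms_per_sample(3100, 10))  # mode 3, bs 10 -> 310.0 ms/sample
```

So even though bs10 takes 600ms longer per iteration than bs6, each sample costs about 100ms less.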
#8
(04-28-2019, 11:20 PM)iperov Wrote:
Quote:mode 3 - bs6 - 2500ms
mode 3 - bs8 - 2800ms
mode 3 - bs10 - 3100ms


bs6 - 416ms per sample
bs10 - 310ms per sample

bs10 is actually faster, because more samples per second are fed to the network.

That was the case with my 980ti, which I rarely train on.  It's mostly used for face extraction, and conversion.

I ran some more tests on my Titan.  This is a 192 res model, mask on, multiscale on.  SAE-DF, default dims.


mode 1 bs 6 @ 1060ms = 176ms/sample
mode 1 bs 7 @ 1190ms = 170ms/sample
mode 1 bs 8 (oom)
mode 2 bs 8 @ 1750ms = 218ms/sample
mode 2 bs 10 @ 1920ms = 192ms/sample
mode 2 bs 12 @ 2150ms = 179ms/sample
mode 2 bs 14 @ 2300ms = 164ms/sample
mode 2 bs 16 @ 2450ms = 153ms/sample (oom'd)
mode 3 bs 18 @ 2980ms = 165ms/sample
mode 3 bs 20 (oom)

What's the best here? Mode 1 / bs 7, mode 2 / bs 14, and mode 3 / bs 18 are all very similar in ms/sample.

What I was doing was running the first 50k or so iterations on mode 1, then switching to a higher mode and higher BS later.
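Ranking the non-OOM configurations above by ms/sample makes the comparison explicit. A sketch using the numbers from this post (the `configs` list and the ranking code are my own illustration; mode 2 / bs 16 is left out because it OOM'd mid-run):

```python
# Each entry: (optimizer mode, batch size, ms per iteration),
# taken from the 192 res Titan test above. Settings that OOM'd
# are excluded.
configs = [
    (1, 6, 1060), (1, 7, 1190),
    (2, 8, 1750), (2, 10, 1920), (2, 12, 2150), (2, 14, 2300),
    (3, 18, 2980),
]

# Lowest ms/sample = highest throughput.
best = min(configs, key=lambda c: c[2] / c[1])
mode, bs, ms = best
print(f"best: mode {mode}, bs {bs} -> {ms / bs:.0f} ms/sample")
```

By this metric, mode 2 with bs 14 comes out on top at about 164 ms/sample, with mode 1 / bs 7 and mode 3 / bs 18 close behind.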
#9
2/14.
#10
Hi everybody!
I have a question about optimizer modes; I'm currently running mode 3.
My problem is with 256 resolution faces: how did you guys get the trainer (for me, SAE in DF mode) working at that resolution?
Usually I use 128.

My specs are:
i7 7700K
GTX 1070
32GB DDR4 3200MHz RAM

Thanks for the help!
