MrDeepFakes Forums


GPU is not used when training and CUDA memory errors.

I'm using the latest DFL build (03_02_2020) with an RTX 2070 Super, and I'm getting constant CUDA memory errors; in the Task Manager Performance tab I can clearly see that only the CPU is used for the process. Is there a fix for this? I'm using the default settings for training the model.

[Attached screenshots]


I looked at the Google Docs file with settings from other users with the same GPU, and I tried decreasing the batch size to 6, then to 4, but I get the same error regardless.
 

Groggy4

NotSure
Verified Video Creator
I could be wrong, I probably am. But I feel there is a pattern every time someone has this problem where it still trains but gets CUDA errors and fails to allocate memory: the CPU is too outdated and underpowered compared to the newer GPU it's supposed to drive. It simply doesn't have what it takes to keep up with such a high-end GPU from a completely different generation of architecture.
https://pc-builds.com/calculator/Core_i5-2500/GeForce_RTX_2070_SUPER/0cX14olu/16/
 

TMBDF

Moderator | Deepfake Creator | Guide maintainer
Staff member
Moderator
Verified Video Creator
DFL only requires the CPU to support SSE; I'm not sure if AVX is still required. I can see the GPU is being used, but you need to select the GPU in Task Manager, where you will see 5 graphs: one is for memory and the other 4 can be set to display usage of various parts of the GPU. Set one of them to CUDA; if there is usage there, then it's working correctly.

If Task Manager is confusing, another way to double-check is to poll NVML from Python while training runs. This is just an optional sketch, not part of DFL, and it assumes the pynvml package (pip install pynvml) is installed:

```python
# Optional check: poll GPU utilization and VRAM use while DFL is training.
# Requires the pynvml package; this is not part of DFL itself.
import time
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)  # first GPU

for _ in range(10):
    util = pynvml.nvmlDeviceGetUtilizationRates(handle)  # .gpu is a percentage
    mem = pynvml.nvmlDeviceGetMemoryInfo(handle)         # values in bytes
    print(f"GPU util: {util.gpu}%  VRAM used: {mem.used / 2**30:.1f} GiB")
    time.sleep(2)

pynvml.nvmlShutdown()
```

If the utilization stays above zero while the trainer is running, CUDA is being used.
 
tutsmybarreh said:
DFL only requires the CPU to support SSE; I'm not sure if AVX is still required. I can see the GPU is being used, but you need to select the GPU in Task Manager, where you will see 5 graphs: one is for memory and the other 4 can be set to display usage of various parts of the GPU. Set one of them to CUDA; if there is usage there, then it's working correctly.

The GPU tab in Task Manager looks like this.

[Screenshot: original view]

[Screenshot: with CUDA selected instead of Copy]
 

TMBDF

Moderator | Deepfake Creator | Guide maintainer
Staff member
Moderator
Verified Video Creator
Change any of the 4 graphs to show CUDA usage. I can see your memory is loaded with something, so it must be using your GPU.
If you have models_opt_on_gpu: False then it's using your GPU + CPU.

It works fine. Run a lower-resolution model or decrease the batch size. Try disabling models_opt_on_gpu (set it to false); it may run more stably, but slower.

Learn mask is demanding, so with it disabled VRAM usage goes down.
 
tutsmybarreh said:
Change any of the 4 graphs to show CUDA usage. I can see your memory is loaded with something, so it must be using your GPU.
If you have models_opt_on_gpu: False then it's using your GPU + CPU.

It works fine. Run a lower-resolution model or decrease the batch size. Try disabling models_opt_on_gpu (set it to false); it may run more stably, but slower.

Learn mask is demanding, so with it disabled VRAM usage goes down.



I don't think I fully comprehend what this means, and the error is somewhat misleading, as is the fact that the process keeps running. I'm getting a CUDA error, as if the GPU is not supposed to be taking part in the process at all, and there is almost no load on the GPU compared to the CPU, yet somehow CUDA is supposedly working. So what does that mean: is it working at peak performance, or am I getting the slowest possible times? Decreasing the batch size to 6 and then 4 does not seem to change anything.

What am I missing? Is it just slower than it has to be, or is it also less effective and will yield poor results? Will upgrading the processor and motherboard actually change anything?
 

TMBDF

Moderator | Deepfake Creator | Guide maintainer
Staff member
Moderator
Verified Video Creator
There is load, look at the graph for CUDA! You literally posted a screenshot showing 80% usage. But sure, your CPU is quite weak for that GPU; you should have at least a 6-core i7 for it, or buy an 8-core AMD (3rd-gen Ryzen, which is cheaper and just as good) and you'll be good to go. The GPU is fine. Check that you don't have anything running in the background using up the GPU, and use models_opt_on_gpu: False.
 
tutsmybarreh said:
There is load, look at the graph for CUDA! You literally posted a screenshot showing 80% usage. But sure, your CPU is quite weak for that GPU; you should have at least a 6-core i7 for it, or buy an 8-core AMD (3rd-gen Ryzen, which is cheaper and just as good) and you'll be good to go. The GPU is fine. Check that you don't have anything running in the background using up the GPU, and use models_opt_on_gpu: False.

I have done as you suggested and turned off GPU model optimization. The error is almost gone; there is only one line left about the CUDA memory error (previously it was around 12 lines). My PC is much, much quieter now, seeing how CUDA is almost not used anymore; I'm guessing it was the GPU fan that was making all the noise. There is a little bit of GPU load now, some VRAM is used, and the CPU is at almost 100%.

[Attached screenshot]

It seems like the process is a little slower now, though I'm not entirely sure. I'm trying to understand whether I gained or lost anything by moving the load from CUDA to the CPU.
 

TMBDF

Moderator | Deepfake Creator | Guide maintainer
Staff member
Moderator
Verified Video Creator
That's how it works: when the feature is enabled, the model and the optimizer are handled by the GPU alone, which means higher VRAM usage but faster training. When it's disabled (False), the optimizer runs on the CPU, so VRAM load and CUDA usage are lower and you can run a higher batch size, but CPU usage is higher, RAM usage is higher, and training is slower because the GPU needs to communicate more with the CPU and RAM (which is slower than when everything is handled by the GPU and its VRAM). Hope this explains how this option works (and I hope I explained it right too) :)
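If it helps to see the idea in code, here is a rough, hypothetical sketch of what that placement looks like. This is plain TensorFlow 2.x, not DFL's actual code, and it assumes a TF install with a visible GPU; the flag name just mirrors the DFL option for illustration:

```python
# Rough illustration only -- not DFL's code. The "model" weights live in VRAM
# either way, but the optimizer state (here a simple momentum buffer) is created
# on the CPU when the option is False, trading VRAM for slower, chattier updates.
import tensorflow as tf

models_opt_on_gpu = False  # hypothetical flag mirroring DFL's option
opt_device = "/GPU:0" if models_opt_on_gpu else "/CPU:0"

with tf.device("/GPU:0"):
    w = tf.Variable(tf.random.normal([1024, 1024]))   # model weights in VRAM

with tf.device(opt_device):
    momentum = tf.Variable(tf.zeros_like(w))          # optimizer state

@tf.function
def train_step(x, lr=0.01, beta=0.9):
    with tf.GradientTape() as tape:
        loss = tf.reduce_mean(tf.square(tf.matmul(x, w)))
    grad = tape.gradient(loss, w)
    # With the momentum buffer on the CPU, this update shuttles data between the
    # GPU and system RAM every iteration: less VRAM used, but slower steps.
    momentum.assign(beta * momentum + grad)
    w.assign_sub(lr * momentum)
    return loss

loss = train_step(tf.random.normal([8, 1024]))  # one toy iteration
```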
 
tutsmybarreh said:
That's how it works: when the feature is enabled, the model and the optimizer are handled by the GPU alone, which means higher VRAM usage but faster training. When it's disabled (False), the optimizer runs on the CPU, so VRAM load and CUDA usage are lower and you can run a higher batch size, but CPU usage is higher, RAM usage is higher, and training is slower because the GPU needs to communicate more with the CPU and RAM (which is slower than when everything is handled by the GPU and its VRAM). Hope this explains how this option works (and I hope I explained it right too) :)

Thank you for taking the time to explain this option to me; it's getting clearer every day. However, I still don't completely understand the pros and cons.

With this option off, I put more load on the CPU and the iteration time goes up, but I can increase the batch size. With the option on, I get more load on the GPU, I can still set the same batch size (albeit with CUDA memory errors, though it looks like they don't affect anything), and the iteration time goes down. So where's the caveat? Which is better and more efficient, and which will produce better results in the end?

Sorry to keep harping on this, but I really do want to get the hang of it and understand it more deeply :)
 

TMBDF

Moderator | Deepfake Creator | Guide maintainer
Staff member
Moderator
Verified Video Creator
True: smaller batch, faster iterations. False: higher batch (not by much, 1-2 more at most) but longer iterations; it's useful mainly on low-end GPUs (less than 8 GB of VRAM). There is no difference in quality, but a higher batch size is usually better (though the difference between 8 and 10 is non-existent). It boils down to whether the increased iteration time at the higher batch size gets you a better result faster. You take the iteration time and divide it by the batch size: at 1000 ms with batch 8 (say with the option enabled), 1000/8 = 125 ms per sample. With it set to False and the load split between GPU and CPU, say you get batch 10 but the iteration time is now 1500 ms: 1500/10 = 150 ms per sample, which is worse because you process less data per unit of time (the iteration time grew more than the batch did). But if you can get batch 12 at 1500 ms, then 1500/12 = 125 ms: the same training speed but higher (theoretical) model accuracy, so it's worth setting it to False. And obviously, an even higher batch size at the same iteration time means less time per sample, which is even better.
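A quick sanity check of that arithmetic, using only the example numbers above:

```python
# Time per sample = iteration time / batch size; lower is better.
def ms_per_sample(iteration_ms, batch_size):
    return iteration_ms / batch_size

print(ms_per_sample(1000, 8))   # 125.0 ms -> option True, batch 8
print(ms_per_sample(1500, 10))  # 150.0 ms -> option False, batch 10: worse throughput
print(ms_per_sample(1500, 12))  # 125.0 ms -> option False, batch 12: same throughput, bigger batch
```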

If I helped you, consider donating (BTC or Patreon).
 
Groggy4 said:
I could be wrong, I probably am. But I feel there is a pattern every time someone has this problem where it still trains but gets CUDA errors and fails to allocate memory: the CPU is too outdated and underpowered compared to the newer GPU it's supposed to drive. It simply doesn't have what it takes to keep up with such a high-end GPU from a completely different generation of architecture.
https://pc-builds.com/calculator/Core_i5-2500/GeForce_RTX_2070_SUPER/0cX14olu/16/

Well, I actually upgraded my PC to a Ryzen 5 3600, so there is no bottleneck now, but of course the error remained.

But I think I more or less understand it now: each unit of batch size takes up approximately 2 GB of VRAM (2.08 or 2.16 depending on various model options), so I physically cannot go higher than batch size 4 with GPU optimization on the RTX 2070's 8 GB of VRAM, because it will always overflow. The only solution, as @tutsmybarreh helped me realize, is to either really cut down the model options and proceed with a good iteration speed, or turn off GPU optimization altogether and proceed with super-low iteration speeds.
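For what it's worth, the back-of-the-envelope check I'm doing looks like this; the ~2 GB per unit of batch size is just my rough observation from the trainer output, not an exact figure:

```python
# Rough VRAM budget check using the approximate numbers from this thread.
total_vram_gb = 8.0   # RTX 2070 / 2070 Super
per_batch_gb = 2.0    # rough observed cost per unit of batch size (2.08-2.16 depending on options)

max_batch = int(total_vram_gb // per_batch_gb)
print(max_batch)      # 4 -> anything above this overflows and triggers the CUDA memory errors
```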
 