MrDeepFakes Forums


low training performance plz help

halo2k

DF Pleb
Verified Video Creator
Hi guys,

So I have spent 2 days on this now, and I am not that familiar with this stuff, so I don't know what else to try.

I have an Nvidia Quadro P5000 graphics card with 16 GB of VRAM, which should be perfect for deepfakes, but I am barely getting 1k epochs per hour. It is really slow. I have read that some people get 10k per hour with a GTX 1070, so based on the specs, I'd assume the P5000 should perform better; it has the same number of CUDA cores as a GTX 1080.

Drivers are updated, and I am using the prebuilt DeepFaceLab.

===== Model summary =====
== Model name: MIAEF128
==
== Current epoch: 846
==
== Options:
== |== batch_size : 48
== |== multi_gpu : False
== |== created_vram_gb : 16
== Running on:
== |== [0 : Quadro P5000]
=========================
Starting. Press "Enter" to stop training and save model.
Training [#000881][2800ms] loss_src:0.067 loss_dst:0.067

Might the batch size be an issue?
The result quoted above took 1.5 hours so far!

I appreciate any help!
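As a quick sanity check, the 2800 ms iteration time in the log above is consistent with the reported rate. This is plain arithmetic, no DFL code involved:

```python
# 2800 ms per iteration ("epoch" in old DFL terminology) implies
# roughly this many iterations per hour:
ms_per_iter = 2800
iters_per_hour = 3600 * 1000 / ms_per_iter
print(round(iters_per_hour))  # ≈ 1286, matching "barely 1k per hour"
```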
 

easy40

DF Admirer
Verified Video Creator
How many epochs do you get with H128? Right now it's taking 2800 ms per epoch. What's the number for the H128 training?
 

easy40

DF Admirer
Verified Video Creator
EDIT
Use this link instead; I made a mistake earlier.
https://transfer.sh/(/If2Dz/Model.py).zip

Go to DFLMainDirectory\_internal\bin\DeepFaceLab\models\Model_H128

Make a backup copy of the original Model.py file.

Paste in the file that I uploaded, try to train, and tell me what you get.

Obviously, substitute DFLMainDirectory with your own directory.
 

halo2k

DF Pleb
Verified Video Creator
Error: cannot import name 'nnlib'
Traceback (most recent call last):
File "C:\Users\*\Downloads\DF\DeepFaceLabTorrent\_internal\bin\DeepFaceLab\mainscripts\Trainer.py", line 36, in trainerThread
model = models.import_model(model_name)(
File "C:\Users\*\Downloads\DF\DeepFaceLabTorrent\_internal\bin\DeepFaceLab\models\__init__.py", line 12, in import_model
module = __import__('Model_'+name, globals(), locals(), [], 1)
File "C:\Users\*\Downloads\DF\DeepFaceLabTorrent\_internal\bin\DeepFaceLab\models\Model_H128\__init__.py", line 1, in <module>
from .Model import Model
File "C:\Users\*\Downloads\DF\DeepFaceLabTorrent\_internal\bin\DeepFaceLab\models\Model_H128\Model.py", line 3, in <module>
from nnlib import nnlib
ImportError: cannot import name 'nnlib'

getting errors.
 

easy40

DF Admirer
Verified Video Creator
halo2k said:
Error: cannot import name 'nnlib'
Traceback (most recent call last):
File "C:\Users\*\Downloads\DF\DeepFaceLabTorrent\_internal\bin\DeepFaceLab\mainscripts\Trainer.py", line 36, in trainerThread
model = models.import_model(model_name)(
File "C:\Users\*\Downloads\DF\DeepFaceLabTorrent\_internal\bin\DeepFaceLab\models\__init__.py", line 12, in import_model
module = __import__('Model_'+name, globals(), locals(), [], 1)
File "C:\Users\*\Downloads\DF\DeepFaceLabTorrent\_internal\bin\DeepFaceLab\models\Model_H128\__init__.py", line 1, in <module>
from .Model import Model
File "C:\Users\*\Downloads\DF\DeepFaceLabTorrent\_internal\bin\DeepFaceLab\models\Model_H128\Model.py", line 3, in <module>
from nnlib import nnlib
ImportError: cannot import name 'nnlib'

getting errors.

You're not using the latest version then. You'll have to send me your original file and I'll modify it and send it back to you. Use transfer.sh to upload.
 

halo2k

DF Pleb
Verified Video Creator
The new version from yesterday asks for several new options. Is what I selected OK?

Running trainer.

Loading model...

Model first run. Enter model options as default for each run.
Write preview history? (y/n skip:n) : n
Target epoch (skip:unlimited) : 100000
Batch_size (skip:model choice) : 48
Feed faces to network sorted by yaw? (y/n skip:n) :
Loading: 100%|█████████████████████████████████████████████████████████████████████| 654/654 [00:00<00:00, 1672.77it/s]
Loading: 100%|███████████████████████████████████████████████████████████████████| 1466/1466 [00:00<00:00, 1768.45it/s]
===== Model summary =====
== Model name: H128
==
== Current epoch: 0
==
== Model options:
== |== target_epoch : 100000
== |== batch_size : 48
== Session options:
== |== target_epoch : 100000
== |== batch_size : 48
== Running on:
== |== [0 : Quadro P5000]
=========================
Saving...
Starting. Target epoch: 100000. Press "Enter" to stop training and save model.


This is with your Model.py and the settings above:

Training [#000029][1962ms] loss_src:0.297 loss_dst:0.1703
 

easy40

DF Admirer
Verified Video Creator
halo2k said:
The new version from yesterday asks for several new options. Is what I selected OK?

Running trainer.

Loading model...

Model first run. Enter model options as default for each run.
Write preview history? (y/n skip:n) : n
Target epoch (skip:unlimited) : 100000
Batch_size (skip:model choice) : 48
Feed faces to network sorted by yaw? (y/n skip:n) :
Loading: 100%|█████████████████████████████████████████████████████████████████████| 654/654 [00:00<00:00, 1672.77it/s]
Loading: 100%|███████████████████████████████████████████████████████████████████| 1466/1466 [00:00<00:00, 1768.45it/s]
===== Model summary =====
== Model name: H128
==
== Current epoch: 0
==
== Model options:
== |== target_epoch : 100000
== |== batch_size : 48
== Session options:
== |== target_epoch : 100000
== |== batch_size : 48
== Running on:
== |== [0 : Quadro P5000]
=========================
Saving...
Starting. Target epoch: 100000. Press "Enter" to stop training and save model.


This is with your Model.py and the settings above:

Training [#000029][1962ms] loss_src:0.297 loss_dst:0.1703

Ah, I didn't know that; I'm using an older version. Go back to the original Model.py and change the batch size to something small, such as 4.
 

halo2k

DF Pleb
Verified Video Creator
In the older version I am not asked which batch size I want. How do I change it there?

I just played with the batch size a bit. These are the results I get with the new DeepFaceLab version from yesterday:

H128
batch size 60 = 2477 ms (your file)
batch size 48 = 1962 ms (your file)
batch size 20 = 1070 ms (your file)
batch size 10 = 665 ms (your file)
batch size 5 = 470 ms (your file)
batch size 4 = 428 ms (your file)
batch size 4 = 426 ms (original file)

Should I keep using this version or go back to the old one? If I should go back to the older version, where do I change the batch size?
 

easy40

DF Admirer
Verified Video Creator
Ah, then I see what's going on. Use your original Model.py file with the version that you're currently using. Forget about my file.

From my understanding, the batch size simply determines how many "comparisons" are made per epoch.

Since you have a graphics card with a lot of VRAM, you can run a very large batch size such as 48 or even 60. This means that for every epoch, it is making 60 "comparisons" at the same time. Obviously, this takes more time than, say, 4 comparisons at a time, as it would be with batch size = 4.

This means that even though you'll see fewer epochs during training, you'll still get the desired results, since each epoch is doing more work at the same time. As long as your loss values improve and your preview shows that the model is getting better, the epoch number shouldn't matter.
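The timings posted earlier in this thread bear this out: throughput in faces per second grows with batch size, so each epoch at batch 48 or 60 does far more work than one at batch 4. A quick calculation using those measurements (the timing numbers come from the thread; the rest is illustrative):

```python
# (batch_size: ms_per_epoch) pairs as measured earlier in this thread
timings = {60: 2477, 48: 1962, 20: 1070, 10: 665, 4: 428}

for batch, ms in timings.items():
    faces_per_sec = batch / (ms / 1000)
    print(f"batch {batch:>2}: {faces_per_sec:5.1f} faces/sec")
```

Larger batches take longer per epoch but process more faces per second overall, which is why a lower epoch count at batch 48 is not wasted time.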
 

halo2k

DF Pleb
Verified Video Creator
Thank you very much so far; this helped a lot in understanding it better.

So if after, let's say, 10 hours my loss hasn't changed much, that means my faceset is too bad or missing good angles, right?

In my last run, from yesterday until this morning, the last entries looked like this:

loss_src: 0.057 loss_dst 0.027
loss_src: 0.059 loss_dst 0.034
loss_src: 0.060 loss_dst 0.029
loss_src: 0.057 loss_dst 0.029

Does that mean no further progress because the faceset is too bad?
 

easy40

DF Admirer
Verified Video Creator
halo2k said:
Thank you very much so far; this helped a lot in understanding it better.

So if after, let's say, 10 hours my loss hasn't changed much, that means my faceset is too bad or missing good angles, right?

In my last run, from yesterday until this morning, the last entries looked like this:

loss_src: 0.057 loss_dst 0.027
loss_src: 0.059 loss_dst 0.034
loss_src: 0.060 loss_dst 0.029
loss_src: 0.057 loss_dst 0.029

Does that mean no further progress because the faceset is too bad?

You could still get somewhat better numbers with more training, but the return on investment is lower: you'll have to spend a lot more time to get maybe slightly better results.

If the results are not good enough, then yes, it may be that the source images don't contain all the information that the model needs.

At the end of the day, remember that the model is still not perfect; you won't get to a loss of 0.001. Check the preview to see how everything looks, then use the convert debug .bat file to render a few frames with the new face and judge the result. Don't focus so much on the epoch or loss numbers.

Also, don't focus excessively on hardware and speed. I only have 3 GB of VRAM and can get good results with 10 hours of training. Could I do more with a Quadro or a Titan X? I don't know; maybe, maybe not. Results depend more on the quality of the source faceset and the scene that you're choosing.
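One simple way to make the "loss didn't change a lot" judgment concrete is to compare the moving average of recent losses against the window before it. This helper is a hypothetical sketch for illustration, not part of DeepFaceLab:

```python
def has_plateaued(losses, window=4, tol=0.005):
    """Return True when the mean loss of the last `window` readings
    is within `tol` of the mean of the window before it."""
    if len(losses) < 2 * window:
        return False
    recent = sum(losses[-window:]) / window
    previous = sum(losses[-2 * window:-window]) / window
    return abs(previous - recent) < tol

# Eight readings hovering around 0.058, like the values quoted above:
print(has_plateaued([0.058, 0.057, 0.059, 0.060,
                     0.057, 0.059, 0.060, 0.057]))  # True

# A run that is still clearly improving:
print(has_plateaued([0.300, 0.200, 0.150, 0.120,
                     0.100, 0.090, 0.085, 0.080]))  # False
```

The window size and tolerance are arbitrary; the point is to judge trend over a stretch of iterations rather than single readings.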
 

halo2k

DF Pleb
Verified Video Creator
Okay, thank you.

With your Model.py I was saving around 800-900 ms per epoch, so why should I still go back to the old one? Isn't this a massive improvement? Or did you lower the quality somehow to speed it up?
 

easy40

DF Admirer
Verified Video Creator
halo2k said:
Okay, thank you.

With your Model.py I was saving around 800-900 ms per epoch, so why should I still go back to the old one? Isn't this a massive improvement? Or did you lower the quality somehow to speed it up?

Mine was setting a lower batch size manually, so you're not really saving anything; it was just using a lower batch size than what you were seeing in the display. So yeah, just use the original file, focus more on the quality of the faceset, and check that the scene you're choosing doesn't have a ton of weird angles, because the model won't be able to get the job done with those.
 

halo2k

DF Pleb
Verified Video Creator
No, no. In the newer DFL version I could manually enter the batch size, and I got 800-900 ms less than with the standard Model.py and batch size 48.
 

easy40

DF Admirer
Verified Video Creator
Yes, but as I said, I modified a value that sets the batch size regardless of what you input in the cmd window.
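To illustrate what such an override looks like (a purely hypothetical sketch; this is not the actual DeepFaceLab source), a model class can reassign the batch size after the user's prompt input has been read, so the displayed value no longer matches what the GPU actually runs:

```python
class Model:
    """Hypothetical stand-in for a DFL model class."""

    def __init__(self, user_batch_size):
        # The value typed at the cmd prompt (and shown in the summary)...
        self.batch_size = user_batch_size
        # ...silently overridden by a hardcoded edit like easy40's:
        self.batch_size = 4

m = Model(48)
print(m.batch_size)  # 4, regardless of the 48 entered at the prompt
```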
 