Mr DeepFakes Forums
low training performance plz help
#1
Hi guys,

So I've spent two days on this now, and I'm not that familiar with this stuff, so I don't know what else to try.

I have an Nvidia Quadro P5000 graphics card with 16 GB of VRAM, which should be perfect for deepfakes, but I am barely getting 1k epochs per hour... it is really, really slow. I've read that some people get 10k per hour with a GTX 1070; going by the cards' specs, I'd assume the P5000 should perform better. It has the same number of CUDA cores as a GTX 1080.

Drivers are updated, and I'm using the prebuilt DeepFaceLab.

===== Model summary =====
== Model name: MIAEF128
==
== Current epoch: 846
==
== Options:
== |== batch_size : 48
== |== multi_gpu : False
== |== created_vram_gb : 16
== Running on:
== |== [0 : Quadro P5000]
=========================
Starting. Press "Enter" to stop training and save model.
Training [#000881][2800ms] loss_src:0.067 loss_dst:0.067

Might the batch size be an issue?
The quoted result above took 1.5 hrs so far!
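A quick sanity check on those numbers (plain Python; assumes the ~2800 ms per epoch shown in the log above):

```python
# Back-of-the-envelope check: at ~2800 ms per epoch (from the training log),
# how many epochs fit into one hour?
ms_per_epoch = 2800
epochs_per_hour = 3_600_000 / ms_per_epoch  # one hour = 3,600,000 ms
print(int(epochs_per_hour))  # about 1285 -- i.e. "barely 1k per hour"
```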

I appreciate any help!
#2
How many epochs do you get with H128? Right now it's taking 2800 ms per epoch. What's the number for H128 training?
#3
With H128, ~2200 ms per epoch.
#4
EDIT: Use this link instead, I made a mistake.
[link requires forum login to view]

Go to DFLMainDirectory\_internal\bin\DeepFaceLab\models\Model_H128

Create a copy of the original Model.py file as a backup.

Paste in the file that I uploaded, try to train, and tell me what you get.

Obviously, substitute DFLMainDirectory with your own directory.
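If you'd rather script the backup step, here is a minimal Python sketch (the function name and the `.bak` suffix are my own choices, and the commented path is just the placeholder from above, not a real location):

```python
import shutil
from pathlib import Path

def backup_model_py(model_dir):
    """Copy Model.py to Model.py.bak inside the given model directory."""
    model_dir = Path(model_dir)
    src = model_dir / "Model.py"
    dst = model_dir / "Model.py.bak"
    shutil.copy2(src, dst)  # copy2 also preserves timestamps
    return dst

# Example (placeholder path -- substitute your own install location):
# backup_model_py(r"DFLMainDirectory\_internal\bin\DeepFaceLab\models\Model_H128")
```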
#5
Error: cannot import name 'nnlib'
Traceback (most recent call last):
File "C:\Users\*\Downloads\DF\DeepFaceLabTorrent\_internal\bin\DeepFaceLab\mainscripts\Trainer.py", line 36, in trainerThread
model = models.import_model(model_name)(
File "C:\Users\*\Downloads\DF\DeepFaceLabTorrent\_internal\bin\DeepFaceLab\models\__init__.py", line 12, in import_model
module = __import__('Model_'+name, globals(), locals(), [], 1)
File "C:\Users\*\Downloads\DF\DeepFaceLabTorrent\_internal\bin\DeepFaceLab\models\Model_H128\__init__.py", line 1, in <module>
from .Model import Model
File "C:\Users\*\Downloads\DF\DeepFaceLabTorrent\_internal\bin\DeepFaceLab\models\Model_H128\Model.py", line 3, in <module>
from nnlib import nnlib
ImportError: cannot import name 'nnlib'

getting errors.
#6
(01-04-2019, 04:06 PM)halo2k Wrote: Error: cannot import name 'nnlib' [traceback quoted in the post above]

You're not using the latest version, then. You'll have to send me your original file and I'll modify it and send it back to you. Use transfer.sh to upload.
#7
The new version from yesterday asks for several new options. Is what I selected OK?

Running trainer.

Loading model...

Model first run. Enter model options as default for each run.
Write preview history? (y/n skip:n) : n
Target epoch (skip:unlimited) : 100000
Batch_size (skip:model choice) : 48
Feed faces to network sorted by yaw? (y/n skip:n) :
Loading: 100%|█████████████████████████████████████████████████████████████████████| 654/654 [00:00<00:00, 1672.77it/s]
Loading: 100%|███████████████████████████████████████████████████████████████████| 1466/1466 [00:00<00:00, 1768.45it/s]
===== Model summary =====
== Model name: H128
==
== Current epoch: 0
==
== Model options:
== |== target_epoch : 100000
== |== batch_size : 48
== Session options:
== |== target_epoch : 100000
== |== batch_size : 48
== Running on:
== |== [0 : Quadro P5000]
=========================
Saving...
Starting. Target epoch: 100000. Press "Enter" to stop training and save model.


This is with your Model.py and the settings above:

Training [#000029][1962ms] loss_src:0.297 loss_dst:0.1703
#8
(01-04-2019, 04:12 PM)halo2k Wrote: The new version from yesterday asks for several new options. Is what I selected OK? [settings and log quoted in the post above]

Ah, I didn't know that. I'm using an older version. Go back to the original Model.py and change the batch size to something small, such as 4.
#9
In the older version I am not asked what batch size I want. How do I change it there?

I just played with the batch size a bit. These are the results I get with the new DeepFaceLab version from yesterday:

H128
batch size 60 = 2477 ms (your file)
batch size 48 = 1962 ms (your file)
batch size 20 = 1070 ms (your file)
batch size 10 = 665 ms (your file)
batch size 5 = 470 ms (your file)
batch size 4 = 428 ms (your file)
batch size 4 = 426 ms (original file)

Should I keep using this version or go back to the old one? If I should go back to the older version, where do I change the batch size?
#10
Ah, then I see what's going on. Use your original Model.py file with the version that you're currently using. Forget about my file.

From my understanding, the batch size simply determines how many faces are processed per epoch (iteration).

Since you have a graphics card with a lot of VRAM, you can run a very large batch size such as 48 or even 60. This means that on every epoch it is processing 60 faces at the same time. Obviously, that takes more time per epoch than processing, let's say, 4 faces at a time, as it would with batch size = 4.

So even though you'll rack up fewer epochs during training, you'll still get the desired results, since each epoch is doing more work. As long as your loss values keep improving and your preview shows the model getting better, the epoch count shouldn't matter.
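To make that concrete, here's a quick sketch (plain Python, using the timings reported earlier in this thread) showing that throughput in faces per second actually rises with batch size, even though each epoch takes longer:

```python
# Throughput (faces processed per second) from the batch-size timings
# reported above: batch size -> milliseconds per epoch.
timings_ms = {60: 2477, 48: 1962, 20: 1070, 10: 665, 5: 470, 4: 428}

for batch, ms in sorted(timings_ms.items()):
    throughput = batch / (ms / 1000)  # faces per second
    print(f"batch {batch:2d}: {throughput:5.1f} faces/sec")

# Larger batches take longer per epoch but push more faces through per
# second, so a lower epoch count can still mean more total training work.
```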
