Mr DeepFakes Forums
DFL CUDA error on GTX 1050 Ti?
Posted by: tarqua
#1
I've searched everywhere but haven't found any clarity on this error so far... I'm trying to run DFL on a new PC: 
  • Intel® Xeon® CPU E3-1225 V2 @ 3.20GHz
  • Windows 10 64-bit (x64)
  • 16 GB DDR3 RAM
  • NVIDIA GeForce GTX 1050 Ti (4 GB dedicated), freshly installed drivers
I know the 1050 Ti isn't ideal, but it's what I have to work with for now, and it's still better than running on my CPU. I can currently run DeepFaceLabOpenCLSSE_build_04_07_2019, but whenever I try to train with DeepFaceLabCUDA9.2SSE_build_04_07_2019 I get the following error, no matter what settings or batch size I try. Any assistance is much appreciated. Since I'm new to DFL, I'm also open to advice on the best settings for my system. I'm starting with a faceset of 3,534 images in src and 2,654 in dst for training.

Quote:
Running trainer.

Loading model...

Model first run. Enter model options as default for each run.
Write preview history? (y/n ?:help skip:n) : n
Target iteration (skip:unlimited/default) :
0
Batch_size (?:help skip:0) : 100
Feed faces to network sorted by yaw? (y/n ?:help skip:n) : n
Flip faces randomly? (y/n ?:help skip:y) :
y
Src face scale modifier % ( -30...30, ?:help skip:0) :
0
Resolution ( 64-256 ?:help skip:128) :
128
Half or Full face? (h/f, ?:help skip:f) :
f
Learn mask? (y/n, ?:help skip:y) :
y
Optimizer mode? ( 1,2,3 ?:help skip:1) :
1
AE architecture (df, liae ?:help skip:df) :
df
AutoEncoder dims (32-1024 ?:help skip:512) :
512
Encoder dims per channel (21-85 ?:help skip:42) :
42
Decoder dims per channel (10-85 ?:help skip:21) :
21
Remove gray border? (y/n, ?:help skip:n) :
n
Use multiscale decoder? (y/n, ?:help skip:n) :
n
Use pixel loss? (y/n, ?:help skip: n ) :
n
Face style power ( 0.0 .. 100.0 ?:help skip:0.00) :
0.0
Background style power ( 0.0 .. 100.0 ?:help skip:0.00) :
0.0
Using TensorFlow backend.
Loading: 100%|####################################################################| 3534/3534 [00:17<00:00, 202.30it/s]
Loading: 100%|####################################################################| 2654/2654 [00:05<00:00, 483.74it/s]
===== Model summary =====
== Model name: SAE
==
== Current iteration: 0
==
== Model options:
== |== batch_size : 100
== |== sort_by_yaw : False
== |== random_flip : True
== |== resolution : 128
== |== face_type : f
== |== learn_mask : True
== |== optimizer_mode : 1
== |== archi : df
== |== ae_dims : 512
== |== e_ch_dims : 42
== |== d_ch_dims : 21
== |== remove_gray_border : False
== |== multiscale_decoder : False
== |== pixel_loss : False
== |== face_style_power : 0.0
== |== bg_style_power : 0.0
== Running on:
== |== [0 : GeForce GTX 1050 Ti]
=========================
Starting. Press "Enter" to stop training and save model.
Error: OOM when allocating tensor with shape[504] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
         [[{{node mul_137}} = Mul[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"](Adam/beta_2/read, Variable_101/read)]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

         [[{{node Mean_2/_1091}} = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_6240_Mean_2", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

Traceback (most recent call last):
  File "C:\DeepFaceLabCUDA9.2SSE\_internal\DeepFaceLab\mainscripts\Trainer.py", line 93, in trainerThread
    iter, iter_time = model.train_one_iter()
  File "C:\DeepFaceLabCUDA9.2SSE\_internal\DeepFaceLab\models\ModelBase.py", line 362, in train_one_iter
    losses = self.onTrainOneIter(sample, self.generator_list)
  File "C:\DeepFaceLabCUDA9.2SSE\_internal\DeepFaceLab\models\Model_SAE\Model.py", line 375, in onTrainOneIter
    src_loss, dst_loss, = self.src_dst_train (feed)
  File "C:\DeepFaceLabCUDA9.2SSE\_internal\python-3.6.8\lib\site-packages\keras\backend\tensorflow_backend.py", line 2715, in __call__
    return self._call(inputs)
  File "C:\DeepFaceLabCUDA9.2SSE\_internal\python-3.6.8\lib\site-packages\keras\backend\tensorflow_backend.py", line 2675, in _call
    fetched = self._callable_fn(*array_vals)
  File "C:\DeepFaceLabCUDA9.2SSE\_internal\python-3.6.8\lib\site-packages\tensorflow\python\client\session.py", line 1439, in __call__
    run_metadata_ptr)
  File "C:\DeepFaceLabCUDA9.2SSE\_internal\python-3.6.8\lib\site-packages\tensorflow\python\framework\errors_impl.py", line 528, in __exit__
    c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[504] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
         [[{{node mul_137}} = Mul[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"](Adam/beta_2/read, Variable_101/read)]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

         [[{{node Mean_2/_1091}} = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_6240_Mean_2", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

Done.
Press any key to continue . . .
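
For anyone hitting the same thing, it's worth first confirming that the CUDA build actually sees the card and how much VRAM TensorFlow thinks it can allocate. This is a generic TF 1.x check run from the build's bundled Python, not anything DFL-specific:

Code:
# Generic TensorFlow 1.x check (not part of DFL): list the devices the CUDA
# build can see and how much memory TensorFlow reports for each one.
from tensorflow.python.client import device_lib

for dev in device_lib.list_local_devices():
    # memory_limit is reported in bytes
    print(dev.name, dev.device_type, round(dev.memory_limit / 1024**3, 2), "GB")

If the GTX 1050 Ti shows up with noticeably less than 4 GB usable (Windows and the desktop reserve part of it), that already puts a hard ceiling on the settings above.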
#2
4 GB of VRAM is never going to run a 128-res default SAE model at BS 100. No wonder it's OOM-ing.

Try BS 8.
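
Activation memory scales roughly linearly with batch size, so dropping from BS 100 to BS 8 shrinks the biggest allocations by more than 10x. The numbers in this little sketch are assumed placeholders for illustration, not measured DFL figures:

Code:
# Illustrative only: activation memory grows roughly linearly with batch size.
# Both constants below are assumed placeholders, not measured DFL values.
per_sample_mb = 35.0        # hypothetical activation footprint per 128x128 sample
fixed_overhead_mb = 1200.0  # hypothetical weights + optimizer state + CUDA context

for bs in (100, 16, 8, 4):
    total_gb = (fixed_overhead_mb + bs * per_sample_mb) / 1024
    print(f"batch {bs:>3}: roughly {total_gb:.1f} GB vs 4 GB on a 1050 Ti")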
#3
Thanks. I've even tried a batch size of 4 and get nowhere with SAE. After more reading I tried H64 with a batch of 8; it at least started training, but still didn't last long. I figured 4 GB would manage at least half of the settings I see recommended for 8 GB cards, but I guess not :/
#4
You should be able to use a lightweight or low-memory option (I forget what it's called in DFL; I think it's the lightweight encoder). I know there have been successful deepfakes made with DFL on 2 GB cards, though maybe not with the SAE model, I don't know. Check the Guides section for more info about DFL. The batch size will probably need to be low, though.
#5
I have a GTX 980 and top out at about BS 10 for SAE, but I need to use optimizer mode 2. I also sometimes need to restart my PC for training to start; I think VRAM doesn't get fully released otherwise. The only other setting I have different is multiscale decoder: true.
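
As I understand it, the higher optimizer modes trade speed for VRAM by keeping the Adam optimizer state (and, in mode 3, more of the network) in system RAM instead of on the GPU. The general TF 1.x mechanism for that kind of placement looks roughly like this; it's a sketch of the idea, not DFL's actual code:

Code:
# Sketch of keeping state in system RAM instead of VRAM via TF 1.x device
# placement (not DFL's actual code; DFL handles this internally).
import tensorflow as tf

with tf.device('/cpu:0'):
    # Variables created here live in system RAM; the GPU still does the math,
    # but tensors cross PCIe every step, which is part of why mode 2 is slower.
    state = tf.Variable(tf.zeros([512, 512]), name='optimizer_state_on_cpu')

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(tf.reduce_sum(state)))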
#6
(04-16-2019, 07:00 PM)tarqua Wrote: Thanks. I've even tried a batch size of 4 and get nowhere with SAE. After more reading I tried H64 with a batch of 8; it at least started training, but still didn't last long. I figured 4 GB would manage at least half of the settings I see recommended for 8 GB cards, but I guess not :/

Use BS 4 and optimizer mode 2.
