Mr DeepFakes Forums
ConfituraMan
33,000 iterations, loss 0.0n... errors

I am trying to create a deepfake of an actor's face on an actress's head, using DeepFaceLab on Linux.

I am currently at 33k iterations, and the loss shows 0.5/4n. I enabled pixel loss at around 30k, when the losses were around 1.5 and 0.4, because I wasn't seeing any improvement. The loss then dropped dramatically, to somewhere between 0.1 and 0.05.

I wonder whether I need to do anything else to fix the obvious errors. It's mainly the mouth and the right eye, and you can see the face overlay a bit.

I should add that the source material contains a moving face (changing yaw), open and closed mouths (src) and a visible tongue (src). I am training with 185-190 images each for src and dst.

For the merge I used seamless, histo, dst mask (learned available), rct and super resolution. Everything else was left at the defaults.

Is there a solution, or do I have to wait until the often-mentioned 60k-100k iterations?


[Image: z9zf3l02d5431.png]

My training settings:

== Model name: SAE
== Current iteration: 34899
== Model options:
== |== batch_size : 6    // 4 at start, 6 from 34899it
== |== sort_by_yaw : True
== |== random_flip : False
== |== resolution : 64
== |== face_type : h
== |== learn_mask : True
== |== optimizer_mode : 2  // 1 at start, 2 from 34899it
== |== archi : df
== |== ae_dims : 32
== |== e_ch_dims : 21
== |== d_ch_dims : 10
== |== multiscale_decoder : True
== |== ca_weights : True
== |== pixel_loss : True
== |== face_style_power : 0.5  // 0.1 incr from 6000
== |== bg_style_power : 0.5 // 0.1 incr from 6000
== |== apply_random_ct : True
== |== target_iter : 38000
== Running on:
== |== [CPU]
Starting. Target iteration: 38000. Press "Enter" to stop training and save model.
I think I may have an idea, but it's difficult to articulate. Mind sharing your facesets so I can inspect them?
Sure. Here's the link (it's a tar.gz file; I'm on Linux).

I compressed the workspace (205 MB). The included model is a bit more recent. I saw some improvements in the preview, especially in the nose and the eye and a little around the mouth, but not much in the final render. The last attempt even shows camera flashes on the face, which is cool.

This is how I ran the scripts:

// train
python train --training-data-src-dir ./workspace/data_src/aligned --training-data-dst-dir ./workspace/data_dst/aligned --model-dir ./workspace/model/ --model SAE

// convert
python convert --input-dir workspace/data_dst/ --output-dir workspace/data_dst/merged/ --aligned-dir workspace/data_dst/aligned --model-dir workspace/model/ --model SAE

// rename merged img seq (for ffmpeg)
ls *.png | awk 'BEGIN{ a=0 }{ printf "mv %s portia-%05d.png\n", $0, a++ }' | bash

// render mp4
ffmpeg -i portia-%05d.png -r 25 -c:v libx264 -crf 17 portia_rami_32894it.mp4
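The `ls *.png | awk ... | bash` one-liner works for simple filenames. However, it breaks on names containing spaces or shell metacharacters, because every generated `mv` line is re-parsed by bash. Below is a minimal Python sketch of the same renumbering. `rename_frames` is a hypothetical helper, not part of DeepFaceLab. The sketch assumes the merged frames are the only `*.png` files in the folder and that their zero-padded names already sort into playback order:

```python
from pathlib import Path


def rename_frames(directory: str, prefix: str = "portia", digits: int = 5) -> list[Path]:
    """Renumber every *.png in `directory` to <prefix>-00000.png, <prefix>-00001.png, ...

    Files are taken in sorted (code point) order, which matches `ls` in the C
    locale for plain ASCII names. Renaming runs in two passes through temporary
    names, so a final name can never overwrite a file that is still waiting
    to be renamed.
    """
    folder = Path(directory)
    frames = sorted(folder.glob("*.png"))

    # Pass 1: move every frame to a unique temporary name.
    temp_paths = []
    for i, frame in enumerate(frames):
        tmp = folder / f".rename-tmp-{i:0{digits}d}.png"
        frame.rename(tmp)
        temp_paths.append(tmp)

    # Pass 2: give each frame its final zero-padded name.
    final_paths = []
    for i, tmp in enumerate(temp_paths):
        target = folder / f"{prefix}-{i:0{digits}d}.png"
        tmp.rename(target)
        final_paths.append(target)
    return final_paths
```

Running `rename_frames("workspace/data_dst/merged")` should produce the same `portia-%05d.png` sequence starting at 0 that the ffmpeg command expects. The two-pass design only matters if some input files already use the target naming pattern, but it costs nothing.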

I ran it on a 12-core Intel Xeon cloud server (no GPU), but DeepFaceLab only used 5 of the cores. I booked the machine for 32 hours but trained for a few hours less.
You've only got 187 source faces. I doubt that's going to be enough.

I realize the destination video's only 190 frames long, but I think you're still going to need more source faces.

Source video frames are also only 360 pixels high, and the face is only a fraction of that. Even with SAE at only 64 res, that's very low-resolution face material.
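Here are some rough numbers for the resolution point. The thread only says the face is "a fraction" of the 360 px frame height, so the fractions below are hypothetical, and the helpers ignore the padding the crop adds around the face. A face spanning a quarter of the frame is about 90 px tall. A 128 px model then has to learn from an enlarged crop that contains no extra detail. At smaller fractions (about 15%, roughly 54 px), even the 64 px model is fed upscaled faces:

```python
def face_height_px(frame_height: int, face_fraction: float) -> int:
    """Approximate face height in pixels, assuming the face spans `face_fraction` of the frame."""
    return round(frame_height * face_fraction)


def upscale_factor(face_px: int, model_res: int) -> float:
    """Factor by which the face must be enlarged to fill a model_res crop (> 1.0 means upscaling)."""
    return model_res / face_px


FRAME_HEIGHT = 360  # source frame height quoted in the thread
for fraction in (0.15, 0.25, 0.5):  # hypothetical face sizes
    face = face_height_px(FRAME_HEIGHT, fraction)
    for res in (64, 128):
        factor = upscale_factor(face, res)
        label = "upscaled" if factor > 1 else "downscaled"
        print(f"face {fraction:.0%} of frame = {face} px -> {res} px model: x{factor:.2f} ({label})")
```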

Also, the first 80 or so frames of the DST video have been misdetected. The left side of the mask is lined up more with the ear than with the cheek.

I'd find additional video material for your source, at a higher resolution if at all possible. A higher-resolution destination video would help too.

Additionally, I just noticed the dims:

== |== ae_dims : 32
== |== e_ch_dims : 21
== |== d_ch_dims : 10

These are super super small. I'm kind of surprised the neural network did as well as it did at these sizes.
Thanks. I first tried to train on my laptop, but it collapsed at around 3,000 iterations. I tried again with the lowest possible settings. It worked, but took very long. Then I moved the model to a cloud server with a 12-core Intel Xeon. Maybe I should have started over?
Do you know how much faster a Tesla V100 is than 4-12 CPU cores? I read somewhere that a Tesla V100 offers the performance of 100 CPUs. So instead of 33 hours, would I get the same result in 19.8 minutes (at size 64)?
I tried running your limited facesets with sae-df at 128 res, default dims.

Here's the model options:

== |== batch_size : 16
== |== sort_by_yaw : False
== |== random_flip : True
== |== resolution : 128
== |== face_type : f
== |== learn_mask : False
== |== optimizer_mode : 1
== |== archi : df
== |== ae_dims : 512
== |== e_ch_dims : 42
== |== d_ch_dims : 21
== |== multiscale_decoder : False
== |== ca_weights : True
== |== pixel_loss : False
== |== face_style_power : 0.0
== |== bg_style_power : 0.0
== |== apply_random_ct : True
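The two option dumps in this thread differ in much more than resolution. Below is a small sketch for listing the differences. It assumes the `== |== key : value` layout shown in the dumps. `parse_options` and `diff_options` are hypothetical helpers, not part of DeepFaceLab, and the sample strings are excerpts copied from the two dumps:

```python
import re
from typing import Optional

# Matches lines like "== |== ae_dims : 512", capturing the key and the first token of the value.
OPTION_LINE = re.compile(r"^==\s*\|==\s*(\w+)\s*:\s*(\S+)")


def parse_options(dump: str) -> dict[str, str]:
    """Extract key/value pairs from an option dump; trailing '// ...' notes are ignored."""
    options = {}
    for line in dump.splitlines():
        match = OPTION_LINE.match(line.strip())
        if match:
            options[match.group(1)] = match.group(2)
    return options


def diff_options(a: dict[str, str], b: dict[str, str]) -> dict[str, tuple[Optional[str], Optional[str]]]:
    """Return {key: (value_in_a, value_in_b)} for every key whose value differs or is missing."""
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


original_run = """== |== batch_size : 6    // 4 at start, 6 from 34899it
== |== resolution : 64
== |== face_type : h
== |== ae_dims : 32
== |== e_ch_dims : 21
== |== d_ch_dims : 10"""

test_run = """== |== batch_size : 16
== |== resolution : 128
== |== face_type : f
== |== ae_dims : 512
== |== e_ch_dims : 42
== |== d_ch_dims : 21"""

for key, (old, new) in diff_options(parse_options(original_run), parse_options(test_run)).items():
    print(f"{key:12} {old:>4} -> {new}")
```

With these excerpts the output shows `ae_dims` at 32 versus 512, a 16x smaller autoencoder bottleneck. It also shows roughly half the encoder and decoder channel dims, which is the size gap noted earlier in the thread.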

I ran it for about 2 hours, to iteration 7,000, at batch size 16 on my 1080 Ti.

Here's the last preview:

[Image: nrvFYsWh.jpg]

Considering it's only 2 hours in, the previews don't look too bad. I used three test frames to convert: 0129, 0170 and 0208.

The first and third ones don't look terrible. The middle one is the frame I tried to match to your example, and it's not converting well at all. I still think more faces, and higher-resolution faces, would help.

Conversion settings: Overlay, Fan-Dst, erode 0, blur 100, rct. I tried lct at first, but it was too aggressive with the colors and blew out the highlights.

[Image: uOX9LiYh.png]

[Image: h4S7gdXh.png]

[Image: Nytt4iuh.png]

Any GPU is going to be better than a CPU for training. If you're talking about ShadowPC, then yes, that's a pretty good deal if you don't have the hardware locally.

Thank you very much. That's a much better result. Besides the limited number of src images, I think I know what the problem is: in the src images the actor is playing with his tongue. It looks to me like the model has trouble with these frames because there's no equivalent in the dst faceset.
The fact that you have the exact same problem at the same frame brought me to think that.

Your settings wouldn't work on the cloud server I used, because training would take too long. I'll try again with HD video material on a Tesla V100 in the next few days. It would cost me around $2.50 an hour, but it's worth the attempt.

Thank you for your help.
If you're being charged by the hour, I would not try it. It's going to take much longer than you realize. I rented a P5000 at 78 cents an hour, and let's just say it cost me way more than I'd like. CPU and GPU computing doesn't scale the way you think. It's not going to take 19 minutes to train a model for you.
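The estimate in the question is plain division: 33 h / 100 = 0.33 h = 19.8 min. That only holds if every part of a training iteration speeds up 100x. Amdahl's law is a general rule, not something from this thread, and it is used here only as an illustration. It shows why that rarely happens: any share of the work that is not accelerated caps the overall speedup. An example would be loading and augmenting training images on the CPU. The 100x factor and the accelerated fractions below are hypothetical:

```python
def amdahl_speedup(parallel_fraction: float, accel: float) -> float:
    """Overall speedup when only `parallel_fraction` of the runtime is accelerated `accel` times."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / accel)


CPU_HOURS = 33.0  # training time reported in the thread
GPU_FACTOR = 100  # hypothetical "a V100 = 100 CPUs" claim

naive_minutes = CPU_HOURS / GPU_FACTOR * 60
print(f"naive estimate: {naive_minutes:.1f} min")

for parallel_fraction in (0.9, 0.99):  # hypothetical shares of the work the GPU speeds up
    speedup = amdahl_speedup(parallel_fraction, GPU_FACTOR)
    hours = CPU_HOURS / speedup
    print(f"{parallel_fraction:.0%} accelerated: {speedup:.1f}x faster, about {hours:.1f} h")
```

This only covers the same small model running faster. The larger model a GPU makes practical, such as the 128 res, 512 ae_dims test run in this thread, still needs many hours of training.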
(06-14-2019, 08:46 AM) ConfituraMan wrote: @titan_rw

I'll try again with HD video material on a Tesla V100 in the next few days. It would cost me around $2.50 an hour, but it's worth the attempt.

Thank you for your help.

At $2.50 per hour, I don't think it'll be cost-effective. Expect at least 12 hours of training, possibly more.

Think of GPUs as allowing a more complex model that creates better results while still training in a reasonable time. Let's theoretically say a GPU is 100x faster than a certain CPU. Your model that trains on a CPU isn't going to train 100x faster. Instead, the GPU lets you run a 100x more complex model that isn't practical on a CPU. That's just an example, with numbers pulled out of thin air.

I know people here have used ShadowPC for training. From what I understand, it's $30 a month, and you get the use of a decent dedicated GPU.
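Here is a quick cost comparison using the prices quoted in this thread: a V100 at $2.50 an hour, a P5000 at $0.78 an hour and ShadowPC at about $30 a month. These are 2019 figures and have likely changed. The 12 hours is the lower-bound training time suggested in the replies. `rental_cost` and `break_even_hours` are hypothetical helpers for the arithmetic:

```python
def rental_cost(hours: float, rate_per_hour: float) -> float:
    """Total cost of renting a machine by the hour."""
    return hours * rate_per_hour


def break_even_hours(monthly_fee: float, rate_per_hour: float) -> float:
    """Hours of hourly rental that cost as much as one month of a flat-rate plan."""
    return monthly_fee / rate_per_hour


SHADOW_MONTHLY = 30.0  # USD per month, as quoted in the thread
RATES = {"V100": 2.50, "P5000": 0.78}  # USD per hour, as quoted in the thread
MIN_TRAINING_HOURS = 12  # "at least 12h", possibly more

for gpu, rate in RATES.items():
    cost = rental_cost(MIN_TRAINING_HOURS, rate)
    even = break_even_hours(SHADOW_MONTHLY, rate)
    print(f"{gpu}: {MIN_TRAINING_HOURS} h costs ${cost:.2f}; a month of ShadowPC = {even:.1f} h of rental")
```

At the 12-hour minimum, a V100 rental already costs as much as a month of ShadowPC. Every additional hour of training widens the gap.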
