MrDeepFakes Forums


What is this new feature called gradient clipping ?

fakerdaker

DF Vagrant
I just recently started reading about machine learning, so maybe I can explain it a bit.

Basically, a neural network takes some inputs (src images in this case) and learns a function that produces a desired output (the dst image). The longer it's trained, the more precise this conversion function becomes. But it also needs some way to know how close the output it generates is to the desired output, so it compares the output it produced to the desired output using what's called a cost function.

Training tries to minimize the loss reported by this cost function. It uses something called a gradient, which you'd recognize if you've taken multivariable calculus. If not: the gradient tells you in which direction a function's rate of change is steepest for a given input. The simple explanation is that it tells you how to change the inputs to reach a relative minimum of the loss.
[Attached images: gradient_descent.gif (https://pvigier.github.io/media/img/part1/gradient_descent.gif), an animated 3D plot of gradient descent on a loss surface, and fastlr.png, the same descent with too large a learning rate.]

In these images, imagine x and y are values the network can adjust (in reality there are far more than two), and z is the loss produced by changing them.

The image on the left is the ideal result of our cost function: we adjust our inputs (x, y) until the resulting z (the loss) is at its lowest point, a relative minimum. The gradient tells us which way our inputs need to change.
The image on the right shows what happens when we follow the gradient too far at each step: if our steps are too big we keep overshooting the target, but if they are very small it takes a long time to converge.
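If it helps to see this concretely, here's a tiny Python sketch (a made-up one-variable loss, nothing to do with DFL's actual code) showing how the step size decides whether you settle into the minimum or keep overshooting:

def loss(x):
    return x ** 2              # toy loss surface, minimum at x = 0

def grad(x):
    return 2 * x               # derivative of the loss with respect to x

def descend(lr, steps=10, x=5.0):
    for _ in range(steps):
        x = x - lr * grad(x)   # move against the gradient
    return x, loss(x)

print(descend(lr=0.1))   # small steps: creeps toward the minimum
print(descend(lr=0.9))   # bigger steps: still converges here, bouncing from side to side
print(descend(lr=1.1))   # too large: every step overshoots further and the loss explodes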

So I think what gradient clipping does is limit how far you follow that path at each step, so your steps can't get too big. I'm guessing the model corruption is the result of a "gradient explosion", which is why iperov chose to limit the gradient by "clipping" it. https://www.quora.com/What-is-gradient-clipping-and-why-is-it-necessary
I'm really new to this topic too, but it sounds like the 'steps' can get way too large, so the loss shoots up instead of settling at a minimum, and that's what causes the corruption?
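To make the "clipping" part concrete, here's a rough NumPy sketch of clipping by norm. It's only an illustration of the idea, not how iperov's code actually implements it:

import numpy as np

def clip_by_norm(gradient, max_norm=1.0):
    # If the gradient vector is longer than max_norm, rescale it to that
    # length while keeping its direction, so one step can't explode.
    norm = np.linalg.norm(gradient)
    if norm > max_norm:
        gradient = gradient * (max_norm / norm)
    return gradient

g = np.array([30.0, -40.0])              # an "exploding" gradient, norm = 50
print(clip_by_norm(g, max_norm=1.0))     # -> [ 0.6 -0.8], same direction, norm 1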

This site explains it pretty well, though it might be hard to follow if you have no background in calculus.
https://blog.paperspace.com/intro-to-optimization-in-deep-learning-gradient-descent/

I should note that I don't know how this process works for the combined image, but I believe this is what happens when the model generates its own versions of the src and dst images. If anyone has a better understanding, please feel free to correct this; it's just my best understanding from self-learning.
 

LCC

DF Pleb
Adding to fakerdaker's great explanation.

If anyone is curious to understand deepfaking a bit better, the graphs in this YouTube video really helped me. When you're training and the loss is getting closer to 0, you're moving closer to the centre of the floor in the 3D graphs.
 

VirginBoI

DF Pleb
fakerdaker said:
I just recently started reading about machine learning, so maybe I can explain it a bit.

Basically, a neural network takes some inputs (src images in this case) and learns a function that produces a desired output (the dst image). The longer it's trained, the more precise this conversion function becomes. But it also needs some way to know how close the output it generates is to the desired output, so it compares the output it produced to the desired output using what's called a cost function.

Training tries to minimize the loss reported by this cost function. It uses something called a gradient, which you'd recognize if you've taken multivariable calculus. If not: the gradient tells you in which direction a function's rate of change is steepest for a given input. The simple explanation is that it tells you how to change the inputs to reach a relative minimum of the loss.
[Attached images: gradient_descent.gif (https://pvigier.github.io/media/img/part1/gradient_descent.gif), an animated 3D plot of gradient descent on a loss surface, and fastlr.png, the same descent with too large a learning rate.]

In these images, imagine x and y are values the network can adjust (in reality there are far more than two), and z is the loss produced by changing them.

The image on the left is the ideal result of our cost function: we adjust our inputs (x, y) until the resulting z (the loss) is at its lowest point, a relative minimum. The gradient tells us which way our inputs need to change.
The image on the right shows what happens when we follow the gradient too far at each step: if our steps are too big we keep overshooting the target, but if they are very small it takes a long time to converge.

So I think what gradient clipping does is limit how far you follow that path at each step, so your steps can't get too big. I'm guessing the model corruption is the result of a "gradient explosion", which is why iperov chose to limit the gradient by "clipping" it. https://www.quora.com/What-is-gradient-clipping-and-why-is-it-necessary
I'm really new to this topic too, but it sounds like the 'steps' can get way too large, so the loss shoots up instead of settling at a minimum, and that's what causes the corruption?

This site explains it pretty well, though it might be hard to follow if you have no background in calculus.
https://blog.paperspace.com/intro-to-optimization-in-deep-learning-gradient-descent/

I should note that I don't know how this process works for the combined image, but I believe this is what happens when the model generates its own versions of the src and dst images. If anyone has a better understanding, please feel free to correct this; it's just my best understanding from self-learning.

OK, so basically it reduces the chances of collapse, but it has to analyse each step for the right size, which makes the training slower.
 

fakerdaker

DF Vagrant
LCC said:
Adding to fakerdaker's great explanation.

If anyone is curious to understand deepfaking a bit better, the graphs in this YouTube video really helped me. When you're training and the loss is getting closer to 0, you're moving closer to the centre of the floor in the 3D graphs.

That channel is fantastic. That series gives a good overview of how neural networks work. He even has a video that goes fairly in depth on backpropagation, pretty interesting stuff.


VirginBoI said:
OK, so basically it reduces the chances of collapse, but it has to analyse each step for the right size, which makes the training slower.

My understanding is that if the model attempts a step that is deemed too large, it automatically reduces that step to a smaller value, which theoretically reduces the chance of collapse.

"When the traditional gradient descent algorithm proposes to make a very large step, the gradient clipping heuristic intervenes to reduce the step size to be small enough that it is less likely to go outside the region where the gradient indicates the direction of approximately steepest descent."
https://machinelearningmastery.com/...ts-in-neural-networks-with-gradient-clipping/
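In framework terms, that "intervene and reduce the step" is often a single optimizer option. Here's a hedged illustration using Keras; this shows the general mechanism only, since I don't know exactly how DFL wires clipping internally:

import tensorflow as tf

# clipnorm rescales any gradient whose L2 norm exceeds 1.0 before the
# weight update is applied, so one bad batch can't blow the weights up.
# (Illustrative values only; DFL's trainer may configure this differently.)
optimizer = tf.keras.optimizers.Adam(learning_rate=5e-5, clipnorm=1.0)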
 

LCC

DF Pleb
fakerdaker said:
LCC said:
Adding to fakerdaker's great explanation.

If anyone is curious to understand deepfaking a bit better, the graphs in this YouTube video really helped me. When you're training and the loss is getting closer to 0, you're moving closer to the centre of the floor in the 3D graphs.

That channel is fantastic. That series gives a good overview of how neural networks work. He even has a video that goes fairly in depth on backpropagation, pretty interesting stuff.

Ha, your graph had me back propagating the hell outta my brain to figure out which video it was. (I didn't just Google that to understand what the hell it was... okay I lied, I did)

Watched a bunch of his stuff a while back instead of studying. The YouTube algorithm doing god's work, saving me from maths videos.

Might have to rewatch it properly now that it's actually useful; he's got a great way of explaining things.
 

dpfks

DF Enthusiast
Staff member
Administrator
Verified Video Creator
Moved to questions section

I haven't had issues with collapse recently, but thanks for the great explanation, everyone.
 

udraw

DF Vagrant
Been training for 5 hours with pixel loss and gradient clipping on; no collapses yet. I'm not concerned even if there is a collapse, since hourly backups are a good feature. Also, I'm retraining an SAE model from 116k epochs onwards.

So far, gradient clipping has brought the pixel loss feature back to life. (Let's hope it stays that way.)

Also, I didn't find the gradient clipping option on the H128 model I planned to retrain.
 