How to Avoid Exploding Gradients With Gradient Clipping
Last Updated on August 28, 2020
Training a neural network can become unstable given the choice of error function, learning rate, or even the scale of the target variable.
Large updates to weights during training can cause a numerical overflow or underflow, often referred to as “exploding gradients.”
The problem of exploding gradients is more common with recurrent neural networks, such as LSTMs, given the accumulation of gradients unrolled over hundreds of input time steps.
A common and relatively easy solution to the exploding gradients problem is to change the derivative of the error before propagating it backward through the network and using it to update the weights. Two approaches include rescaling the gradients given a chosen vector norm and clipping gradient values that exceed a preferred range. Together, these methods are referred to as “gradient clipping.”
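The two approaches can be illustrated with a minimal sketch in plain Python. The function names `clip_by_norm` and `clip_by_value`, the use of the L2 norm, and the threshold values are assumptions chosen for illustration; they are not taken from any particular library.

```python
import math


def clip_by_norm(gradient, max_norm):
    """Rescale the gradient so its L2 norm does not exceed max_norm (hypothetical helper)."""
    norm = math.sqrt(sum(g * g for g in gradient))
    if norm <= max_norm:
        # Already within the limit: return an unchanged copy.
        return list(gradient)
    scale = max_norm / norm
    return [g * scale for g in gradient]


def clip_by_value(gradient, limit):
    """Clamp each gradient element to the range [-limit, limit] (hypothetical helper)."""
    return [max(-limit, min(limit, g)) for g in gradient]


# Example gradient vector with an L2 norm of 5.0 (illustrative values).
gradient = [3.0, -4.0]

by_norm = clip_by_norm(gradient, max_norm=1.0)
by_value = clip_by_value(gradient, limit=0.5)

print(by_norm)
print(by_value)
```

Rescaling by the norm shrinks every element by the same factor, so the direction of the gradient is preserved. Clipping by value treats each element independently, so the direction of the clipped gradient can differ from the original.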
In this tutorial, you will discover the exploding gradient problem and how to improve neural network training stability using gradient clipping.
After completing this tutorial, you will know:
- Training neural networks can become unstable, leading to a numerical overflow or underflow referred to as exploding gradients.
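- The training process can be made stable by changing the error gradients either by rescaling them given a chosen vector norm or by clipping gradient values that exceed a preferred range.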