Gradient Descent With RMSProp from Scratch

Gradient descent is an optimization algorithm that follows the negative gradient of an objective function in order to locate the minimum of the function.
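To make the update rule concrete, here is a minimal sketch of basic gradient descent on a simple quadratic objective. The objective(), gradient(), and gradient_descent() names and default values are illustrative choices for this example, not code from the original tutorial.

```python
import numpy as np

def objective(x):
    # simple bowl-shaped objective: f(x) = sum(x_i^2), minimum at the origin
    return np.sum(x ** 2)

def gradient(x):
    # analytical gradient (partial derivatives) of the objective
    return 2.0 * x

def gradient_descent(start, learning_rate=0.1, n_iter=50):
    # take repeated steps in the direction of the negative gradient
    x = np.array(start, dtype=float)
    for _ in range(n_iter):
        x -= learning_rate * gradient(x)
    return x

solution = gradient_descent([1.0, -1.5])
print(solution, objective(solution))
```

Note that the same fixed learning rate is applied to every variable on every step, which is the limitation discussed next.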

A limitation of gradient descent is that it uses the same step size (learning rate) for every input variable. The Adaptive Gradient algorithm, or AdaGrad for short, is an extension of the gradient descent optimization algorithm that automatically adapts the step size in each dimension based on the gradients (partial derivatives) seen for that variable over the course of the search.
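As a rough sketch of how AdaGrad adapts the step size, the snippet below keeps a running sum of squared partial derivatives per parameter and divides the learning rate by its square root. It reuses the illustrative gradient() helper from the previous example; the parameter names and default values are assumptions, not taken from the article.

```python
import numpy as np

def adagrad(start, gradient, learning_rate=0.5, n_iter=50, eps=1e-8):
    x = np.array(start, dtype=float)
    sq_grad_sum = np.zeros_like(x)  # accumulated squared partial derivatives
    for _ in range(n_iter):
        g = gradient(x)
        sq_grad_sum += g ** 2
        # per-parameter step size shrinks as squared gradients accumulate
        x -= (learning_rate / (np.sqrt(sq_grad_sum) + eps)) * g
    return x

solution = adagrad([1.0, -1.5], gradient)
print(solution)
```

Because sq_grad_sum only ever grows, the effective step size keeps shrinking, which motivates the limitation described next.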

A limitation of AdaGrad is that it can result in a very small step size for each parameter by the end of the search, which can slow the progress of the search too much. RMSProp addresses this by maintaining a decaying moving average of squared partial derivatives instead of accumulating them over the entire run, so the step size does not collapse toward zero.
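Below is a minimal sketch of the RMSProp update under the same quadratic objective and illustrative gradient() helper as above; the decay factor rho and the other defaults are assumed values for demonstration, not prescriptions from the article.

```python
import numpy as np

def rmsprop(start, gradient, learning_rate=0.01, rho=0.9, n_iter=100, eps=1e-8):
    x = np.array(start, dtype=float)
    avg_sq_grad = np.zeros_like(x)  # decaying average of squared partial derivatives
    for _ in range(n_iter):
        g = gradient(x)
        # exponentially weighted moving average replaces AdaGrad's unbounded sum,
        # so the effective step size does not shrink toward zero over the run
        avg_sq_grad = rho * avg_sq_grad + (1.0 - rho) * g ** 2
        x -= (learning_rate / (np.sqrt(avg_sq_grad) + eps)) * g
    return x

solution = rmsprop([1.0, -1.5], gradient)
print(solution)
```

The only change from the AdaGrad sketch is the decayed average avg_sq_grad, controlled by rho, which forgets old squared gradients and keeps the denominator bounded.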
