fog37
TL;DR Summary: Understand the delta rule increment in gradient descent
Hello Everyone,
I have a question about the gradient descent algorithm. Given a multivariable function ##f(x,y)##, we can find its minima (local or global) either by setting its gradient ##\nabla f = 0## and solving, or by using the iterative gradient descent approach. The first approach (setting the gradient equal to zero) is not always feasible when the function is complicated or has many independent variables.
Let's focus on gradient descent and consider a 1D function ##f(x)## for simplicity. Gradient descent is a numerical method that repeatedly computes the gradient ##\nabla f## and steps in the direction of ##-\nabla f## to find a value of ##x## where the function has a minimum.
If we are at a location ##x_0##, we calculate the gradient ##\nabla f(x_0)## and then move in the direction of the negative gradient by a step ##\Delta x##: $$x_{new} = x_{old} + \Delta x $$ $$ x_{new} = x_{old} - \eta \nabla f(x_{old}) $$
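For concreteness, here is a minimal numerical sketch of this 1D update rule. The example function ##f(x) = (x-3)^2##, the step size ##\eta = 0.1##, and the starting point ##x_0 = 10## are just illustrative choices, not part of the question:

```python
# Minimal 1D gradient descent sketch: x_new = x_old - eta * f'(x_old).
# The function, step size, and starting point below are illustrative only.

def f(x):
    return (x - 3.0) ** 2      # example function with its minimum at x = 3

def grad_f(x):
    return 2.0 * (x - 3.0)     # its derivative f'(x)

eta = 0.1                      # small, arbitrary step-size constant
x = 10.0                       # arbitrary starting location x_0

for step in range(50):
    x = x - eta * grad_f(x)    # move opposite to the gradient

print(x)                       # converges toward the minimizer x = 3
```

In this sketch ##\eta## only scales how far each step moves; the sign and direction of the step come entirely from the gradient.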
Here is my dilemma: why is the increment ##\Delta x##, which can be positive or negative, equal to the product of a small, arbitrary constant ##\eta## and the negative of the gradient ##\nabla f##?
I don't see how the increment to apply to ##x_{old}## turns out to be $$ \Delta x = -\eta \nabla f $$
Is there an in-depth derivation that justifies this last result?
Thanks!