Deeper understanding of the gradient and directional derivative

In summary, the gradient is the vector whose components are the partial derivatives with respect to x and y (not their sum), and it points in the direction of greatest increase. This can be seen by considering the directional derivative, which measures the rate of change in an arbitrary direction and can be expressed as the dot product of the gradient vector with a unit vector. That dot product is largest when the unit vector is parallel to the gradient, which makes the gradient the direction of steepest ascent. The partial derivatives also give a linear approximation to the change in the function, Δf ≈ ∂f/∂x Δx + ∂f/∂y Δy, which becomes more accurate as the changes in x and y become smaller.
  • #1
autodidude
Why does the formula for the gradient, that is (for functions of 2 variables) the partial with respect to x plus the partial with respect to y, each multiplied by its unit vector, give the direction of greatest increase?

i.e. the direction of maximum increase at some point on a surface is given by [tex]f_x\mathbf{i}+f_y\mathbf{j}[/tex]

And why, when you multiply each partial derivative by the corresponding component of a vector and add the results, do you get the derivative of the surface in the direction of that vector?

i.e. the derivative of f(x,y) in the direction of <a, b> (taken as a unit vector) is [tex]af_x+bf_y[/tex]

The proofs don't offer an understanding of why the formula does what it does, at least for me.

My lack of understanding of this may have something to do with the fact that I still don't get intuitively why a change in the x direction plus change in y direction gives the total change.
 
  • #2
autodidude said:
My lack of understanding of this may have something to do with the fact that I still don't get intuitively why a change in the x direction plus change in y direction gives the total change.

Partial derivatives justify that sum by approximating the graph of a function of two variables locally as a plane. Hold one corner of a book against a table so that the book is tilted. Is it clear that your finger rises the same amount above the table by running it over the book from the corner on the table to the diagonally opposite corner as it would if you ran it along two sides of the book to reach the opposite corner?
 
  • #3
One way to simplify the understanding is to work all this out for the geometry of a linear function, z=cx+dy. If you move one unit along x, then one unit along y, you've traced two edges of a parallelogram situated above a unit square, and you've risen c+d units.

There, that's why the total is the sum. Of course, this is just the linear approximation in general, for small dx and dy.

You can replace the unit square in the x-y plane with a rectangle of side lengths a and b. Then just imagine the parallelogram again.
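The parallelogram picture can be checked with a few lines of Python. This is only a sketch: the slopes c and d and the rectangle sides are hypothetical values picked for illustration, and the helper names are not from the thread.

```python
# Hypothetical slopes for the plane z = c*x + d*y (values chosen only for illustration).
c, d = 2.0, 3.0

def z(x, y):
    """Height of the plane above the point (x, y)."""
    return c * x + d * y

def rise_along_edges(a, b):
    """Rise when walking a units along x, then b units along y, starting at the origin."""
    rise_x = z(a, 0.0) - z(0.0, 0.0)   # first leg: along x
    rise_y = z(a, b) - z(a, 0.0)       # second leg: along y
    return rise_x + rise_y

def rise_along_diagonal(a, b):
    """Rise when walking straight from the origin to (a, b)."""
    return z(a, b) - z(0.0, 0.0)

unit_edges = rise_along_edges(1.0, 1.0)        # unit square
unit_diagonal = rise_along_diagonal(1.0, 1.0)
rect_edges = rise_along_edges(0.5, 4.0)        # rectangle with sides a = 0.5, b = 4
rect_diagonal = rise_along_diagonal(0.5, 4.0)

print(unit_edges, unit_diagonal)   # both equal c + d
print(rect_edges, rect_diagonal)   # both equal c*a + d*b
```

On a plane the two routes always agree, which is exactly the statement that the total rise is c times the run in x plus d times the run in y.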

But I forgot to try to help with your other question: why does it give the direction of greatest increase? Hmm... I guess if you considered all the unit vectors, the one giving the steepest increase would be parallel to the gradient. Or, you might try to play with visualizing the rise and run of a triangle for variously oriented planes. Not sure if there's a better way. Anyone got some tricks for why the gradient is the direction of steepest ascent?
 
  • #4
To find out why it is we start with the directional derivative.

Directional Derivative
I expect you understand the partial derivatives and have a somewhat intuitive understanding of them. Otherwise we would have to start there.

The idea is to find out the rate of change in an arbitrary direction. So let's say we look at the point P(x0, y0) of the function f(x,y).

We want to know the rate of change in the direction of the vector u = <A, B>
How do we do this? Let's define u to be a unit vector.

Well the change in the function is of course Δf = f(x0+Ah,y0+Bh)-f(x0,y0)

This is the fundamental idea that we need to grasp. The vector controls the increase in x and y through its A and B components, and h is a continuous variable that measures how far we have moved along u. In fact, the function
L(h) = f(x0+Ah,y0+Bh) describes all the values of the function f(x,y) that lie above the line through P parallel to the vector u.

If we accept this, then the rate of change can be defined by how much f(x,y) is changing, with respect to changes to this variable h.

So we have Δf/h = (f(x0+Ah,y0+Bh)-f(x0,y0))/h

When we take the limit as h goes to 0, we of course end up with df/dh, which is the derivative of L(h) at h = 0. But let's find out what that fella actually looks like. Well to do this we have to use the dreaded Chain Rule!

Chain Rule
You said you had a problem understanding that a change due to change in y plus a change due to change in x is equal to the total change. Well let's look at that now.

If we want to approximate the change that happens in a function f, how can we do it?

Well if we know the partial derivatives, we know the rate of change in each direction (x and y). So we can compute the approximate change by the formula

Δf ≈ ∂f/∂x Δx + ∂f/∂y Δy

Why is this? Well, let's split the move from (x0, y0) to (x0+Δx, y0+Δy) into two steps: first along x, then along y.
Δf1 = f(x0+Δx, y0) - f(x0, y0) and
Δf2 = f(x0+Δx, y0+Δy) - f(x0+Δx, y0)

Then Δf = f(x0+Δx, y0+Δy) - f(x0, y0) = Δf1 + Δf2 exactly, because the middle value f(x0+Δx, y0) cancels.

The first step only changes x, so Δf1 ≈ ∂f/∂x Δx. The second step only changes y, so Δf2 ≈ ∂f/∂y Δy, except that this partial derivative is taken at (x0+Δx, y0) rather than at (x0, y0).

Does that matter? Not when the change in x is very, very small. If the partial derivatives are continuous, ∂f/∂y barely changes between those two points, so we can use its value at (x0, y0), and the error disappears faster than Δx and Δy do.
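The linear approximation Δf ≈ ∂f/∂x Δx + ∂f/∂y Δy can be checked numerically. The sketch below assumes a hypothetical smooth test function f(x, y) = sin(x)·e^y and a hypothetical base point; neither comes from the thread.

```python
import math

def f(x, y):
    # Hypothetical smooth test function, chosen only for illustration.
    return math.sin(x) * math.exp(y)

def f_x(x, y):
    # Partial derivative of f with respect to x.
    return math.cos(x) * math.exp(y)

def f_y(x, y):
    # Partial derivative of f with respect to y.
    return math.sin(x) * math.exp(y)

x0, y0 = 0.7, 0.2   # hypothetical base point

def approximation_error(step):
    """|true change - linear estimate| when both x and y change by step."""
    true_change = f(x0 + step, y0 + step) - f(x0, y0)
    estimate = f_x(x0, y0) * step + f_y(x0, y0) * step
    return abs(true_change - estimate)

steps = (1e-1, 1e-2, 1e-3)
errors = [approximation_error(h) for h in steps]
for h, err in zip(steps, errors):
    print(f"step={h:g}  error={err:.3e}")
```

For this function the error shrinks roughly a hundredfold each time the step shrinks tenfold, which is what "more accurate as the changes become smaller" means in practice.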

So since we can write Δf ≈ ∂f/∂x Δx + ∂f/∂y Δy,
if we then divide it all by h (since in our case Δx and Δy are actually Ah and Bh respectively), we have

Δf/h ≈ ∂f/∂x A + ∂f/∂y B

When we take the limit we end up with

df/dh = ∂f/∂x A + ∂f/∂y B
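To see the limit at work, the difference quotient Δf/h can be compared with ∂f/∂x A + ∂f/∂y B for shrinking h. This is a sketch under assumptions: the test function f(x, y) = x²y + y³, the point (1, 2) and the unit vector <3/5, 4/5> are hypothetical choices.

```python
def f(x, y):
    # Hypothetical test function: f(x, y) = x^2 * y + y^3.
    return x**2 * y + y**3

def f_x(x, y):
    # Partial derivative with respect to x.
    return 2 * x * y

def f_y(x, y):
    # Partial derivative with respect to y.
    return x**2 + 3 * y**2

x0, y0 = 1.0, 2.0
A, B = 3 / 5, 4 / 5          # unit vector u = <A, B>

def L(h):
    """Values of f along the line through (x0, y0) parallel to u."""
    return f(x0 + A * h, y0 + B * h)

# Directional derivative from the formula df/dh = f_x * A + f_y * B.
formula = f_x(x0, y0) * A + f_y(x0, y0) * B

# Difference quotients (L(h) - L(0)) / h for shrinking h.
quotients = [(L(h) - L(0.0)) / h for h in (1e-2, 1e-4, 1e-6)]

print("formula:", formula)
for q in quotients:
    print("quotient:", q)
```

The quotients approach the formula value as h shrinks, as the limit says they should.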

Gradient
We now have the directional derivative. The next question would be - In which direction do we find the greatest rate of change?

Well we can choose to view the above equation as the dot product between two vectors, v = <∂f/∂x , ∂f/∂y> and u = <A, B>.

So df/dh = <∂f/∂x , ∂f/∂y> · <A, B> = v · u

The dot product is also given by

<∂f/∂x , ∂f/∂y> · <A, B> = |v||u|cos(θ)

where θ is the angle between v and u. Since |u| = 1 (because it is a unit vector) we have

df/dh = |v|cos(θ)

When is this expression the largest? Well it is the largest when cos(θ) = 1, that is, when the angle θ is zero. When is it zero? It is zero when the two vectors are parallel and point the same way. Thus the greatest rate of change is in the direction of the vector v, and that greatest rate is |v|. This vector we call the gradient and signify by ∇f.
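A brute-force check of this conclusion: try many unit vectors and see which one gives the largest directional derivative. The gradient values below are hypothetical, and scanning 3600 directions is just one simple way to do the search.

```python
import math

# Hypothetical gradient at some point, v = <f_x, f_y>; values chosen for illustration.
fx, fy = 3.0, -4.0

def directional_derivative(theta):
    """df/dh in the direction of the unit vector u = <cos(theta), sin(theta)>."""
    return fx * math.cos(theta) + fy * math.sin(theta)

# Try 3600 evenly spaced directions and keep the steepest one.
angles = [2 * math.pi * k / 3600 for k in range(3600)]
best_theta = max(angles, key=directional_derivative)
best_rate = directional_derivative(best_theta)

# Direction and length of the gradient itself, for comparison.
gradient_angle = math.atan2(fy, fx) % (2 * math.pi)
gradient_length = math.hypot(fx, fy)

print("steepest direction found:", best_theta, "gradient direction:", gradient_angle)
print("steepest rate found:", best_rate, "gradient length:", gradient_length)
```

Up to the resolution of the scan, the steepest direction matches the direction of v and the steepest rate matches |v|.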



I hope this helped :)
 
  • #6
Another way to look at it: the "directional derivative", the rate of change of f(x,y,z) in the direction of the unit vector [itex]\vec{v}[/itex], is given by [itex]\nabla f\cdot\vec{v}[/itex]. One way to work with that is to note that any unit vector can be written in terms of "direction cosines". That is, [itex]\vec{v}= \langle\cos(\theta_x), \cos(\theta_y), \cos(\theta_z)\rangle[/itex] where [itex]\theta_x[/itex] is the angle [itex]\vec{v}[/itex] makes with the x-axis, [itex]\theta_y[/itex] is the angle [itex]\vec{v}[/itex] makes with the y-axis, and [itex]\theta_z[/itex] is the angle [itex]\vec{v}[/itex] makes with the z-axis. Take derivatives of [itex]\nabla f\cdot\vec{v}[/itex] with respect to the angles (keeping in mind that the three cosines must satisfy [itex]\cos^2(\theta_x)+\cos^2(\theta_y)+\cos^2(\theta_z)=1[/itex]) and set them equal to 0 to see that maximizing that function requires that the angles also be the direction angles of the gradient vector. Also, since the dot product of a vector with a unit vector is the (signed) length of the projection of the vector onto the unit vector, it is easy to see that the dot product is largest when the unit vector is parallel to the given vector. And that implies the derivative is largest in the direction of the gradient vector.

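Also note that the dot product of two non-zero vectors is 0 if and only if the vectors are perpendicular. Along a constant value surface f does not change, so the directional derivative [itex]\nabla f\cdot\vec{v}[/itex] is 0 for every direction [itex]\vec{v}[/itex] tangent to the surface. It follows that the gradient vector (where it is non-zero) is always perpendicular to a constant value surface.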
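The perpendicularity can be checked in two dimensions, where the constant value "surface" is a level curve. This sketch assumes the hypothetical example f(x, y) = x² + y², whose level curves are circles, and tests eight points on the circle of radius 2.

```python
import math

def grad_f(x, y):
    """Gradient of the hypothetical example f(x, y) = x^2 + y^2."""
    return (2 * x, 2 * y)

radius = 2.0   # the level curve f = 4 is a circle of radius 2
dots = []
for k in range(8):
    t = 2 * math.pi * k / 8
    x, y = radius * math.cos(t), radius * math.sin(t)
    tangent = (-math.sin(t), math.cos(t))        # unit tangent to the circle at (x, y)
    gx, gy = grad_f(x, y)
    dots.append(gx * tangent[0] + gy * tangent[1])

print(dots)   # every entry is zero up to rounding error
```

Each dot product of the gradient with the tangent direction vanishes, so the gradient points straight across the level curve.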
 
  • #7
I think that this is a less formal version of what Halls of Ivy was (correctly) saying. If f(x,y) is a scalar function of x and y in the x-y plane, then the change in f between the point x,y and the point x + dx, y + dy is given by:

[tex]df = f_x\,dx + f_y\,dy[/tex]

The right hand side of this equation can be expressed as the dot product of two vectors:

[tex]df = (f_x \mathbf{i}+f_y\mathbf{j})\cdot(dx\mathbf{i}+dy \mathbf{j})[/tex]

The vector [tex](f_x \mathbf{i}+f_y\mathbf{j})[/tex] is called the "gradient of f", and the vector [tex](dx\mathbf{i}+dy \mathbf{j})[/tex] is a differential position vector drawn between the point x,y and the point x + dx, y + dy. The differential position vector can also be expressed as:

[tex]\mathbf{ds}=(dx\mathbf{i}+dy \mathbf{j})= ds (\cos{\theta}\mathbf{i} +\sin{\theta}\mathbf{j}) [/tex]

where

[tex]ds=\sqrt{(dx)^2+(dy)^2}[/tex]

[tex]\theta=\arctan{(\frac{dy}{dx})}[/tex]

Physically, θ is the angle between the differential position vector [tex](dx\mathbf{i}+dy \mathbf{j})[/tex] and the x axis.

In terms of ds and θ, the equation for df now becomes:

[tex]df = ds(f_x \mathbf{i}+f_y\mathbf{j})\cdot(\cos{\theta}\mathbf{i}+\sin{\theta} \mathbf{j})=ds(f_x\cos{\theta}+f_y\sin{\theta})[/tex]
or equivalently
[tex]\frac{df}{ds}= f_x\cos{\theta}+f_y\sin{\theta}[/tex]

Now suppose we hold the length of the differential position vector ds constant, and ask the question "in what direction θ will the change df be a maximum over the specified distance ds?" We can answer this question by taking the derivative of df/ds with respect to θ, and setting the derivative equal to zero:

[tex]-f_x\sin{\theta}+f_y\cos{\theta}=0[/tex]

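The solutions of this equation satisfy [tex]\tan{\theta}=\frac{f_y}{f_x}[/tex], which gives two opposite directions. The one that maximizes df/ds is [tex]\theta=\arctan{(\frac{f_y}{f_x})}[/tex] when [tex]f_x>0[/tex],
or in general: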

[tex]\sin{\theta}=\frac{f_y}{\sqrt{(f_x)^2+(f_y)^2}}[/tex]

[tex]\cos{\theta}=\frac{f_x}{\sqrt{(f_x)^2+(f_y)^2}}[/tex]

If we substitute these relationships into the equation for df/ds to get the maximum value of df/ds over all possible orientations of the differential displacement vector, we obtain:

[tex]\frac{df}{ds}=\sqrt{(f_x)^2+(f_y)^2}[/tex]

This shows that the maximum df over all possible orientations of the differential position vector is equal to ds times the magnitude of the vector gradient of f, [tex](f_x \mathbf{i}+f_y\mathbf{j})[/tex].
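
As a worked illustration (the function is a hypothetical example, not taken from the posts above), take [tex]f(x,y)=x^2+3y[/tex] at the point (1, 2), where [tex]f_x=2[/tex] and [tex]f_y=3[/tex]:

[tex]\sin{\theta}=\frac{3}{\sqrt{13}}, \quad \cos{\theta}=\frac{2}{\sqrt{13}}, \quad \theta\approx 56.3^\circ[/tex]

[tex]\frac{df}{ds}=2\cdot\frac{2}{\sqrt{13}}+3\cdot\frac{3}{\sqrt{13}}=\frac{13}{\sqrt{13}}=\sqrt{13}\approx 3.61[/tex]

Assuming the gradient is non-zero, the second derivative with respect to θ confirms that this critical point is a maximum rather than a minimum:

[tex]\frac{d^2}{d\theta^2}\left(\frac{df}{ds}\right)=-f_x\cos{\theta}-f_y\sin{\theta}=-\sqrt{(f_x)^2+(f_y)^2}<0[/tex]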
 

FAQ: Deeper understanding of the gradient and directional derivative

1. What is a gradient?

The gradient of a function of several variables is the vector of its partial derivatives. It points in the direction of steepest ascent of the function, and its magnitude is the rate of change of the function in that direction, i.e. the largest slope at the given point.

2. How is a gradient different from a directional derivative?

The gradient is a vector, while a directional derivative is a single number: the rate of change of the function in one specific direction. The two are linked by a dot product. The directional derivative along a unit vector u is ∇f · u, so the gradient encodes the directional derivatives in all directions at once.

3. How is the gradient calculated?

The gradient is calculated by taking the partial derivatives of a multivariable function with respect to each of its variables and combining them into a vector. This vector points in the direction of the steepest ascent of the function at a given point.
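
When the partial derivatives are not available in closed form, each component can be estimated numerically. The sketch below uses central differences, which is one common choice rather than the only one; the function name numerical_gradient, the step size and the test function are all hypothetical.

```python
def numerical_gradient(f, point, step=1e-6):
    """Approximate the gradient of f at point with central differences."""
    gradient = []
    for i in range(len(point)):
        forward = list(point)
        backward = list(point)
        forward[i] += step     # nudge only the i-th variable up
        backward[i] -= step    # and down, holding the others fixed
        gradient.append((f(forward) - f(backward)) / (2 * step))
    return gradient

def f(p):
    # Hypothetical example: f(x, y, z) = x*y + z^2, whose exact gradient is (y, x, 2z).
    x, y, z = p
    return x * y + z**2

grad = numerical_gradient(f, [1.0, 2.0, 3.0])
print(grad)   # close to [2, 1, 6]
```

Each entry is one partial derivative, obtained by varying one variable while the others stay fixed, and the list of entries is the gradient vector.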

4. What is the significance of the gradient in real-life applications?

The gradient is an important tool in many fields, including physics, engineering, and economics. It is used to optimize functions and make predictions about the behavior of systems in the real world.

5. Can the gradient be negative?

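The gradient is a vector, so it is not itself positive or negative, but its components can be negative, and its magnitude is never negative. A directional derivative, on the other hand, can be negative: if the function is decreasing in a certain direction, the directional derivative in that direction is negative. In particular, the direction opposite to the gradient is the direction of steepest descent.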
