Show that grad f(x) = 0 when x is a local minimum of f

In summary, at a local extremum of a differentiable function ##f: \mathbb{R}^2 \to \mathbb{R}## the partial derivatives of ##f## are equal to 0: the gradient collects the slopes of ##f## in the coordinate directions, and at a local minimum (or maximum) none of these slopes can be nonzero, so the gradient must vanish. The converse does not hold: a vanishing gradient only yields a candidate point, and the eigenvalues of the second derivative (the Hessian) of ##f## at that point must be examined to decide whether it is a local minimum, a local maximum, or a saddle point. Additionally, the teacher's definition of a local extremum under additional conditions only makes sense as written when there are no constraints, since the constraint function ##g## never appears in the defining inequality.
  • #1
JulienB

Homework Statement



Hi everybody! I'm preparing for a maths exam and I am unsure how to answer this question:

What is a local extremum of a function ##f: \mathbb{R}^2 \to \mathbb{R}##? Explain why ##\nabla f(x) = 0## when ##x## is a local minimum of ##f## and ##f## is continuous and differentiable.

Homework Equations



In the chapter about local extrema, the script of my teacher says:

"##f: M \to \mathbb{R}## (##M \subseteq \mathbb{R}^n## is an open set) is the function for which the local extrema will be searched, ##g: M \to \mathbb{R}^m## describes the additional conditions, and ##m \in {0, 1, ... , n}## is the number of additional conditions.
A point ##x_0 \in M## with ##g(x_0) = 0## is called local minimum of ##f## under the additional condition ##g(x) = 0## if ##\exists \epsilon > 0##, so that ##\forall x \in M## with ##0 < || x - x_0 || < \epsilon## the statement ##f(x) \geq f(x_0)## is valid."

The Attempt at a Solution



I don't really get the point of the question about ##\nabla f(x) = 0##. I would say that since ##f'(x) = \nabla f(x)##, ##\nabla f(x)## describes the slope of the function at point ##x##. If the slope is ##0##, then there is a local extremum at point ##x##. So if ##f## has a local minimum (or local maximum) at point ##x##, ##\nabla f(x) = 0##. Would you say that is enough? It is always nicer to answer such questions with a mathematical description rather than with words... Do you guys have a suggestion?

The script of my teacher confuses me. I don't understand what those additional conditions are. Is ##g## equivalent to ##f'##? That would be weird, but that would explain why ##g(x) = 0## indicates a local extremum for ##f##. If not, can someone briefly explain to me what it is?

There is also something about matrices (apparently the Hessian matrix especially). I tried to manipulate it, but without knowing the function it didn't take me anywhere. Is it relevant for this question?

Thanks a lot in advance for your answers, and sorry I posted so many questions. I would really appreciate your help. :)

Julien.
 
  • #2
JulienB said:
If the slope is 0, then there is a local extremum at point ##x##.
If there is a local extremum at ##x##, then ##\nabla f(x) = 0##. (So by solving ##\nabla f(x) = 0## you find the candidates for being a local extremum.) The converse is not true: If ##\nabla f(x) = 0## then ##x## could also be a saddle point (= an "inflection point" in case ##n = 1##.) One way to be sure is to look at the eigenvalues of the second derivative (the Hessian) of ##f## at ##x##, if it exists.
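(Not part of the original argument, but if it helps to see the eigenvalue test in action, here is a minimal SymPy sketch, assuming SymPy is available; the two functions below are standard illustrative examples of my own choosing, not from the exercise.)

Code:
import sympy as sp

x, y = sp.symbols('x y')

# Two illustrative functions, each with a critical point at the origin:
for f in (x**2 + y**2,    # local (in fact global) minimum at (0, 0)
          x**2 - y**2):   # saddle point at (0, 0)
    grad = [sp.diff(f, v) for v in (x, y)]
    H = sp.hessian(f, (x, y))
    grad0 = [g.subs({x: 0, y: 0}) for g in grad]   # gradient at the origin
    eigs = H.subs({x: 0, y: 0}).eigenvals()        # eigenvalues of the Hessian there
    print(f, grad0, eigs)

# x**2 + y**2: gradient [0, 0], eigenvalues {2: 2}        -> positive definite, minimum
# x**2 - y**2: gradient [0, 0], eigenvalues {2: 1, -2: 1} -> indefinite, saddle point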

JulienB said:
So if ##f## has a local minimum (or local maximum) at point ##x##, ##\nabla f(x) = 0.##

Exactly.

JulienB said:
Would you say that is enough? It is always nicer to answer such questions with a mathematical description rather than with words... Do you guys have a suggestion?

It depends. If this is indeed a mathematics exam, I would expect him to want a rigorous proof. So, assume that ##x \in M## is a local extremum and prove that this implies that ##\nabla f(x) = 0##. As it stands, you have not proven that yet.

JulienB said:
The script of my teacher confuses me. I don't understand what those additional conditions are. Is ##g## equivalent to ##f'##? That would be weird, but that would explain why ##g(x) = 0## indicates a local extremum for ##f##. If not, can someone briefly explain to me what it is?

Your teacher is likely talking about constrained optimization (with equality constraints). What you have been talking about so far is unconstrained optimization. In the latter case there are no side conditions, which corresponds to the case ##m = 0## in your teacher's definition.
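Purely as an illustration of what a side condition means (you do not need this for the exam question; the example below is my own, not from your script): to minimize ##f(x,y) = x^2 + y^2## under the single constraint ##g(x,y) = x + y - 1 = 0## (so ##m = 1##), one does not solve ##\nabla f = 0## alone but introduces a Lagrange multiplier ##\lambda##:
$$
\nabla f(x,y) = \lambda \nabla g(x,y), \quad g(x,y) = 0 \;\Longrightarrow\; (2x,\, 2y) = \lambda (1,\, 1), \quad x + y = 1 \;\Longrightarrow\; x = y = \tfrac12 .
$$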

If you want to learn how to do rigorous proofs, I would advise you to first try to tackle the unconstrained case.
 
  • Like
Likes JulienB
  • #3
Hi @Krylov and first thank you for your answer. The part about constrained optimization has helped me a bit, especially to understand that this case is unconstrained. Here is what I wrote after reading your post:

##f:\mathbb{R}^2 \to \mathbb{R}##, ##x \in \mathbb{R}^2##
##x## local minimum of ##f##, ##f## continuous and differentiable ##\implies## ##f'(x) = ( \partial_1 f(x)\ \partial_2 f(x) ) = (0\ 0)##
##\nabla f(x) = ( \partial_1 f(x), \partial_2 f(x) ) = (0, 0) = \vec{0}##

Is that a more mathematically rigorous proof then?

Thanks a lot.

Julien.
 
  • #4
And for the first question, I now wrote:

A point ##\vec{x} \in \mathbb{R}^2## is called local extremum of ##f: \mathbb{R}^2 \to \mathbb{R}##, if:
- ##f## is continuous and differentiable;
- ##\partial_1 f(\vec{x}) = \partial_2 f(\vec{x}) = 0##;
- ##\partial_1^2 f(\vec{x}) \partial_2^2 f(\vec{x}) \geq (\partial_1 \partial_2 f(\vec{x}))^2##.

Would you say those conditions are necessary for ##\vec{x}## to be a local extremum of ##f##? I am still a little unsure about the ##\geq## in the 3rd condition. I think that with ##>##, ##\vec{x}## is then a strict local extremum, but the question did not mention "strict" so I replaced it with ##\geq##. Good/Bad idea? :)

And in the first condition, should ##f## be continuous and differentiable in ##\vec{x}## or everywhere?

Thanks a lot in advance for your answers.

Julien.
 
  • #5
JulienB said:
Hi @Krylov and first thank you for your answer. The part about constrained optimization has helped me a bit, especially to understand that this case is unconstrained.
:smile:

Let me also point out that I find your teacher's definition

JulienB said:
A point ##x_0 \in M## with ##g(x_0) = 0## is called a local minimum of ##f## under the additional condition ##g(x) = 0## if ##\exists \epsilon > 0## such that ##\forall x \in M## with ##0 < || x - x_0 || < \epsilon## the statement ##f(x) \geq f(x_0)## is valid.

only makes sense when ##m = 0## (i.e. when there are no constraints). Otherwise it does not make sense, because ##g## does not appear anywhere in the defining part of the definition.

JulienB said:
Here is what I wrote after reading your post:

##f:\mathbb{R}^2 \to \mathbb{R}##, ##x \in \mathbb{R}^2##
##x## local minimum of ##f##, ##f## continuous and differentiable ##\implies## ##f'(x) = ( \partial_1 f(x)\ \partial_2 f(x) ) = (0\ 0)##
##\nabla f(x) = ( \partial_1 f(x), \partial_2 f(x) ) = (0, 0) = \vec{0}##

Is that a more mathematically rigorous proof then?

I'm afraid I would have to say "no". Surely I am not the "gold standard", but the way I would write it is something like this:

Let ##f : M \subseteq \mathbb{R}^2 \to \mathbb{R}## be differentiable on the open set ##M##. (You could assume ##M = \mathbb{R}^2## for simplicity. This does not change much.) If ##f## has a local minimum at ##x_0 \in M##, then ##\nabla f(x_0) = 0##.

Proof: By definition, there exists ##\epsilon > 0## such that for all ##x \in M## with ##\|x - x_0\| < \epsilon## the statement ##f(x) \ge f(x_0)## holds. (This is just your teacher's definition.) We need to prove that ##\nabla f(x_0) = 0##. I will give you the start of the proof (which works equally well in spaces of higher dimension), so you can try to proceed.

Suppose that ##\nabla f(x_0) \neq 0##. Then there exists some ##h \in \mathbb{R}^2## such that ##\nabla f(x_0) \cdot h \neq 0##. By replacing ##h## by ##-h##, if needed, we may assume that ##\nabla f(x_0) \cdot h < 0##. Now, by the definition of the differentiability of ##f## at ##x_0##,
$$
f(x) = f(x_0) + \nabla f(x_0) \cdot(x - x_0) + R(\|x - x_0\|)
$$
for all ##x## sufficiently close to ##x_0##. (Here the remainder term ##R(\|x - x_0\|) = o(\|x - x_0\|)## as ##x \to x_0##.) So, in particular, for ##x = x_0 + t h## (with ##t > 0## small enough such that ##\|x - x_0\| < \epsilon##), we have
$$
0 \le f(x_0 + th) - f(x_0) = t \left[\nabla f(x_0) \cdot h + \frac{1}{t}R(t\|h\|)\right].
$$
Now, how do you obtain a contradiction?

It really depends on what your teacher is after. If he wants an entirely rigorous mathematical proof, I would give him something like the above. If he is fine with something more hand-waving, then you could get away with less.
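Not needed for the written proof, but if a numerical picture of the remainder term helps, here is a small NumPy sketch (the function below is an arbitrary smooth example of my own choosing, unrelated to the exercise) showing that ##R(t\|h\|)/t \to 0## as ##t \to 0##:

Code:
import numpy as np

def f(p):
    x, y = p
    return np.sin(x) + x * y**2            # an arbitrary smooth example

def grad_f(p):
    x, y = p
    return np.array([np.cos(x) + y**2, 2.0 * x * y])

x0 = np.array([1.0, 2.0])
h = np.array([1.0, -1.0])

for t in [1e-1, 1e-2, 1e-3, 1e-4]:
    # R is the first-order Taylor remainder f(x0 + t*h) - f(x0) - t * grad_f(x0).h
    R = f(x0 + t * h) - f(x0) - t * grad_f(x0) @ h
    print(f"t = {t:.0e},  R/t = {R / t:.3e}")

# R/t shrinks roughly in proportion to t, consistent with R = o(t).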
 
  • #6
JulienB said:
And for the first question, I now wrote:

A point ##\vec{x} \in \mathbb{R}^2## is called local extremum of ##f: \mathbb{R}^2 \to \mathbb{R}##, if:
- ##f## is continuous and differentiable;
- ##\partial_1 f(\vec{x}) = \partial_2 f(\vec{x}) = 0##;
- ##\partial_1^2 f(\vec{x}) \partial_2^2 f(\vec{x}) \geq (\partial_1 \partial_2 f(\vec{x}))^2##.

Would you say those conditions are necessary for ##\vec{x}## to be a local extremum of ##f##? I am still a little unsure about the ##\geq## in the 3rd condition. I think that with ##>##, ##\vec{x}## is then a strict local extremum, but the question did not mention "strict" so I replaced it with ##\geq##. Good/Bad idea? :)

And in the first condition, should ##f## be continuous and differentiable in ##\vec{x}## or everywhere?

Thanks a lot in advance for your answers.

Julien.

I would define "local extremum" following the definition of your teacher ("There exists ##\epsilon > 0## such that ##f(x) \ge f(x_0)## (min.) or ##f(x) \le f(x_0)## (max.) for all ##x \in M## with ##\|x - x_0\| < \epsilon##.")

If ##f## happens to be differentiable at ##x_0##, there is a necessary condition for an extremum (namely ##\nabla f(x_0) = 0##). If in addition ##\nabla f## is also differentiable close to ##x_0## and its derivative is continuous, then you can use the definiteness of the Hessian as a sufficient condition. When ##n = 2## this definiteness can be verified by inspecting the sign of the determinant of the Hessian together with the sign of ##\partial_1^2 f(\vec{x})##, as you essentially do when you write ##\partial_1^2 f(\vec{x}) \partial_2^2 f(\vec{x}) \geq (\partial_1 \partial_2 f(\vec{x}))^2##. The inequality needs to be strict, though.
 
  • #7
JulienB said:
And for the first question, I now wrote:

A point ##\vec{x} \in \mathbb{R}^2## is called local extremum of ##f: \mathbb{R}^2 \to \mathbb{R}##, if:
- ##f## is continuous and differentiable;
- ##\partial_1 f(\vec{x}) = \partial_2 f(\vec{x}) = 0##;
- ##\partial_1^2 f(\vec{x}) \partial_2^2 f(\vec{x}) \geq (\partial_1 \partial_2 f(\vec{x}))^2##.

Would you say those conditions are necessary for ##\vec{x}## to be a local extremum of ##f##? I am still a little unsure about the ##\geq## in the 3rd condition. I think that with ##>##, ##\vec{x}## is then a strict local extremum, but the question did not mention "strict" so I replaced it with ##\geq##. Good/Bad idea? :)

And in the first condition, should ##f## be continuous and differentiable in ##\vec{x}## or everywhere?

Thanks a lot in advance for your answers.

Julien.

There is a bit of a mismatch between second-order necessary and sufficient conditions for a minimum. Assuming the function is ##C^2## in a neighbourhood of ##p_0 =(x_0,y_0)##, we first of all need ##f_x(p_0) = f_y(p_0) = 0## (where ##f_x, f_y## are the partial derivatives). That is a first-order necessary condition.

For second-order we have:

(1) Necessary (not sufficient) conditions for a minimum at ##p_0## are ##f_{xx}, f_{yy} \geq 0## and (as you say) ##f_{xx} f_{yy} \geq f_{xy}^2## (all evaluated at ##p_0##). Basically, the Hessian matrix must be positive semi-definite. (Note that besides the second "mixed" condition you gave, we also need the individual conditions on ##f_{xx}## and ##f_{yy}##, because without them we could have ##f_{xx} < 0, f_{yy} < 0## but ##f_{xx} f_{yy} \geq f_{xy}^2##, and that would correspond to a local maximum (or at best a degenerate case), not a minimum.)

To see that these are not sufficient, look at the function ##f(x,y) = (y-x^2)(y - 2x^2)##. Above the upper parabola ##U: y = 2 x^2## both factors of ##f## are > 0, so ##f > 0##. Below the lower parabola ##L: y = x^2## both factors are < 0, so ##f > 0## again. However, between the two parabolas (that is, where ##x^2 < y < 2 x^2##) the two factors have opposite signs, so ##f < 0##. Thus, the origin ##O = (0,0)## is neither a local max nor a local min. However, we have ##f_x =0, f_y=0## at ##O##, and the Hessian matrix is
$$H(x,y) \equiv \begin{pmatrix} f_{xx} & f_{xy} \\ f_{xy} & f_{yy} \end{pmatrix} = \begin{pmatrix} 24 x^2 - 6y & -6x \\ -6x & 2 \end{pmatrix}, $$
so
$$H(O) = \begin{pmatrix} 0 & 0 \\ 0 & 2 \end{pmatrix}.$$
All the second-order necessary conditions hold at ##O##.

(2) Sufficient (not necessary) conditions for a strict local minimum at ##p_0## are that ##f_{xx} > 0, f_{yy} > 0## and ##f_{xx} f_{yy} > f_{xy}^2##. This states that the Hessian matrix is positive definite.

To see that these are not necessary, look at ##f(x,y) = x^4 + y^4##. The origin ##O = (0,0)## is a strict global minimum of ##f##, but the sufficient conditions fail to hold at ##O##.

By the way: the function ##f## in (1) above was something I heard about in a seminar by Adi Ben-Israel many years ago. It has the intriguing property that if you go a short distance away from ##(0,0)## along ANY line, the function goes up strictly (that is, leads to points where ##f(p) > f(0,0)## for all nonzero ##p## a short distance along the line). Nevertheless, ##(0,0)## is NOT a local minimum.
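For completeness, here is a short SymPy check of this example (my own addition, not from the original post; it assumes SymPy is available):

Code:
import sympy as sp

x, y, t, m = sp.symbols('x y t m')
f = (y - x**2) * (y - 2*x**2)

grad = [sp.diff(f, v) for v in (x, y)]
H = sp.hessian(f, (x, y))

print([g.subs({x: 0, y: 0}) for g in grad])   # [0, 0]: the origin is a critical point
print(H.subs({x: 0, y: 0}))                   # Matrix([[0, 0], [0, 2]]): positive semi-definite

# Between the parabolas, e.g. at (x, y) = (1/10, 3/200), f is negative,
# so the origin is not a local minimum:
print(f.subs({x: sp.Rational(1, 10), y: sp.Rational(3, 200)}))   # -1/40000

# Yet restricted to any line y = m*x through the origin, f has a strict
# local minimum at the origin (leading term m**2*t**2; for m = 0 the
# restriction is 2*t**4, and along the y-axis it is y**2):
print(sp.expand(f.subs({x: t, y: m*t})))      # m**2*t**2 - 3*m*t**3 + 2*t**4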
 
  • Like
Likes JulienB
  • #8
Mentor note: Helpers are reminded not to provide complete solutions or to do significant portions of the work towards such a solution. Instead, the questioner should be encouraged towards the solution via hints and suggestions or the pointing out of errors or misconceptions.
 
  • #9
Thank you everybody for your answers, this thread has helped me a lot! I'm back to my books. :)

Julien.
 
  • Like
Likes S.G. Janssens

FAQ: Show that grad f(x) = 0 when x is a local minimum of f

1. What does it mean for x to be a local minimum of f?

When x is a local minimum of f, it means that the value of f(x) is less than or equal to the value of f at every point in some small neighbourhood of x.

2. How is the gradient of a function related to its local minimum?

The gradient of a function is a vector that points in the direction of steepest increase. At a local minimum of a differentiable function the gradient is zero: if it were nonzero, a small step in the direction opposite to the gradient would produce nearby points with strictly smaller values, contradicting the definition of a local minimum.

3. How can we prove that grad f(x) = 0 when x is a local minimum of f?

We restrict f to one coordinate direction at a time. If f has a local minimum at x, then each one-variable function obtained by freezing all coordinates except one also has a local minimum there, so by the one-variable result (Fermat's theorem on stationary points) each partial derivative of f vanishes at x. Since the gradient is the vector of these partial derivatives, grad f(x) = 0.
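In symbols, a brief sketch for ##f: \mathbb{R}^2 \to \mathbb{R}## differentiable at the local minimum ##x##, with ##e_1, e_2## the standard basis vectors: each restriction
$$
\varphi_i(t) := f(x + t e_i), \quad i = 1, 2,
$$
is a one-variable function with a local minimum at ##t = 0##, so ##\varphi_i'(0) = \partial_i f(x) = 0## for ##i = 1, 2##, and therefore
$$
\nabla f(x) = \bigl(\partial_1 f(x),\, \partial_2 f(x)\bigr) = (0,\, 0).
$$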

4. Can a function have multiple local minima?

Yes, a function can have multiple local minima. This means there are several points at each of which the function value is no larger than at all nearby points; if the function is differentiable, the gradient at each of these points is equal to zero.

5. What is the significance of proving that grad f(x) = 0 at a local minimum?

Proving that grad f(x) = 0 at every local minimum is important because it tells us where to look: all candidates for local minima are found among the solutions of grad f(x) = 0, the so-called critical points. This underlies many optimization methods, and for a differentiable function on an open set the global minimum, if it exists, must also be one of these critical points.
