F convex iff Hessian matrix positive semidefinite

In summary, a function is convex if its Hessian matrix is positive semidefinite for all values of x, and a twice continuously differentiable function is convex if its second derivative is non-negative for all values of x. The proof involves using the chain rule and Taylor's theorem.
  • #1
mathmari
Gold Member
MHB
5,049
7
Hey!

A function $f:\mathbb{R}^n\rightarrow \mathbb{R}$ is convex if for all $x,y\in \mathbb{R}^n$ the inequality $$f(tx+(1-t)y)\leq tf(x)+(1-t)f(y)$$ holds for all $t\in [0,1]$.

Show that a twice continuously differentiable funtion $f:\mathbb{R}^n\rightarrow \mathbb{R}$ is convex iff the Hessian matrix is positive semidefinite for all $x\in \mathbb{R}^n$. I have read on some sites the idea of the proof but I am not sure that I understood that correctly. (Worried)

We suppose that the Hessian matrix is positive semidefinite for all $x\in \mathbb{R}^n$. We define the function:
$$g(t) = f(t \mathbf{x} + (1-t) \mathbf{y})$$
Then we can compute the derivatives of $g$:
$$g'(t) = ( \mathbf{x} - \mathbf{y})^T \mathbf{\nabla}f(t \mathbf{x} + (1-t) \mathbf{y})$$
$$g''(t) = ( \mathbf{x} - \mathbf{y})^T \mathbf{\nabla^2}f(t \mathbf{x} + (1-t) \mathbf{y}) ( \mathbf{x} - \mathbf{y})$$

For the derivatives we use the chain rule, right? Do we use this rule as follows? $$g'(t) = \frac{\partial}{\partial{t}}f(t \mathbf{x} + (1-t) \mathbf{y})=\sum_i\frac{\partial}{\partial{f_i}}f(t \mathbf{x} + (1-t) \mathbf{y})\cdot \frac{\partial{f_i}}{\partial{t}}=\sum_i\frac{\partial}{\partial{f_1}}f(t \mathbf{x} + (1-t) \mathbf{y})\cdot (\mathbf{x}-\mathbf{y})$$
:unsure:

Since the Hessian is positive semidefinite, we have $g''(t) \ge 0$ for all $t$. I haven't really understood this one. How do we get that $g''(t) \ge 0$ ?

Then we use this with Taylor's theorem to prove that:
$$
\begin{aligned}
g(0) &\ge g(t) + g'(t)(-t)\\
g(1) &\ge g(t) + g'(t)(1-t)
\end{aligned}
$$
Then if $t \in [0,1]$, these can then be combined to give:
$$
g(t) \le tg(1) + (1-t)g(0)
$$
which is equivalent to the inequality we wanted to prove. :unsure:
 
Physics news on Phys.org
  • #2
mathmari said:
For the derivatives we use the chain rule, right? Do we use this rule as follows? $$g'(t) = \frac{\partial}{\partial{t}}f(t \mathbf{x} + (1-t) \mathbf{y})=\sum_i\frac{\partial}{\partial{f_i}}f(t \mathbf{x} + (1-t) \mathbf{y})\cdot \frac{\partial{f_i}}{\partial{t}}=\sum_i\frac{\partial}{\partial{f_1}}f(t \mathbf{x} + (1-t) \mathbf{y})\cdot (\mathbf{x}-\mathbf{y})$$

Hey mathmari!

We cannot take partial derivatives with respect to $t$, because $t$ is not a parameter of $f$. :eek:

Instead we need to take the total derivative $\frac d{dt}$:
$$g'(t) = \frac{d}{d{t}}f(t \mathbf{x} + (1-t) \mathbf{y})=\sum_i\frac{\partial}{\partial{f_i}}f(t \mathbf{x} + (1-t) \mathbf{y})\cdot \frac{d{f_i}}{d{t}}=\sum_i\frac{\partial}{\partial{f_i}}f(t \mathbf{x} + (1-t) \mathbf{y})\cdot (\mathbf{x}-\mathbf{y})$$
🧐

mathmari said:
Since the Hessian is positive semidefinite, we have $g''(t) \ge 0$ for all $t$. I haven't really understood this one. How do we get that $g''(t) \ge 0$ ?

We have that $g''(t)=(\mathbf x-\mathbf y)^T H(\mathbf x-\mathbf y)$ with $H$ positive semidefinite.

By definition matrix $H$ is called positive semidefinite if for all $\mathbf z$ we have that $\mathbf z^T H \mathbf z \ge 0$. 🤔
 
  • #3
Klaas van Aarsen said:
Instead we need to take the total derivative $\frac d{dt}$:
$$g'(t) = \frac{d}{d{t}}f(t \mathbf{x} + (1-t) \mathbf{y})=\sum_i\frac{\partial}{\partial{f_i}}f(t \mathbf{x} + (1-t) \mathbf{y})\cdot \frac{d{f_i}}{d{t}}=\sum_i\frac{\partial}{\partial{f_i}}f(t \mathbf{x} + (1-t) \mathbf{y})\cdot (\mathbf{x}-\mathbf{y})$$
🧐

Ah ok! This is clear now! (Malthe)

So the second derivative of $g$ is then calculated as follows?
$$g''(t) = \frac{d}{d{t}}\left (\sum_i\frac{\partial}{\partial{f_i}}f(t \mathbf{x} + (1-t) \mathbf{y})\cdot (\mathbf{x}-\mathbf{y})\right )=\sum_i\frac{\partial^2}{\partial^2{f_i}}f(t \mathbf{x} + (1-t) \mathbf{y})\cdot \frac{d{f_i}}{d{t}}=\sum_i\frac{\partial^2}{\partial^2{f_i}}f(t \mathbf{x} + (1-t) \mathbf{y})\cdot (\mathbf{x}-\mathbf{y})\cdot (\mathbf{x}-\mathbf{y})$$

:unsure:
 
  • #4
mathmari said:
So the second derivative of $g$ is then calculated as follows?
$$g''(t) = \frac{d}{d{t}}\left (\sum_i\frac{\partial}{\partial{f_i}}f(t \mathbf{x} + (1-t) \mathbf{y})\cdot (\mathbf{x}-\mathbf{y})\right )=\sum_i\frac{\partial^2}{\partial^2{f_i}}f(t \mathbf{x} + (1-t) \mathbf{y})\cdot \frac{d{f_i}}{d{t}}=\sum_i\frac{\partial^2}{\partial^2{f_i}}f(t \mathbf{x} + (1-t) \mathbf{y})\cdot (\mathbf{x}-\mathbf{y})\cdot (\mathbf{x}-\mathbf{y})$$
Nope. (Shake)

It should be:
\begin{align*}g''(t) &= \frac{d}{d{t}}\left (\sum_i\frac{\partial}{\partial{f_i}}f(t \mathbf{x} + (1-t) \mathbf{y})\cdot (\mathbf{x}-\mathbf{y})\right )\\
&=\left (\sum_i\frac{d}{d{t}}\left[\frac{\partial}{\partial{f_i}}f(t \mathbf{x} + (1-t) \mathbf{y})\right]\right )\cdot (\mathbf{x}-\mathbf{y})\\
&=\left (\sum_i\sum_j\frac{\partial}{\partial{f_j}}\frac{\partial}{\partial{f_i}}f(t \mathbf{x} + (1-t) \mathbf{y})\cdot \frac{d f_j}{d t}\right )\cdot (\mathbf{x}-\mathbf{y})
\end{align*}
🤔
 
  • #5
Klaas van Aarsen said:
It should be:
\begin{align*}g''(t) &= \frac{d}{d{t}}\left (\sum_i\frac{\partial}{\partial{f_i}}f(t \mathbf{x} + (1-t) \mathbf{y})\cdot (\mathbf{x}-\mathbf{y})\right )\\
&=\left (\sum_i\frac{d}{d{t}}\left[\frac{\partial}{\partial{f_i}}f(t \mathbf{x} + (1-t) \mathbf{y})\right]\right )\cdot (\mathbf{x}-\mathbf{y})\\
&=\left (\sum_i\sum_j\frac{\partial}{\partial{f_j}}\frac{\partial}{\partial{f_i}}f(t \mathbf{x} + (1-t) \mathbf{y})\cdot \frac{d f_j}{d t}\right )\cdot (\mathbf{x}-\mathbf{y})
\end{align*}
🤔
Ah ok!

Then do we get \begin{align*}\left (\sum_i\sum_j\frac{\partial}{\partial{f_j}}\frac{\partial}{\partial{f_i}}f(t \mathbf{x} + (1-t) \mathbf{y})\cdot \frac{d f_j}{d t}\right )\cdot (\mathbf{x}-\mathbf{y})&=\left (\sum_i\sum_j\frac{\partial}{\partial{f_j}}\frac{\partial}{\partial{f_i}}f(t \mathbf{x} + (1-t) \mathbf{y})\cdot (\mathbf{x}-\mathbf{y})\right )\cdot (\mathbf{x}-\mathbf{y})\\ & =\left (\sum_i\sum_j\frac{\partial}{\partial{f_j}}\frac{\partial}{\partial{f_i}}f(t \mathbf{x} + (1-t) \mathbf{y})\right )\cdot (\mathbf{x}-\mathbf{y})\cdot (\mathbf{x}-\mathbf{y})\end{align*}

🤔

But how do we get the form $ (\mathbf{x}-\mathbf{y})^TH (\mathbf{x}-\mathbf{y})$ ? 🤔
 
  • #6
mathmari said:
But how do we get the form $ (\mathbf{x}-\mathbf{y})^TH (\mathbf{x}-\mathbf{y})$ ?
Don't we have $\mathbf a \cdot \mathbf b = \mathbf b^T\mathbf a$? 🤔
 
  • #7
Klaas van Aarsen said:
Don't we have $\mathbf a \cdot \mathbf b = \mathbf b^T\mathbf a$? 🤔
Ahh yes!

So we have in this case:
\begin{align*}g''(t) &= \frac{d}{d{t}}\left (\sum_i\frac{\partial}{\partial{f_i}}f(t \mathbf{x} + (1-t) \mathbf{y})\cdot (\mathbf{x}-\mathbf{y})\right )\\
&=\left (\sum_i\frac{d}{d{t}}\left[\frac{\partial}{\partial{f_i}}f(t \mathbf{x} + (1-t) \mathbf{y})\right]\right )\cdot (\mathbf{x}-\mathbf{y})\\
&=\left (\sum_i\sum_j\frac{\partial}{\partial{f_j}}\frac{\partial}{\partial{f_i}}f(t \mathbf{x} + (1-t) \mathbf{y})\cdot \frac{d f_j}{d t}\right )\cdot (\mathbf{x}-\mathbf{y})
\\ &=\left (\sum_i\sum_j\frac{\partial}{\partial{f_j}}\frac{\partial}{\partial{f_i}}f(t \mathbf{x} + (1-t) \mathbf{y})\cdot (\mathbf{x}-\mathbf{y})\right )\cdot (\mathbf{x}-\mathbf{y})\\ & =\left (\sum_i\sum_j\frac{\partial}{\partial{f_j}}\frac{\partial}{\partial{f_i}}f(t \mathbf{x} + (1-t) \mathbf{y})\right )\cdot (\mathbf{x}-\mathbf{y})\cdot (\mathbf{x}-\mathbf{y})\\ & =(\mathbf{x}-\mathbf{y})^T\cdot\left (\sum_i\sum_j\frac{\partial}{\partial{f_j}}\frac{\partial}{\partial{f_i}}f(t \mathbf{x} + (1-t) \mathbf{y})\right )\cdot (\mathbf{x}-\mathbf{y})\end{align*}
The middle term is the Hessian matrix? 🤔

 
  • #8
mathmari said:
The middle term is the Hessian matrix?
Yep. (Nod)
 
  • #9
Klaas van Aarsen said:
Yep. (Nod)

Great!

How do we get from Taylor's theorem that
$$
\begin{aligned}
g(0) &\ge g(t) + g'(t)(-t)\\
g(1) &\ge g(t) + g'(t)(1-t)
\end{aligned}
$$
? Do we use this formula fro $k=2$ ? :unsure:
 
  • #10
mathmari said:
Do we use this formula fro $k=2$ ? :unsure:
Yep. (Nod)
Well, actually it's for $k=1$ and the remainder $R_k$ refers to $(k+1)=2$. 🧐
 
  • #11
Klaas van Aarsen said:
Yep. (Nod)
Well, actually it's for $k=1$ and the remainder $R_k$ refers to $(k+1)=2$. 🧐

So we have
$$g(t)=g(0)+g'(0)(t-0)+\frac{g''(\xi)}{2}(t-0)^2 \Rightarrow \frac{g''(\xi)}{2}(t-0)^2=g(t)-g(0)-g'(0)(t-0)$$ Since $g''(x)\geq 0$ for every $x$ we get $$\frac{g''(\xi)}{2}(t-0)^2\geq 0 \Rightarrow g(t)-g(0)-g'(0)(t-0)\geq0 $$

$$g(t)=g(1)+g'(1)(t-1)+\frac{g''(\xi)}{2}(t-1)^2 \Rightarrow \frac{g''(\xi)}{2}(t-1)^2=g(t)-g(1)-g'(1)(t-1)$$ Since $g''(x)\geq 0$ for every $x$ we get $$\frac{g''(\xi)}{2}(t-1)^2\geq 0 \Rightarrow g(t)-g(1)-g'(1)(t-1)\geq0 $$

Is that correct so far? :unsure:
 
  • #12
mathmari said:
Is that correct so far?
It is correct, but we won't be able to eliminate both $g'(0)$ and $g'(1)$ like that.
Instead we should pick a different expansion. 🤔
 
  • #13
Klaas van Aarsen said:
It is correct, but we won't be able to eliminate both $g'(0)$ and $g'(1)$ like that.
Instead we should pick a different expansion. 🤔

You mean at a different point? 🤔
 
  • #14
mathmari said:
You mean at a different point?
Yep. (Nod)
 
  • #15
Klaas van Aarsen said:
Yep. (Nod)

I don't really see what point we could take so that $g'(t)$ is still in the inequality.
We would get the inequality $g(0) \ge g(t) + g'(t)(-t)$ if we take at the formula of Wikipedia $a=t$ and $x=0$. But can we take these values? :unsure:

 
  • #16
mathmari said:
I don't really see what point we could take so that $g'(t)$ is still in the inequality.
We would get the inequality $g(0) \ge g(t) + g'(t)(-t)$ if we take at the formula of Wikipedia $a=t$ and $x=0$. But can we take these values?
We are free to choose any values that we want.
So yes, we can take those values. :geek:
 
  • #17
Klaas van Aarsen said:
We are free to choose any values that we want.
So yes, we can take those values. :geek:

So that means that we expand at $a=t$ ? :unsure:
 
  • #18
mathmari said:
So that means that we expand at $a=t$?
Let's try. :unsure:
 
  • #19
Klaas van Aarsen said:
Let's try. :unsure:

We have then that $$g(x)=g(t)+g'(t)(x-t)+\frac{g''(\xi)}{2}(x-t)^2\Rightarrow \frac{g''(\xi)}{2}(x-t)^2=g(x)-g(t)-g'(t)(x-t)\geq 0$$ For $x=0$ we get $$g(0)-g(t)-g'(t)(0-t)\geq 0$$ For $x=1$ we get $$g(1)-g(t)-g'(t)(1-t)\geq 0$$

Is that correct? :unsure:
 
  • #20
Looks good. (Nod)
 
  • #21
Klaas van Aarsen said:
Looks good. (Nod)

I think this direction is now clear to me! 🤩

For the other direction, we suppose that $f$ is convex and we want to show that the Hessian matrix is positiv semidefinite.
Do we need in this direction again the function $g$ ? :unsure:
 
  • #22
mathmari said:
For the other direction, we suppose that $f$ is convex and we want to show that the Hessian matrix is positiv semidefinite.
Do we need in this direction again the function $g$ ?

Do you have a hint for the other direction? 🤔
 
  • #23
Klaas van Aarsen said:
Do you have a hint for the other direction? 🤔

I found a proof online:

1621666181354.png
So doing it in the way as above, we needto take the $g$ as before, or not? :unsure:
 
  • #24
mathmari said:
So doing it in the way as above, we needto take the $g$ as before, or not?

I don't think so.
The $g(t)=f(tx+(1-t)y)$ that we had, matches $f(x+t(y-x)$ in the $H \succcurlyeq 0 \implies \text{convex}$ direction, which is used in a Taylor expansion.
But for the converse we need a different $g$. 🤔
 
  • #25
I found in some other notes a proof using the function $g(t)=f(\mathbf{x}+t\mathbf{v})$ and now it is clear! :)
 

FAQ: F convex iff Hessian matrix positive semidefinite

What is the definition of a convex function?

A function is considered convex if the line segment connecting any two points on the graph of the function always lies above or on the graph itself.

What does it mean for a function to be F convex?

A function is F convex if it satisfies the definition of convexity and also has a positive second derivative, meaning its graph is always curving upwards.

What is the Hessian matrix of a function?

The Hessian matrix is a square matrix of second-order partial derivatives of a function. It is used to determine the curvature of the function's graph at a given point.

How do you determine if a Hessian matrix is positive semidefinite?

A Hessian matrix is positive semidefinite if all of its eigenvalues are non-negative. This can be determined by calculating the eigenvalues using linear algebra techniques.

Why is the condition "F convex iff Hessian matrix positive semidefinite" important?

This condition is important because it provides a necessary and sufficient condition for a function to be convex. It also allows for the use of efficient optimization algorithms to find the minimum of a convex function.

Similar threads

Back
Top