Proving Injectivity of a Vector-Valued Function Using the Mean Value Theorem

  • MHB
  • Thread starter mathmari
  • Tags
    Injective
In summary, the thread discusses proving a criterion for global invertibility using the mean value theorem for differential calculus in $\mathbb{R}^n$. The criterion states that if a function $f$ is continuously differentiable on a convex region $G\subset\mathbb{R}^n$ and the matrix of partial derivatives in the hypothesis is non-singular for every choice of evaluation points in $G$, then $f$ is injective. The conversation explores the proof of this criterion using the mean value theorem, showing that if $f$ is not injective, then there exist points $a, b\in G$ such that $f(a)=f(b)$, leading to the conclusion that the matrix formed by the partial derivatives of $f$ at suitable intermediate points sends $b-a$ to $0$ and is therefore singular, contradicting the hypothesis.
  • #1
mathmari
Hey! :eek:

I want to prove the following criterion using the mean value theorem for differential calculus in $\mathbb{R}^n$:

Let $G\subset \mathbb{R}^n$ be a convex region, let $f:G\rightarrow \mathbb{R}^n$ be continuously differentiable, and suppose that \begin{equation*}\det \begin{pmatrix}\frac{\partial{f_1}}{\partial{x_1}}(c_1) & \ldots & \frac{\partial{f_1}}{\partial{x_n}}(c_1)\\ \vdots & \ddots & \vdots \\ \frac{\partial{f_n}}{\partial{x_1}}(c_n) & \ldots & \frac{\partial{f_n}}{\partial{x_n}}(c_n)\end{pmatrix}\neq 0 \ \text{ for all } c_1, c_2, \ldots , c_n\in G.\end{equation*} Then $f$ is injective. I have done the following:

We assume that there are $a,b\in G$ with $f(a)=f(b)$.
From the mean value theorem for vector-valued functions it holds that \begin{align*}&f(b)-f(a)=\left(\int_0^1J_f(a+t(b-a))\,dt\right)(b-a)\\ & \overset{f(a)=f(b)}{\Longrightarrow} \ \left(\int_0^1J_f(a+t(b-a))\,dt\right)(b-a)=0\\ & \overset{a\neq b}{\Longrightarrow} \ \int_0^1J_f(a+t(b-a))\,dt=0\end{align*}

Since $G$ is convex and $a,b\in G$, it follows that $a+t(b-a)\in G$ for all $t\in[0,1]$. This implies that the Jacobian $J_f(a+t(b-a))$ is non-singular for all $t\in[0,1]$.

Is everything correct so far? (Wondering)

How can we conclude from this that $\int_0^1J_f(a+t(b-a))\,dt=0$ is impossible? (Wondering)
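For intuition, the integral form of the mean value theorem used above (with the averaged Jacobian applied to $b-a$) can be checked numerically. The map, the Jacobian, and the points below are made up purely for illustration:

```python
import numpy as np

# Made-up smooth map f : R^2 -> R^2 and its Jacobian, chosen only to
# illustrate f(b) - f(a) = (integral of J_f along the segment)(b - a).
def f(p):
    x, y = p
    return np.array([x**2 + y, x * y])

def J(p):
    x, y = p
    return np.array([[2 * x, 1.0],
                     [y,     x  ]])

a = np.array([0.3, -0.7])
b = np.array([1.2,  0.5])

# Approximate \int_0^1 J_f(a + t(b - a)) dt with the midpoint rule.
N = 2000
ts = (np.arange(N) + 0.5) / N
M = sum(J(a + t * (b - a)) for t in ts) / N

lhs = f(b) - f(a)
rhs = M @ (b - a)            # the averaged Jacobian applied to b - a
print(np.allclose(lhs, rhs, atol=1e-6))  # True
```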
 
  • #2
caffeinemachine
mathmari said:
Hey! :eek:

I want to prove the following criterion using the mean value theorem for differential calculus in $\mathbb{R}^n$:

Let $G\subset \mathbb{R}^n$ be a convex region, let $f:G\rightarrow \mathbb{R}^n$ be continuously differentiable, and suppose that \begin{equation*}\det \begin{pmatrix}\frac{\partial{f_1}}{\partial{x_1}}(c_1) & \ldots & \frac{\partial{f_1}}{\partial{x_n}}(c_1)\\ \vdots & \ddots & \vdots \\ \frac{\partial{f_n}}{\partial{x_1}}(c_n) & \ldots & \frac{\partial{f_n}}{\partial{x_n}}(c_n)\end{pmatrix}\neq 0 \ \text{ for all } c_1, c_2, \ldots , c_n\in G.\end{equation*} Then $f$ is injective. I have done the following:

We assume that there are $a,b\in G$ with $f(a)=f(b)$.
From the mean value theorem for vector-valued functions it holds that \begin{align*}&f(b)-f(a)=\left(\int_0^1J_f(a+t(b-a))\,dt\right)(b-a)\\ & \overset{f(a)=f(b)}{\Longrightarrow} \ \left(\int_0^1J_f(a+t(b-a))\,dt\right)(b-a)=0\\ & \overset{a\neq b}{\Longrightarrow} \ \int_0^1J_f(a+t(b-a))\,dt=0\end{align*}

Since $G$ is convex and $a,b\in G$, it follows that $a+t(b-a)\in G$ for all $t\in[0,1]$. This implies that the Jacobian $J_f(a+t(b-a))$ is non-singular for all $t\in[0,1]$.

Is everything correct so far? (Wondering)

How can we conclude from this that $\int_0^1J_f(a+t(b-a))\,dt=0$ is impossible? (Wondering)
In general one can only show local injectivity. For example, consider the map $f:\mathbf R^2\to \mathbf R^2$ defined as $f(x, y)=(e^y\cos(x), e^y\sin(x))$. Then $f$ has non-singular derivative everywhere but $f$ is not an injective map.

Local injectivity follows from the inverse function theorem, but of course, one can establish this ab initio.
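The counterexample is easy to verify numerically: its Jacobian determinant works out to $-e^{2y}$, which never vanishes, yet two distinct points share the same image. A short sketch:

```python
import numpy as np

# The counterexample: f(x, y) = (e^y cos x, e^y sin x).
def f(x, y):
    return np.array([np.exp(y) * np.cos(x), np.exp(y) * np.sin(x)])

def det_J(x, y):
    # det [[-e^y sin x, e^y cos x], [e^y cos x, e^y sin x]] = -e^{2y}
    return (-np.exp(y) * np.sin(x)) * (np.exp(y) * np.sin(x)) \
         - (np.exp(y) * np.cos(x)) * (np.exp(y) * np.cos(x))

print(det_J(0.3, -1.0) < 0)                          # True: determinant never zero
print(np.allclose(f(0.0, 0.0), f(2 * np.pi, 0.0)))   # True: f is not injective
```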
 
  • #3
caffeinemachine said:
In general one can only show local injectivity. For example, consider the map $f:\mathbf R^2\to \mathbf R^2$ defined as $f(x, y)=(e^y\cos(x), e^y\sin(x))$. Then $f$ has non-singular derivative everywhere but $f$ is not an injective map.

Local injectivity follows from the inverse function theorem, but of course, one can establish this ab initio.

The exercise statement says that this criterion of global invertibility has to be proved using the mean value theorem of differential calculus. So, is the word "global" wrong here?

Is the way I proved that criterion completely wrong? What do I have to do then? Could you give me a hint?

(Wondering)
 
  • #4
mathmari said:
At the exercise statement it says that that criterion of global invertibility has to be proved using the mean value theorem of differential calculus. So, is the word "global" here wrong?

Is the way I proved that criterion completely wrong? What do I have to do then? Could you give me a hint?

(Wondering)
I actually misread the problem. The matrix you have in the OP has its rows with partial derivatives evaluated at various points of $G$. So it's not the Jacobian matrix of $f$ at any point.

Given the hypothesis of the problem, global injectivity is easy. Suppose $f(a)=f(b)$ for some $a\neq b$ in $G$.

Then for each component $f_i$ of $f$, we have a point $c_i$ on the line segment joining $a$ and $b$ such that $Df_i|_{c_i}(b-a)=0$.

(This is because of the mean value theorem in one variable. Basically, we look at the real-valued function obtained by restricting $f_i$ along the line joining $a$ and $b$. The ordinary MVT says that there is a point between $a$ and $b$ where the directional derivative of $f_i$ along $b-a$ is $0$).

Thus we have found points $c_1, \ldots, c_n\in G$ such that the matrix that you have in the OP sends $b-a$ to $0$, contradicting the non-singularity assumption.
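To see the mechanism concretely, here is a numerical sketch with a made-up map $f(x,y)=(x^2+y^2,\,x+y)$ that happens not to be injective (so the non-singularity hypothesis cannot hold for it): the rows $\nabla f_i(c_i)$, taken at the critical points supplied by the one-variable MVT, form a matrix that sends $b-a$ to $0$ and is therefore singular.

```python
import numpy as np

# Made-up example: f(x, y) = (x^2 + y^2, x + y), with a = (1, 0), b = (0, 1),
# so f(a) = f(b) = (1, 1) although a != b.
a, b = np.array([1.0, 0.0]), np.array([0.0, 1.0])

grads = [lambda p: np.array([2 * p[0], 2 * p[1]]),  # grad f_1
         lambda p: np.array([1.0, 1.0])]            # grad f_2

# For both components the one-variable MVT yields the midpoint t_i = 1/2
# of the segment (worked out by hand for this example).
c = [a + 0.5 * (b - a), a + 0.5 * (b - a)]

# Matrix whose i-th row is grad f_i evaluated at c_i.
M = np.array([grads[i](c[i]) for i in range(2)])

print(M @ (b - a))         # [0. 0.]: b - a lies in the kernel
print(np.linalg.det(M))    # 0.0: hence M is singular
```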
 
  • #5
caffeinemachine said:
Given the hypothesis of the problem, global injectivity is easy. Suppose $f(a)=f(b)$ for some $a\neq b$ in $G$.

Then for each component $f_i$ of $f$, we have a point $c_i$ on the line segment joining $a$ and $b$ such that $Df_i|_{c_i}(b-a)=0$.

(This is because of the mean value theorem in one variable. Basically, we look at the real-valued function obtained by restricting $f_i$ along the line joining $a$ and $b$. The ordinary MVT says that there is a point between $a$ and $b$ where the directional derivative of $f_i$ along $b-a$ is $0$).

Thus we have found points $c_1, \ldots, c_n\in G$ such that the matrix that you have in the OP sends $b-a$ to $0$, contradicting the non-singularity assumption.

We assume that $f$ is not injective, i.e. that $f(a)=f(b)$ for some $a\neq b$ in $G$.

Then from the MVT, for each component $f_i$ of $f$ we have that $$f_i(b)-f_i(a)=Df_i|_{c_i}(b-a),$$ right? (Wondering)

Since $f(a)=f(b)$ it follows that $f_i(a)=f_i(b)$ for each $i$. That implies that $Df_i|_{c_i}(b-a)=0 \ \overset{a\neq b}{\Longrightarrow} \ Df_i|_{c_i}=0, \forall i$.

We consider the matrix whose $i$-th row is $Df_i|_{c_i}$. Do we get in that way the matrix as in the initial post? (Wondering)

Since $Df_i|_{c_i}=0$ for each $i$, we get the zero matrix, and so the determinant of that matrix is also equal to $0$, a contradiction.

So, the assumption is wrong and therefore $f$ is injective. Have I understood the proof correctly? (Wondering)
 
  • #6
mathmari said:
Since $f(a)=f(b)$ it follows that $f_i(a)=f_i(b)$ for each $i$. That implies that $Df_i|_{c_i}(b-a)=0 \ \overset{a\neq b}{\Longrightarrow} \ Df_i|_{c_i}=0, \forall i$.
The last implication is incorrect. If a linear map $\mathbf R^n\to \mathbf R$ evaluates to zero at a nonzero vector, that does not mean that the linear map is zero. It just means that the map has a nontrivial kernel.

mathmari said:
We consider the matrix whose $i$-th row is $Df_i|_{c_i}$. Do we get in that way the matrix as in the initial post? (Wondering)
Yes. If you think of $Df_i|_{c_i}$ as a vector, then the $j$-th component of this vector is $(\partial f_i/\partial x_j)|_{c_i}$.

mathmari said:
Since $Df_i|_{c_i}=0$ for each $i$, we get the zero matrix, and so the determinant of that matrix is also equal to $0$, a contradiction.

We don't get the zero matrix. We just get a singular matrix, since this matrix sends $b-a$ to $0$.
 
  • #7
caffeinemachine said:
The last implication is incorrect. If a linear map $\mathbf R^n\to \mathbf R$ evaluates to zero at a nonzero vector, that does not mean that the linear map is zero. It just means that the map has a nontrivial kernel.

So, it is $Df_i|_{c_i}$ at the point $b-a$ and not $Df_i|_{c_i}$ multiplied by $(b-a)$ ? I had misunderstood that.

What is the general formula of the MVT in this case? Isn't it that the difference of the function at two points $a,b$, divided by the difference of $a$ and $b$, equals the derivative of $f$ at some point between $a$ and $b$? (Wondering)
 
  • #8
mathmari said:
So, it is $Df_i|_{c_i}$ at the point $b-a$ and not $Df_i|_{c_i}$ multiplied by $(b-a)$ ? I had misunderstood that.

What is the general formula of the MVT in this case? Isn't it that the difference of the function at two points $a,b$, divided by the difference of $a$ and $b$, equals the derivative of $f$ at some point between $a$ and $b$? (Wondering)

$Df_i|_{c_i}$ is a linear map from $\mathbf R^n$ to $\mathbf R$. Its value at the point $b-a$ is $0$. When we have a linear map $T:\mathbf R^n\to \mathbf R$, and we have a vector $v\in \mathbf R^n$, what phrase do we use to refer to $Tv$? Do we say "$T$ multiplied by $v$" or do we say "$T$ at $v$"? I actually do not know. But "multiplied by" would not be my choice of terminology.

Assuming the one-variable MVT, define $g_i:\mathbf R\to \mathbf R$ as $g_i(t)=f_i(a+t(b-a))$. Then $g_i(0)=g_i(1)$. Thus there is $t_i\in (0, 1)$ such that $g_i'(t_i)=0$. Therefore $Df_i|_{a+t_i(b-a)}(b-a) = 0$. Write $c_i$ to denote $a+t_i(b-a)$.

Does this make things clear?
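The construction of $t_i$ can also be traced numerically. Below, $g$ is the restriction of the made-up component $f_1(x,y)=x^2+y^2$ to the segment from $a=(1,0)$ to $b=(0,1)$; since $g(0)=g(1)$, a bisection on $g'$ locates the promised critical point:

```python
import numpy as np

# g(t) = f_1(a + t(b - a)) for the made-up component f_1(x, y) = x^2 + y^2.
a, b = np.array([1.0, 0.0]), np.array([0.0, 1.0])

def g(t):
    p = a + t * (b - a)
    return p[0]**2 + p[1]**2

def g_prime(t, h=1e-6):                 # central finite difference
    return (g(t + h) - g(t - h)) / (2 * h)

assert abs(g(0.0) - g(1.0)) < 1e-12     # g(0) = g(1), as the argument requires

# g' changes sign on (0, 1), so bisect to find t_i with g'(t_i) = 0.
lo, hi = 0.0, 1.0
for _ in range(60):
    mid = (lo + hi) / 2
    if g_prime(mid) < 0:
        lo = mid
    else:
        hi = mid
t_i = (lo + hi) / 2
print(round(t_i, 6))   # 0.5, the critical point promised by the MVT
```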
 
  • #9
caffeinemachine said:
$Df_i|_{c_i}$ is a linear map from $\mathbf R^n$ to $\mathbf R$. Its value at the point $b-a$ is $0$. When we have a linear map $T:\mathbf R^n\to \mathbf R$, and we have a vector $v\in \mathbf R^n$, what phrase do we use to refer to $Tv$? Do we say "$T$ multiplied by $v$" or do we say "$T$ at $v$"? I actually do not know. But "multiplied by" would not be my choice of terminology.

Assuming the one-variable MVT, define $g_i:\mathbf R\to \mathbf R$ as $g_i(t)=f_i(a+t(b-a))$. Then $g_i(0)=g_i(1)$. Thus there is $t_i\in (0, 1)$ such that $g_i'(t_i)=0$. Therefore $Df_i|_{a+t_i(b-a)}(b-a) = 0$. Write $c_i$ to denote $a+t_i(b-a)$.

Does this make things clear?
So, $Df_i|_{a+t_i(b-a)}(b-a)$ is the dot product of the gradient $Df_i|_{a+t_i(b-a)}$ and the vector $(b-a)$. Or am I still thinking wrong? (Wondering)

Because, isn't it as follows?

$$g_i'(t_i)=\frac{\partial}{\partial{t_i}}f_i(a+t(b-a))=\frac{\partial{f_i}}{\partial{x_i}}\cdot \frac{\partial{(a+t(b-a))_i}}{\partial{t_i}}$$
 
  • #10
mathmari said:
So, $Df_i|_{a+t_i(b-a)}(b-a)$ is the dot product of the gradient $Df_i|_{a+t_i(b-a)}$ and the vector $(b-a)$. Or am I still thinking wrong? (Wondering)

Because, isn't it as follows?

$$g_i'(t_i)=\frac{\partial}{\partial{t_i}}f_i(a+t(b-a))=\frac{\partial{f_i}}{\partial{x_i}}\cdot \frac{\partial{(a+t(b-a))_i}}{\partial{t_i}}$$

No, it should be
$$g_i'(t_i)=\left.\frac{d}{dt}f_i(a+t(b-a))\right|_{t_i}= Df_i|_{a+t_i(b-a)}(b-a)$$

The last term is the same as

$$
\sum_{j=1}^n \left.\frac{\partial f_i}{\partial x_j}\right|_{a+t_i(b-a)}(b_j-a_j)
$$
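The chain-rule identity above is easy to sanity-check numerically; the component $f_i(x,y)=xy^2$ and the points here are made up for illustration:

```python
import numpy as np

# Check g'(t) = sum_j (df_i/dx_j)|_{a + t(b-a)} (b_j - a_j)
# for the made-up component f_i(x, y) = x * y^2.
a, b = np.array([0.5, -1.0]), np.array([2.0, 1.5])
t = 0.37
p = a + t * (b - a)

grad = np.array([p[1]**2, 2 * p[0] * p[1]])   # (df/dx, df/dy) at p

def g(s):
    q = a + s * (b - a)
    return q[0] * q[1]**2

h = 1e-6
lhs = (g(t + h) - g(t - h)) / (2 * h)   # g'(t) by central differences
rhs = grad @ (b - a)                    # the sum above
print(abs(lhs - rhs) < 1e-6)            # True
```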
 
  • #11
caffeinemachine said:
No, it should be
$$g_i'(t_i)=\left.\frac{d}{dt}f_i(a+t(b-a))\right|_{t_i}= Df_i|_{a+t_i(b-a)}(b-a)$$

The last term is the same as

$$
\sum_{j=1}^n \left.\frac{\partial f_i}{\partial x_j}\right|_{a+t_i(b-a)}(b_j-a_j)
$$

But at $$
\sum_{j=1}^n \left.\frac{\partial f_i}{\partial x_j}\right|_{a+t_i(b-a)}(b_j-a_j)
$$ isn't $(b_j-a_j)$ multiplied with the derivative? It is not that the derivative is evaluated at $(b_j-a_j)$, is it?

And so at $Df_i|_{a+t_i(b-a)}(b-a)$ we have the dot product of $Df_i|_{a+t_i(b-a)}$ and $(b-a)$, or not?

(Wondering)
 
  • #12
mathmari said:
But at $$
\sum_{j=1}^n \left.\frac{\partial f_i}{\partial x_j}\right|_{a+t_i(b-a)}(b_j-a_j)
$$ isn't $(b_j-a_j)$ multiplied with the derivative? It is not that the derivative is evaluated at $(b_j-a_j)$, is it?
Indeed, $b_j-a_j$ is multiplied with the $j$-th partial derivative in the above expression.

mathmari said:
And so at $Df_i|_{a+t_i(b-a)}(b-a)$ we have the dot product of $Df_i|_{a+t_i(b-a)}$ and $(b-a)$, or not?

I suppose your question was "Is $Df_i|_{a+t_i(b-a)}(b-a)$ the same as the dot product of $Df_i|_{a+t_i(b-a)}$ and $(b-a)$, or not?"

Well, strictly speaking, I'd say no. $Df_i|_{a+t_i(b-a)}$ is a linear map $\mathbf R^n\to \mathbf R$ and $b-a$ is a vector in $\mathbf R^n$. One cannot take the dot product of a linear operator with a vector in its domain. But since $\mathbf R^n$ has a standard inner product, $Df_i|_{a+t_i(b-a)}$ can be thought of as a vector. Once this identification is made, one can think of $Df_i|_{a+t_i(b-a)}(b-a)$ as the dot product of $Df_i|_{a+t_i(b-a)}$ and $(b-a)$.
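The identification can be made concrete: represent a linear map $T:\mathbf R^3\to\mathbf R$ as a $1\times 3$ matrix; applying $T$ to a vector then gives the same number as the dot product with the corresponding row vector. The numbers below are arbitrary:

```python
import numpy as np

# A linear map T : R^3 -> R represented as a 1 x 3 matrix (arbitrary entries).
row = np.array([[3.0, -1.0, 2.0]])
vec = row.ravel()                  # the same data viewed as a vector in R^3
v = np.array([1.0, 4.0, 0.5])

Tv  = (row @ v)[0]                 # T applied to v
dot = np.dot(vec, v)               # dot product of the identified vector with v
print(Tv, dot)                     # the same number: 3 - 4 + 1 = 0.0
```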
 
  • #13
caffeinemachine said:
Indeed, $b_j-a_j$ is multiplied with the $j$-th partial derivative in the above expression.
I suppose your question was " Is $Df_i|_{a+t_i(b-a)}(b-a)$ same as the dot product of $Df_i|_{a+t_i(b-a)}$ and $(b-a)$, or not? "

Well, strictly speaking, I'd say no. $Df_i|_{a+t_i(b-a)}$ is a linear map $\mathbf R^n\to \mathbf R$ and $b-a$ is a vector in $\mathbf R^n$. One cannot take the dot product of a linear operator with a vector in its domain. But since $\mathbf R^n$ has a standard inner product, $Df_i|_{a+t_i(b-a)}$ can be thought of as a vector. Once this identification is made, one can think of $Df_i|_{a+t_i(b-a)}(b-a)$ as the dot product of $Df_i|_{a+t_i(b-a)}$ and $(b-a)$.

Ah ok! Thanks a lot! (Smile)
 

FAQ: Proving Injectivity of a Vector-Valued Function Using the Mean Value Theorem

What does it mean for a function to be injective?

A function is injective (one-to-one) if distinct elements of the domain always map to distinct elements of the codomain. In other words, no two different inputs can produce the same output.

How do you show that a function is injective?

To show that a function is injective, you must prove that any two distinct elements of the domain have distinct outputs, or equivalently that $f(x_1)=f(x_2)$ implies $x_1=x_2$.
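As a minimal illustration (with made-up functions): $f(x)=2x+1$ is injective on the reals, while $g(x)=x^2$ is not, since $g(3)=g(-3)$:

```python
# f(x) = 2x + 1 is injective: f(x1) = f(x2) forces 2*x1 + 1 = 2*x2 + 1,
# hence x1 = x2.  g(x) = x^2 is not injective on the reals.
def f(x):
    return 2 * x + 1

def g(x):
    return x * x

print(f(3) == f(-3))   # False: distinct inputs give distinct outputs
print(g(3) == g(-3))   # True: g maps 3 and -3 to the same value
```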

What is the importance of proving that a function is injective?

Proving that a function is injective is important because it guarantees that every element of the image has exactly one preimage in the domain. This means the function has an inverse on its image, making it easier to solve equations and perform other mathematical operations.

Can a function be both injective and surjective?

Yes, a function can be both injective and surjective; such a function is called bijective. This means that every element of the codomain is mapped to by exactly one element of the domain.

Are there any visual representations of injective functions?

Yes, there are visual representations of injective functions. One way to represent an injective function is through a one-to-one mapping diagram, where each element in the domain is connected to a unique element in the range.
