Can we simplify the relations of condition number for orthogonal matrices?

In summary, the condition number of a matrix is a measure of its sensitivity to errors or perturbations in the input data, calculated as the ratio of the largest to smallest singular value. A smaller condition number is indicative of a more accurate solution, while a larger condition number can lead to inaccurate solutions and numerical instability. The condition number can be improved by using more stable numerical methods, preconditioners, or higher precision arithmetic. There is no universal threshold for a good condition number, as it depends on the specific problem and desired level of accuracy.
  • #1
mathmari
Gold Member
MHB
5,049
7
Hey! :eek:

I want to prove the following relations of condition number:
  1. $\operatorname{cond}(\alpha A)=\operatorname{cond}(A)$. The matrixnorm is submultiplicativ.
  2. $\operatorname{cond}_2(U)=1$ if $U$ is an orthogonal matrix.
  3. $\operatorname{cond}_2(UA)=\operatorname{cond}_2(A)$, $U$ is orthogonal.
  4. $\operatorname{cond}_2(A)\leq \operatorname{cond}_F(A)\leq n\operatorname{cond}_{\infty}(A)$.
I have done the following :

  1. I have proven this equality, but I wanted to ask why it is given in this case that the matrix norm is submultiplicativ, we don't need this here, do we?
  2. Since $U$ is an orthogonal matrix, we have that $U^{-1}=U^T$.

    So, we get $\operatorname{cond}_2( U)=\|U\|_2\,\| U^{-1} \|_2=\| U\|_2\,\| U^T \|_2$. How could we continue?
  3. We have that:

    $\operatorname{cond}_2(UA)=\|UA\|\|(UA)^{-1}\|=\|UA\|\|A^{-1}U^{-1}\|=\|UA\|\|A^{-1}U^T\|$

    We have to use here that $\operatorname{cond}_2(U)=1$, right? But how exactly?
  4. Could you give me a hint how we could show these inequalities?

(Wondering)
 
Mathematics news on Phys.org
  • #2
mathmari said:
Hey! :eek:[*] $\operatorname{cond}(\alpha A)=\operatorname{cond}(A)$. The matrixnorm is submultiplicativ.

I have proven this equality, but I wanted to ask why it is given in this case that the matrix norm is submultiplicativ, we don't need this here, do we?

Hey mathmari! (Smile)

I agree that we do not need the norm to be submultiplicative.
Btw, we do need that $\alpha\ne 0$ don't we?

mathmari said:
[*] $\operatorname{cond}_2(U)=1$ if $U$ is an orthogonal matrix.

Since $U$ is an orthogonal matrix, we have that $U^{-1}=U^T$.

So, we get $\operatorname{cond}_2( U)=\|U\|_2\,\| U^{-1} \|_2=\| U\|_2\,\| U^T \|_2$. How could we continue?

This is specifically about the matrix 2-norm, which is defined as $\|A\|_2=\sup_{x\ne 0}\frac{\|Ax\|_2}{\|x\|_2}$ where $\|x\|_2$ is the standard Euclidean norm.
What does that mean for the norm if the matrix is orthogonal? (Wondering)

mathmari said:
[*] $\operatorname{cond}_2(UA)=\operatorname{cond}_2(A)$, $U$ is orthogonal.

We have that:

$\operatorname{cond}_2(UA)=\|UA\|\|(UA)^{-1}\|=\|UA\|\|A^{-1}U^{-1}\|=\|UA\|\|A^{-1}U^T\|$

We have to use here that $\operatorname{cond}_2(U)=1$, right? But how exactly?

Same thing. I believe we have to use the definition of the matrix 2-norm and a property of an orthogonal matrix. (Thinking)

mathmari said:
[*] $\operatorname{cond}_2(A)\leq \operatorname{cond}_F(A)\leq n\operatorname{cond}_{\infty}(A)$.

Could you give me a hint how we could show these inequalities?

How about starting with the definitions of those norms and their properties? (Wondering)
 
  • #3
I like Serena said:
I agree that we do not need the norm to be submulitplicative.
Btw, we do need that $\alpha\ne 0$ don't we?

Yes, this is given that $\alpha\in \mathbb{K}\setminus\{0\}$. (Yes)
I like Serena said:
This is specifically about the matrix 2-norm, which is defined as $\|A\|_2=\sup_{x\ne 0}\frac{\|Ax\|_2}{\|x\|_2}$ where $\|x\|_2$ is the standard Euclidean norm.
What does that mean for the norm if the matrix is orthogonal? (Wondering)

We have that $\|Ax\|_2=\sqrt{(Ax,Ax)}=(Ax)^T(Ax)=A^Tx^TAx=A^TAx^Tx=A^{-1}Ax^Tx=x^Tx=(x,x)=\|x\|_2$.

So, we get that $\|A\|_2=\sup_{x\ne 0}\frac{\|Ax\|_2}{\|x\|_2}=\sup_{x\ne 0}\frac{\|x\|_2}{\|x\|_2}=\sup_{x\ne 0}1=1$.

We also have that $\|A^Tx\|_2=\sqrt{(A^Tx,A^Tx)}=(A^T)^T(A^Tx)=Ax^TA^Tx=AA^Tx^Tx=AA^{-1}x^Tx=x^Tx=(x,x)=\|x\|_2$.

So, we get that $\|A^T\|_2=\sup_{x\ne 0}\frac{\|A^Tx\|_2}{\|x\|_2}=\sup_{x\ne 0}\frac{\|x\|_2}{\|x\|_2}=\sup_{x\ne 0}1=1$. So, we get $\operatorname{cond}_2( U)=\|U\|_2\,\| U^{-1} \|_2=\| U\|_2\,\| U^T \|_2=1$.
Is evertything correct? (Wondering)
I like Serena said:
Same thing. I believe we have to use the definition of the matrix 2-norm and a property of an orthogonal matrix. (Thinking)

We have that $\|UA\|_2=\sup_{x\ne 0}\frac{\|(UA)x\|_2}{\|x\|_2}=\sup_{x\ne 0}\frac{\|U(Ax)\|_2}{\|x\|_2}=\sup_{x\ne 0}\frac{\|Ax\|_2}{\|x\|_2}=\|A\|_2$ and $\|A^{-1}U^{T}\|_2=\sup_{x\ne 0}\frac{\|(A^{-1}U^T)x\|_2}{\|x\|_2}$

Does it hold that $A^{-1}U^T=U^TA^{-1}$ ? (Wondering)
I like Serena said:
How about starting with the definitions of those norms and their properties? (Wondering)

We have that $\operatorname{cond}_2(A)=\|A\|_2\,\|A^{-1}\|_2$, $\operatorname{cond}_F(A)=\|A\|_F\,\|A^{-1}\|_F$, $ \operatorname{cond}_{\infty}(A)=\|A\|_{\infty}\,\|A^{-1}\|_{\infty}$.

We have that $\|A\|_2=\sup_{x\ne 0}\frac{\|Ax\|_2}{\|x\|_2}$, $\|A\|_F=\sup_{x\ne 0}\frac{\|Ax\|_F}{\|x\|_F}$, $\|A\|_{\infty}=\sup_{x\ne 0}\frac{\|Ax\|_{\infty}}{\|x\|_{\infty}}$.

Since $\displaystyle{(Ax)_i=\sum_{j=1}^na_{i,j}x_j}$, we get $\displaystyle{\|Ax\|_2^2=\sum_{i=1}^n\left (\sum_{j=1}^n|a_{i,j}x_j|\right )^2=\sum_{i=1}^n\left (\sum_{j=1}^n|a_{i,j}||x_j|\right )^2}$
By the Cauchy-Schwarz inequality we get $\displaystyle{\left (\sum_{j=1}^n|a_{i,j}||x_j|\right )^2\leq \left (\sum_{j=1}^n{|a_{i,j}|^2}\right )\left (\sum_{j=1}^n|x_j|^2\right )=\left (\sum_{j=1}^n{|a_{i,j}|^2}\right )\|x\|_2^2}$
So, we get \begin{equation*}\|Ax\|_2^2=\sum_{i=1}^n\left (\sum_{j=1}^n|a_{i,j}||x_j|\right )^2\leq \sum_{i=1}^n\sum_{j=1}^n|a_{i,j}|^2\|x\|_2^2=\|A\|_F^2\,\|x\|_2^2 \end{equation*}

So, we have that $\|Ax\|_2\leq \|A\|_F\,\|x\|_2\Rightarrow \frac{\|Ax\|_2}{\|x\|_2}\leq \|A\|_F\Rightarrow \sup_{x\ne 0}\frac{\|Ax\|_2}{\|x\|_2}\leq \|A\|_F$, or not? (Wondering)

Since $\|A\|_2=\sup_{x\ne 0}\frac{\|Ax\|_2}{\|x\|_2}$ it folows that $\|A\|_2\leq \|A\|_F$.

Equivalently, we get that $\|A^{-1}\|_2\leq \|A^{-1}\|_F$

So we get $\operatorname{cond}_2(A)=\|A\|_2\,\|A^{-1}\|_2\leq \|A\|_F\,\|A^{-1}\|_F=\operatorname{cond}_F(A)$. Is everything correct? (Wondering)

How do we get at the inequality $\operatorname{cond}_F(A)\leq n\operatorname{cond}_{\infty}(A)$ the term $n$ ? (Wondering)
 
  • #4
mathmari said:
We have that $\|Ax\|_2=\sqrt{(Ax,Ax)}=(Ax)^T(Ax)=A^Tx^TAx=A^TAx^Tx=A^{-1}Ax^Tx=x^Tx=(x,x)=\|x\|_2$.

I'm afraid that $(Ax)^T$ is not equal to $A^Tx^T$, nor can we swap $x^TA$ around. (Worried)

Instead we have $(Ax)^T=x^TA^T$ after which we don't have to swap any more. (Happy)

mathmari said:
So, we get that $\|A\|_2=\sup_{x\ne 0}\frac{\|Ax\|_2}{\|x\|_2}=\sup_{x\ne 0}\frac{\|x\|_2}{\|x\|_2}=\sup_{x\ne 0}1=1$.

We also have that $\|A^Tx\|_2=\sqrt{(A^Tx,A^Tx)}=(A^T)^T(A^Tx)=Ax^TA^Tx=AA^Tx^Tx=AA^{-1}x^Tx=x^Tx=(x,x)=\|x\|_2$.

So, we get that $\|A^T\|_2=\sup_{x\ne 0}\frac{\|A^Tx\|_2}{\|x\|_2}=\sup_{x\ne 0}\frac{\|x\|_2}{\|x\|_2}=\sup_{x\ne 0}1=1$. So, we get $\operatorname{cond}_2( U)=\|U\|_2\,\| U^{-1} \|_2=\| U\|_2\,\| U^T \|_2=1$.

Is evertything correct? (Wondering)

Well... there's a typo in the word 'evertything'... (Worried)
And there's another transpose-swap problem for $\|A^Tx\|_2$.
Other than that it's correct!

mathmari said:
We have that $\|UA\|_2=\sup_{x\ne 0}\frac{\|(UA)x\|_2}{\|x\|_2}=\sup_{x\ne 0}\frac{\|U(Ax)\|_2}{\|x\|_2}=\sup_{x\ne 0}\frac{\|Ax\|_2}{\|x\|_2}=\|A\|_2$ and $\|A^{-1}U^{T}\|_2=\sup_{x\ne 0}\frac{\|(A^{-1}U^T)x\|_2}{\|x\|_2}$

Does it hold that $A^{-1}U^T=U^TA^{-1}$ ?

Nope.
Can't we write instead that $\|A^{-1}U^{-1}\|_2=\sup_{x\ne 0}\frac{\|(A^{-1}U^{-1})x\|_2}{\|x\|_2}
=\sup_{x\ne 0}\frac{\|A^{-1}(U^{-1}x)\|_2}{\|U^{-1}x\|_2}
=\sup_{y\ne 0}\frac{\|A^{-1}y\|_2}{\|y\|_2}
$? (Wondering)

mathmari said:
We have that $\operatorname{cond}_2(A)=\|A\|_2\,\|A^{-1}\|_2$, $\operatorname{cond}_F(A)=\|A\|_F\,\|A^{-1}\|_F$, $ \operatorname{cond}_{\infty}(A)=\|A\|_{\infty}\,\|A^{-1}\|_{\infty}$.

We have that $\|A\|_2=\sup_{x\ne 0}\frac{\|Ax\|_2}{\|x\|_2}$, $\|A\|_F=\sup_{x\ne 0}\frac{\|Ax\|_F}{\|x\|_F}$, $\|A\|_{\infty}=\sup_{x\ne 0}\frac{\|Ax\|_{\infty}}{\|x\|_{\infty}}$.

Since $\displaystyle{(Ax)_i=\sum_{j=1}^na_{i,j}x_j}$, we get $\displaystyle{\|Ax\|_2^2=\sum_{i=1}^n\left (\sum_{j=1}^n|a_{i,j}x_j|\right )^2=\sum_{i=1}^n\left (\sum_{j=1}^n|a_{i,j}||x_j|\right )^2}$
By the Cauchy-Schwarz inequality we get $\displaystyle{\left (\sum_{j=1}^n|a_{i,j}||x_j|\right )^2\leq \left (\sum_{j=1}^n{|a_{i,j}|^2}\right )\left (\sum_{j=1}^n|x_j|^2\right )=\left (\sum_{j=1}^n{|a_{i,j}|^2}\right )\|x\|_2^2}$
So, we get \begin{equation*}\|Ax\|_2^2=\sum_{i=1}^n\left (\sum_{j=1}^n|a_{i,j}||x_j|\right )^2\leq \sum_{i=1}^n\sum_{j=1}^n|a_{i,j}|^2\|x\|_2^2=\|A\|_F^2\,\|x\|_2^2 \end{equation*}

So, we have that $\|Ax\|_2\leq \|A\|_F\,\|x\|_2\Rightarrow \frac{\|Ax\|_2}{\|x\|_2}\leq \|A\|_F\Rightarrow \sup_{x\ne 0}\frac{\|Ax\|_2}{\|x\|_2}\leq \|A\|_F$, or not? (Wondering)

Since $\|A\|_2=\sup_{x\ne 0}\frac{\|Ax\|_2}{\|x\|_2}$ it folows that $\|A\|_2\leq \|A\|_F$.

Equivalently, we get that $\|A^{-1}\|_2\leq \|A^{-1}\|_F$

So we get $\operatorname{cond}_2(A)=\|A\|_2\,\|A^{-1}\|_2\leq \|A\|_F\,\|A^{-1}\|_F=\operatorname{cond}_F(A)$. Is everything correct?

Yep. Looks good.

mathmari said:
How do we get at the inequality $\operatorname{cond}_F(A)\leq n\operatorname{cond}_{\infty}(A)$ the term $n$ ?

I'm not sure yet how to get the inequality, but that $n$ should be due to the fact that the Frobenius norm is the squared sum of all elements in all rows, while the maximum norm is represented by only 1 row. (Thinking)
 
  • #5
I like Serena said:
I'm afraid that $(Ax)^T$ is not equal to $A^Tx^T$, nor can we swap $x^TA$ around. (Worried)

Instead we have $(Ax)^T=x^TA^T$ after which we don't have to swap any more. (Happy)

So, we have the following:
\begin{equation*}\|Ax\|_2=\sqrt{(Ax,Ax)}=(Ax)^T(Ax)=x^TA^TAx=x^TA^{-1}Ax=x^Tx=(x,x)=\|x\|_2\end{equation*}

(Nerd)
I like Serena said:
And there's another transpose-swap problem for $\|A^Tx\|_2$.
Other than that it's correct!

We have that \begin{equation*}\|A^Tx\|_2=\sqrt{(A^Tx,A^Tx)}=(A^Tx)^T(A^Tx)=x^TAA^Tx=x^TAA^{-1}x=x^Tx=(x,x)=\|x\|_2\end{equation*}

(Nerd)
I like Serena said:
Nope.
Can't we write instead that $\|A^{-1}U^{-1}\|_2=\sup_{x\ne 0}\frac{\|(A^{-1}U^{-1})x\|_2}{\|x\|_2}
=\sup_{x\ne 0}\frac{\|A^{-1}(U^{-1}x)\|_2}{\|U^{-1}x\|_2}
=\sup_{y\ne 0}\frac{\|A^{-1}y\|_2}{\|y\|_2}
$? (Wondering)

How do we get the equality $\sup_{x\ne 0}\frac{\|(A^{-1}U^{-1})x\|_2}{\|x\|_2}
=\sup_{x\ne 0}\frac{\|A^{-1}(U^{-1}x)\|_2}{\|U^{-1}x\|_2}$ ? (Wondering)
I like Serena said:
I'm not sure yet how to get the inequality, but that $n$ should be due to the fact that the Frobenius norm is the squared sum of all elements in all rows, while the maximum norm is represented by only 1 row. (Thinking)

Does it maybe hold the following?
\begin{equation*}\|A\|_F=\sqrt{\sum_{i=1}^n\left (\sum_{j=1}^n|a_{i,j}|^2\right )}\leq \sqrt{n\cdot \max_{i=1}^n\left (\sum_{j=1}^n|a_{i,j}|^2\right )}=\sqrt{n\cdot \|A\|_{\infty}}\end{equation*}

(Wondering)
 
  • #6
mathmari said:
So, we have the following:
\begin{equation*}\|Ax\|_2=\sqrt{(Ax,Ax)}=(Ax)^T(Ax)=x^TA^TAx=x^TA^{-1}Ax=x^Tx=(x,x)=\|x\|_2\end{equation*}

(Nerd)

We have that \begin{equation*}\|A^Tx\|_2=\sqrt{(A^Tx,A^Tx)}=(A^Tx)^T(A^Tx)=x^TAA^Tx=x^TAA^{-1}x=x^Tx=(x,x)=\|x\|_2\end{equation*}

(Nerd)

Yep. (Nod)

mathmari said:
How do we get the equality $\sup_{x\ne 0}\frac{\|(A^{-1}U^{-1})x\|_2}{\|x\|_2}
=\sup_{x\ne 0}\frac{\|A^{-1}(U^{-1}x)\|_2}{\|U^{-1}x\|_2}$ ?

Since $U$ is an orthogonal matrix, $U^{-1}$ is also orthogonal isn't it?
And didn't we just find that $\|Ux\|=\|x\|$? (Wondering)

mathmari said:
Does it maybe hold the following?
\begin{equation*}\|A\|_F=\sqrt{\sum_{i=1}^n\left (\sum_{j=1}^n|a_{i,j}|^2\right )}\leq \sqrt{n\cdot \max_{i=1}^n\left (\sum_{j=1}^n|a_{i,j}|^2\right )}=\sqrt{n\cdot \|A\|_{\infty}}\end{equation*}

Let $B=(b_{ij})=A^{-1}$.
Then shouldn't that be:
$$\DeclareMathOperator\cond{cond}
\cond_F A \overset ?\le n\cond_\infty A
\iff \|A\|_F\|A^{-1}\|_F \overset ?\le n\|A\|_\infty \|A^{-1}\|_\infty
\iff \sqrt{\sum_{i,j} |a_{ij}|^2}\sqrt{\sum_{i,j} |b_{ij}|^2} \overset ?\le n \max_i\sum_j|a_{ij}| \max_i\sum_j|b_{ij}|
$$
(Wondering)
 
Last edited:
  • #7
I like Serena said:
Since $U$ is an orthogonal matrix, $U^{-1}$ is also orthogonal isn't it?
And didn't we just find that $\|Ux\|=\|x\|$? (Wondering)

Ahh I see! (Yes)
I like Serena said:
Let $B=(b_{ij})=A^{-1}$.
Then shouldn't that be:
$$\DeclareMathOperator\cond{cond}
\cond_F A \overset ?\ge n\cond_\infty A
\iff \|A\|_F\|A^{-1}\|_F \overset ?\ge n\|A\|_\infty \|A^{-1}\|_\infty
\iff \sqrt{\sum_{i,j} |a_{ij}|^2}\sqrt{\sum_{i,j} |b_{ij}|^2} \overset ?\ge n \max_i\sum_j|a_{ij}| \max_i\sum_j|b_{ij}|
$$
(Wondering)

Do we not have to show the inequality with $\leq$ instead of $\geq$ ? I got stuck right now. (Wondering)
 
  • #8
mathmari said:
Do we not have to show the inequality with $\leq$ instead of $\geq$ ? I got stuck right now. (Wondering)

Yes, you are correct.
I've fixed the inequalities in my previous post.

So if we could prove that $\sqrt{\sum_{ij} |a_{ij}|^2} \le \sqrt n \max_i \sum_j |a_{ij}|
\iff \sum_{ij} |a_{ij}|^2 \le n \left(\!\max_i \sum_j |a_{ij}|\!\right)^{\!2}
$ for any matrix A, we would be done, wouldn't we? (Thinking)
 
  • #9
I like Serena said:
So if we could prove that $\sqrt{\sum_{ij} |a_{ij}|^2} \le \sqrt n \max_i \sum_j |a_{ij}|
\iff \sum_{ij} |a_{ij}|^2 \le n \left(\!\max_i \sum_j |a_{ij}|\!\right)^{\!2}
$ for any matrix A, we would be done, wouldn't we? (Thinking)

Do we have the following?
\begin{equation*}\sum_{i=1}^n\sum_{j=1}^n |a_{ij}|^2\overset{ (\star) }{ \leq } \sum_{i=1}^n\left (\sum_{j=1}^n |a_{ij}|\right )^2\overset{ (\star\star) }{ \leq } n\cdot \max_{i=1}^n\left (\sum_{j=1}^n |a_{ij}|\right )^2\end{equation*}

$(\star)$: It holds that $(a+b)^2=a^2+2ab+b^2\geq a^2+b^2$. By induction, we can show that $\displaystyle{\sum_{j=1}^n |a_{ij}|^2\leq \left (\sum_{j=1}^n |a_{ij}|\right )^2}$.

$(\star\star)$: The sum of $n$ terms is less than $n$ times the largest term.
 
  • #10
Yep. Looks good (for non-negative a and b)! (Nod)
 
  • #11
So, we have that
\begin{equation*}\|A\|_F^2=\sum_{i=1}^n\sum_{j=1}^n |a_{ij}|^2\leq n\cdot \max_{i=1}^n\left (\sum_{j=1}^n |a_{ij}|\right )^2\end{equation*}

We also have that $\displaystyle{\|A\|_{\infty}=\max_{i=1}^n \sum_{j=1}^n|a_{i,j}| }$.

Then $\displaystyle{\|A\|_{\infty}^2=\left (\max_{i=1}^n \sum_{j=1}^n|a_{i,j}| \right )^2}$. Does it hold that $\displaystyle{\left (\max_{i=1}^n \sum_{j=1}^n|a_{i,j}| \right )^2=\max_{i=1}^n\left (\sum_{j=1}^n |a_{ij}|\right )^2}$ ?
 
  • #12
Is $\max(a,b)^2 \overset ?= \max(a^2,b^2)$? (Wondering)
 
  • #13
I like Serena said:
Is $\max(a,b)^2 \overset ?= \max(a^2,b^2)$? (Wondering)

Let $a<b$ then $a^2<b^2$, i.e. $\max (a,b)=b$ and $\max (a^2, b^2)=b^2$.
That means that $\max(a,b)^2=b^2=\max(a^2,b^2)$. Thank you so much! (Happy)
 

FAQ: Can we simplify the relations of condition number for orthogonal matrices?

What is the condition number of a matrix?

The condition number of a matrix is a measure of its sensitivity to errors or perturbations in the input data. It is calculated as the ratio of the largest singular value to the smallest singular value of the matrix. A larger condition number indicates a more ill-conditioned matrix, meaning that small changes in the input data can result in large changes in the output.

How is the condition number related to the accuracy of a solution?

The condition number is inversely related to the accuracy of a solution. This means that a smaller condition number indicates a more accurate solution, as small changes in the input data will have a smaller impact on the output.

What is the significance of a high condition number?

A high condition number indicates that a matrix is ill-conditioned, meaning that small changes in the input data can result in large changes in the output. This can lead to inaccurate solutions and numerical instability in calculations.

How can the condition number be improved?

There are a few ways to improve the condition number of a matrix. One approach is to use a more stable numerical method, such as Gaussian elimination with partial pivoting. Another technique is to use a preconditioner, which transforms the matrix into one with a lower condition number. Additionally, using higher precision arithmetic can also improve the condition number.

Is there a universal threshold for a good condition number?

There is no universal threshold for a good condition number, as it depends on the specific problem and the desired level of accuracy. In general, a condition number below 10 is considered well-conditioned, while a condition number above 100 is considered highly ill-conditioned. However, these values are not set in stone and can vary depending on the problem at hand.

Similar threads

Replies
9
Views
3K
Replies
4
Views
1K
Replies
16
Views
4K
Replies
11
Views
2K
Replies
1
Views
2K
Back
Top