What is the proof for the residual sum of squares problem?

In summary: multiplying by an orthogonal matrix $O$ preserves lengths, so minimizing $||y-Xb||^2$ is equivalent to minimizing $||y^{*}-X^{*}b||^2$ with $y^{*}=Oy$ and $X^{*}=OX$. When $O$ is chosen so that $X^{*}$ is upper triangular, only the first $n$ entries of $y^{*}$ can be matched by $X^{*}b$, and the residual sum of squares equals the sum of squares of the remaining entries $y_{n+1}^{*},\dots,y_m^{*}$.
  • #1
Jameson
Problem: Through transformation with an orthogonal matrix $O$, the problem \(\displaystyle \hat{b}=\underset{b}{\operatorname{arg min}}||y-Xb||^2\) is equivalent to \(\displaystyle \hat{b}=\underset{b}{\operatorname{arg min}}||y^{*}-X^{*}b||^2\), where $y$ and $y^{*}$ are in $\mathbb{R}^m$, $X$ and $X^{*}$ are in $\mathbb{R}^{m \times n}$ ($m \ge n$) and $y^{*}=Oy$ and $X^{*}=OX$. Let $y^{*}=[y_1^{*},y_2^{*},\dots,y_m^{*}]^T$.

Prove that the residual sum of squares satisfies \(\displaystyle ||y-X \hat{b}||^2=\sum_{i=n+1}^{m}||y_i^{*}||^2\).

Solution: I must admit I am a bit overwhelmed by this problem. I believe that $\underset{b}{\operatorname{arg min}}$ just means "for the lowest value of $b$", correct? I think I should start by reading up on the concepts which are needed to solve this. Can someone highlight the main ideas I'll need to know to attack this?
 
  • #2
Jameson said:
[Post #1 quoted in full.]

Hi Jameson!

First you wanted to prove part of the Spectral Theorem yourself, and now you want to prove part of the generalized method of least squares yourself?
What kind of course are you following?

Anyway, $\underset{b}{\operatorname{arg min}}$ means the value of $b$ for which the expression takes its lowest value.

Btw, there seems to be something missing from your problem statement.
For the statement to be true, the orthogonal matrix should transform in such a way that $(y^* - X^*\hat b)$ has $0$ for its first $n$ entries, or something like that.
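For instance, here is a sketch of how the statement would then follow. Suppose (as with a QR factorisation) $O$ is chosen so that $X^* = OX = \begin{pmatrix} R \\ 0 \end{pmatrix}$, where $R$ is an invertible $n \times n$ upper-triangular matrix, and write $y^*_{1:n}$ and $y^*_{n+1:m}$ for the first $n$ and the last $m-n$ entries of $y^*$ (my notation). Then for every $b$:
$$\|y^* - X^*b\|^2 = \|y^*_{1:n} - Rb\|^2 + \sum_{i=n+1}^{m} (y_i^*)^2 \;\ge\; \sum_{i=n+1}^{m} (y_i^*)^2,$$
with equality exactly when $Rb = y^*_{1:n}$, that is at $\hat b = R^{-1}y^*_{1:n}$ (computed by back substitution). Since $O$ preserves lengths, $\|y - X\hat b\|^2 = \|y^* - X^*\hat b\|^2 = \sum_{i=n+1}^{m} (y_i^*)^2$.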

Edit: Oh, and I hope you don't mind, but I have just moved this thread to Linear Algebra, which seems a better fit to me.
 
  • #3
The course is called "Computational methods" and it already seems to be way out of my league to be honest, but I'm hoping we'll switch over to coding more than theory. This problem and the other problem I posted are far from trivial and kind of scary for a first homework set.

Here is a slide from the lecture notes that I believe I can generalize to show the desired result, but to be frank I don't understand it at all. Looks like I have a lot of reading to do! :(

[Attached image: a slide from the lecture notes.]
 
  • #4
Well, as I see it, in any course I would suggest first making sure you know what all the symbols and definitions mean, and then trying to apply the theorems that are given.
It sounds as if, for your purposes, that is enough.
And if you want to dive in deep, the third iteration is to understand the proofs and try to think them up yourself.

The proofs that you ask about suggest that a bit of linear algebra is a prerequisite. It seems out of scope for a course on computational methods to ask you to think up these proofs yourself from scratch.
What is probably more useful for you is to see how you can apply the formulas that are given.
Do you know, for instance, what they mean by finding a solution by back substitution?
 
  • #5
I agree with you completely about the levels and methods of understanding. I took linear algebra at this university in the spring and we didn't cover a lot of this material, definitely not the Spectral Theorem, so I feel the professors aren't communicating very well at all.

The back substitution method is an iterative process for solving a linear system whose matrix is upper triangular: you solve the last equation first and then work upward, substituting the values already found.
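For concreteness, here is a minimal back-substitution sketch in Python/NumPy (my own illustration, not from the course notes), solving $Rx=c$ for an invertible upper-triangular $R$:

```python
import numpy as np

def back_substitution(R, c):
    """Solve R x = c, where R is square, invertible and upper triangular."""
    n = len(c)
    x = np.zeros(n)
    # Start from the last equation, which involves only x[n-1],
    # then work upward, substituting the components already found.
    for i in range(n - 1, -1, -1):
        x[i] = (c[i] - R[i, i + 1:] @ x[i + 1:]) / R[i, i]
    return x

R = np.array([[2.0, 1.0],
              [0.0, 3.0]])
c = np.array([5.0, 6.0])
print(back_substitution(R, c))   # [1.5 2. ], matches np.linalg.solve(R, c)
```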

For the proof to make sense, I believe $y^{*}=[y_1^{*},y_2^{*}...,y_m^{*}]^T$ should be upper-triangular. If so that would be a good place to start but I'm not sure it must be. I need to first grasp the purpose of $y^{*}=Oy$ and $X^{*}=OX$.

EDIT: Hmm, $y^*$ is in $\mathbb{R}^m$, so it's a column vector. Don't think it can be upper-triangular then.
 
  • #6
True, $y^*$ can't be upper-triangular. It is $X^*=OX$ that will be upper-triangular.

Let me try to illustrate with a choice for the variables that is as simple as possible.
Suppose we pick:
$$n=1, \quad m=2, \quad y=\begin{pmatrix}1\\3\end{pmatrix}, \quad X = \begin{pmatrix} 2 \\1\end{pmatrix}$$
The residual sum of squares is the squared distance from $y$ to the line of points $Xb$.
The quantity $\|y-Xb\|^2$ takes its lowest value at $\hat b = (1)$. (Yes, $b$ is 1-dimensional.)

We can draw them like this:

[Attachment: least_squares_residue.png — a sketch of $y$, the line of points $Xb$, and the residual.]

The orthogonal matrix $O$ would be a 2x2 matrix that rotates everything such that the second coordinate of $X^*=OX$ becomes zero. The residual is then aligned with the y-axis, meaning that the y-component of $y^*$ represents the residual.
The matrix $OX$ is now upper triangular (a rather simple one).
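For what it's worth, here is a small NumPy sketch of exactly this example (NumPy and the QR route are my own choices, not something fixed by the problem): it computes $\hat b$, the residual sum of squares, and one valid $O$ as the transpose of the full $Q$ from a QR factorisation of $X$.

```python
import numpy as np

# The toy example from this post: n = 1, m = 2.
y = np.array([1.0, 3.0])
X = np.array([[2.0],
              [1.0]])

# Least-squares solution (b is 1-dimensional here, and b_hat = 1).
b_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
rss = np.sum((y - X @ b_hat) ** 2)          # residual sum of squares

# One choice of orthogonal O: the transpose of the full Q from a QR
# factorisation of X, so that X* = O X has a zero below the first row.
Q, _ = np.linalg.qr(X, mode='complete')     # Q is a 2x2 orthogonal matrix
O = Q.T
y_star = O @ y                              # y* = O y
X_star = O @ X                              # X* = O X, here (±sqrt(5), 0)^T

print(b_hat)             # [1.]
print(rss)               # ≈ 5.0
print(y_star[1] ** 2)    # ≈ 5.0 -- the last entry of y* carries the residual
```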
 

  • #7
I think I'm starting to get this a little better. A major problem with my intuition stems from the fact that in linear algebra I'm used to writing $Ax=b$, where $A$ is an $m \times n$ matrix, $x$ is a column vector of weights and $b$ is the result. In this problem and discussion we have the form $y=Xb$, where $X$ is the matrix and $b$ is the column vector of weights.

From your diagram I see that multiplying by the orthogonal matrix $O$ will simply rotate everything, but not change any lengths. So we basically multiply everything by $O$ and then we have the equality: $||y-Xb||=||y^*-X^*b||$. Correct?
 
  • #8
Yes. That is correct.
An important property of an orthogonal transformation in general is that it keeps the lengths of vectors unchanged.
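In symbols, using $y^{*}=Oy$ and $X^{*}=OX$, the chain is just:
$$\|y^* - X^*b\| = \|Oy - OXb\| = \|O(y - Xb)\| = \|y - Xb\| \quad\text{for every } b.$$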
 
  • #9
One definition of an orthogonal matrix \(\displaystyle O\) (or even better, an orthogonal linear transformation) is that:

\(\displaystyle OO^T = O^TO = I\).

It then follows that if we use the definition (relative to some (real) inner product):

\(\displaystyle \|x\| = \sqrt{\langle x,x \rangle}\)

that:

\(\displaystyle \|Ox\| = \sqrt{\langle Ox,Ox \rangle}\)

\(\displaystyle = \sqrt{\langle x,O^T(Ox) \rangle}\)

(using essentially the same kind of proof as in another one of your threads)

\(\displaystyle = \sqrt{\langle x,(O^TO)x \rangle} = \sqrt{\langle x,x \rangle} = \|x\|\).

Geometrically, orthogonal transformations preserve lengths and volumes, but may reverse orientations (such as a reflection about a line). In fact, it is often convenient to pick a particular reflection, such as:

\(\displaystyle R = \begin{bmatrix}1&0&\dots&0\\0&1&\dots&0\\ \vdots&\vdots&\ddots&\vdots\\0&0&\dots&-1 \end{bmatrix}\)

and express an orthogonal matrix as either:

\(\displaystyle S\) or \(\displaystyle RS\), where \(\displaystyle \det(S) = 1\).

Matrices of the form \(\displaystyle S\) are called "proper rotations" and matrices of the form \(\displaystyle RS\) are called "improper rotations" (rotations followed by a distinguished reflection). It follows that composing two improper rotations yields a (proper) rotation.

(my apologies in advance for this small diversion).
 
  • #10
Thank you to all for the help on this problem. I'm going to talk it over with my professor, so for now I'll put this thread on hold in my mind, and hopefully I'll post back soon once I have and understand the full solution.
 

FAQ: What is the proof for the residual sum of squares problem?

What is residual sum of squares?

The residual sum of squares (RSS) is a statistical measure used to evaluate the accuracy of a regression model. It measures the difference between the actual values and the predicted values of the dependent variable in a regression model.

What does a high RSS indicate?

A high RSS indicates that the regression model does not fit the data well. In other words, there is a large difference between the actual values and the predicted values, and the model is not able to explain the variability in the data.

How is RSS calculated?

RSS is calculated by summing the squared differences between the actual values and the predicted values of the dependent variable in a regression model. This is done for all data points in the dataset, and the resulting value is a measure of the overall error in the model.
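As a concrete illustration, here is a minimal Python/NumPy sketch (the data values and variable names are made up for the example):

```python
import numpy as np

# Hypothetical actual and predicted values for five data points.
actual = np.array([3.1, 4.0, 5.2, 6.8, 8.1])
predicted = np.array([3.0, 4.2, 5.0, 7.0, 8.0])

# RSS: sum of squared differences between actual and predicted values.
rss = np.sum((actual - predicted) ** 2)
print(rss)  # ≈ 0.14
```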

What is the goal of minimizing RSS?

The goal of minimizing RSS is to find the regression model that best fits the data. By minimizing the RSS, we are finding a model that explains the variability in the data and has the smallest difference between the actual values and the predicted values.

How is RSS used to evaluate the performance of a regression model?

RSS is used as a measure of the goodness of fit for a regression model. A lower RSS value indicates that the model is better at predicting the dependent variable, while a higher RSS value indicates a poorer fit. It is often used in conjunction with other metrics, such as R-squared, to evaluate the performance of a regression model.
