What is the proof for the residual sum of squares problem?

In summary: multiplying by an orthogonal matrix $O$ preserves lengths, so minimizing $||y-Xb||^2$ is equivalent to minimizing $||y^{*}-X^{*}b||^2$ with $y^{*}=Oy$ and $X^{*}=OX$. When $O$ is chosen so that $X^{*}$ is upper triangular, only the first $n$ entries of $y^{*}$ can be matched by $X^{*}b$, and the residual sum of squares equals the sum of squares of the remaining entries $y_{n+1}^{*},\dots,y_m^{*}$.
  • #1
Jameson
Problem: Through transformation with an orthogonal matrix $O$, the problem \(\displaystyle \hat{b}=\underset{b}{\operatorname{arg min}}||y-Xb||^2\) is equivalent to \(\displaystyle \hat{b}=\underset{b}{\operatorname{arg min}}||y^{*}-X^{*}b||^2\), where $y$ and $y^{*}$ are in $\mathbb{R}^m$, $X$ and $X^{*}$ are in $\mathbb{R}^{m \times n}$ ($m \ge n$) and $y^{*}=Oy$ and $X^{*}=OX$. Let $y^{*}=[y_1^{*},y_2^{*},\dots,y_m^{*}]^T$.

Prove that the residual sum of squares satisfies \(\displaystyle ||y-X \hat{b}||^2=\sum_{i=n+1}^{m}||y_i^{*}||^2\).

Solution: I must admit I am a bit overwhelmed by this problem. I believe that $\underset{b}{\operatorname{arg min}}$ just means "for the lowest value of $b$", correct? I think I should start by reading up on the concepts which are needed to solve this. Can someone highlight the main ideas I'll need to know to attack this?
 
  • #2
Jameson said:
[Post #1 quoted in full.]

Hi Jameson!

First you wanted to prove part of the Spectral Theorem yourself, and now you want to prove part of the generalized method of least squares yourself?
What kind of course are you following?

Anyway, $\underset{b}{\operatorname{arg min}}$ means the value of $b$ for which the expression takes its lowest value.

Btw, there seems to be something missing from your problem statement.
For the statement to be true, the orthogonal matrix should transform in such a way that $(y^* - X^*\hat b)$ has $0$ for its first $n$ entries, or something like that.
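For instance, here is a sketch of how the statement would then follow. Suppose (as with a QR factorisation) $O$ is chosen so that $X^* = OX = \begin{pmatrix} R \\ 0 \end{pmatrix}$, where $R$ is an invertible $n \times n$ upper-triangular matrix, and write $y^*_{1:n}$ and $y^*_{n+1:m}$ for the first $n$ and the last $m-n$ entries of $y^*$ (my notation). Then for every $b$:
$$\|y^* - X^*b\|^2 = \|y^*_{1:n} - Rb\|^2 + \sum_{i=n+1}^{m} (y_i^*)^2 \;\ge\; \sum_{i=n+1}^{m} (y_i^*)^2,$$
with equality exactly when $Rb = y^*_{1:n}$, that is at $\hat b = R^{-1}y^*_{1:n}$ (computed by back substitution). Since $O$ preserves lengths, $\|y - X\hat b\|^2 = \|y^* - X^*\hat b\|^2 = \sum_{i=n+1}^{m} (y_i^*)^2$.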

Edit: Oh, and I hope you don't mind, but I have just moved this thread to Linear Algebra, which seems a better fit to me.
 
  • #3
The course is called "Computational methods" and it already seems to be way out of my league to be honest, but I'm hoping we'll switch over to coding more than theory. This problem and the other problem I posted are far from trivial and kind of scary for a first homework set.

Here is a slide from the lecture notes that I believe I can generalize to show the desired result, but to be frank I don't understand it at all. Looks like I have a lot of reading to do! :(

[Attached image: a slide from the lecture notes.]
 
  • #4
Well, as I see it, in any course I would suggest first making sure you know what all the symbols and definitions mean, and then trying to apply the theorems that are given.
It sounds as if, for your purposes, that is enough.
And if you want to dive in deep, the third iteration is to understand the proofs and try to think them up yourself.

The proofs that you ask about suggest that a bit of linear algebra is a prerequisite. It seems out of scope for a course on computational methods to ask you to think up these proofs yourself from scratch.
What is probably more useful for you is to see how you can apply the formulas that are given.
Do you know, for instance, what they mean by finding a solution by back substitution?
 
  • #5
I agree with you completely about the levels and methods of understanding. I took linear algebra at this university in the spring and we didn't cover a lot of this material, definitely not the Spectral Theorem, so I feel the professors aren't communicating very well at all.

The back substitution method is an iterative process for solving a linear system whose matrix is upper triangular: you solve the last equation first and then work upward, substituting the values already found.
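For concreteness, here is a minimal back-substitution sketch in Python/NumPy (my own illustration, not from the course notes), solving $Rx=c$ for an invertible upper-triangular $R$:

```python
import numpy as np

def back_substitution(R, c):
    """Solve R x = c, where R is square, invertible and upper triangular."""
    n = len(c)
    x = np.zeros(n)
    # Start from the last equation, which involves only x[n-1],
    # then work upward, substituting the components already found.
    for i in range(n - 1, -1, -1):
        x[i] = (c[i] - R[i, i + 1:] @ x[i + 1:]) / R[i, i]
    return x

R = np.array([[2.0, 1.0],
              [0.0, 3.0]])
c = np.array([5.0, 6.0])
print(back_substitution(R, c))   # [1.5 2. ], matches np.linalg.solve(R, c)
```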

For the proof to make sense, I believe $y^{*}=[y_1^{*},y_2^{*}...,y_m^{*}]^T$ should be upper-triangular. If so that would be a good place to start but I'm not sure it must be. I need to first grasp the purpose of $y^{*}=Oy$ and $X^{*}=OX$.

EDIT: Hmm, $y^*$ is in $\mathbb{R}^m$, so it's a column vector. Don't think it can be upper-triangular then.
 
  • #6
True, $y^*$ can't be upper-triangular. It is $X^*=OX$ that will be upper-triangular.

Let me try to illustrate with a choice for the variables that is as simple as possible.
Suppose we pick:
$$n=1, \quad m=2, \quad y=\begin{pmatrix}1\\3\end{pmatrix}, \quad X = \begin{pmatrix} 2 \\1\end{pmatrix}$$
The residual sum of squares is the squared distance from $y$ to the line of points $Xb$.
The quantity $\|y-Xb\|^2$ takes its lowest value at $\hat b = (1)$. (Yes, $b$ is 1-dimensional.)

We can draw them like this:

[Attachment: least_squares_residue.png — a sketch of $y$, the line of points $Xb$, and the residual.]

The orthogonal matrix $O$ would be a 2x2 matrix that rotates everything such that the second coordinate of $X^*=OX$ becomes zero. The residual is then aligned with the y-axis, meaning that the y-component of $y^*$ represents the residual.
The matrix $OX$ is now upper triangular (a rather simple one).
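For what it's worth, here is a small NumPy sketch of exactly this example (NumPy and the QR route are my own choices, not something fixed by the problem): it computes $\hat b$, the residual sum of squares, and one valid $O$ as the transpose of the full $Q$ from a QR factorisation of $X$.

```python
import numpy as np

# The toy example from this post: n = 1, m = 2.
y = np.array([1.0, 3.0])
X = np.array([[2.0],
              [1.0]])

# Least-squares solution (b is 1-dimensional here, and b_hat = 1).
b_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
rss = np.sum((y - X @ b_hat) ** 2)          # residual sum of squares

# One choice of orthogonal O: the transpose of the full Q from a QR
# factorisation of X, so that X* = O X has a zero below the first row.
Q, _ = np.linalg.qr(X, mode='complete')     # Q is a 2x2 orthogonal matrix
O = Q.T
y_star = O @ y                              # y* = O y
X_star = O @ X                              # X* = O X, here (±sqrt(5), 0)^T

print(b_hat)             # [1.]
print(rss)               # ≈ 5.0
print(y_star[1] ** 2)    # ≈ 5.0 -- the last entry of y* carries the residual
```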
 

  • #7
I think I'm starting to get this a little better. A major problem with my intuition stems from the fact that in linear algebra I'm used to writing $Ax=b$, where $A$ is an $m \times n$ matrix, $x$ is a column vector of weights and $b$ is the result. In this problem and discussion we have the form $y=Xb$, where $X$ is the matrix and $b$ is the column vector of weights.

From your diagram I see that multiplying by the orthogonal matrix $O$ will simply rotate everything, but not change any lengths. So we basically multiply everything by $O$ and then we have the equality: $||y-Xb||=||y^*-X^*b||$. Correct?
 
  • #8
Yes. That is correct.
An important property of an orthogonal transformation in general is that it keeps the lengths of vectors unchanged.
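In symbols, using $y^{*}=Oy$ and $X^{*}=OX$, the chain is just:
$$\|y^* - X^*b\| = \|Oy - OXb\| = \|O(y - Xb)\| = \|y - Xb\| \quad\text{for every } b.$$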
 
  • #9
One definition of an orthogonal matrix \(\displaystyle O\) (or even better, an orthogonal linear transformation) is that:

\(\displaystyle OO^T = O^TO = I\).

It then follows that if we use the definition (relative to some (real) inner product):

\(\displaystyle \|x\| = \sqrt{\langle x,x \rangle}\)

that:

\(\displaystyle \|Ox\| = \sqrt{\langle Ox,Ox \rangle}\)

\(\displaystyle = \sqrt{\langle x,O^T(Ox) \rangle}\)

(using essentially the same kind of proof as in another one of your threads)

\(\displaystyle = \sqrt{\langle x,(O^TO)x \rangle} = \sqrt{\langle x,x \rangle} = \|x\|\).

Geometrically, orthogonal transformations preserve lengths and volumes, but may reverse orientations (such as a reflection about a line). In fact, it is often convenient to pick a particular reflection, such as:

\(\displaystyle R = \begin{bmatrix}1&0&\dots&0\\0&1&\dots&0\\ \vdots&\vdots&\ddots&\vdots\\0&0&\dots&-1 \end{bmatrix}\)

and express an orthogonal matrix as either:

\(\displaystyle S\) or \(\displaystyle RS\), where \(\displaystyle \det(S) = 1\).

Matrices of the form \(\displaystyle S\) are called "proper rotations" and matrices of the form \(\displaystyle RS\) are called "improper rotations" (rotations followed by a distinguished reflection). It follows that composing two improper rotations yields a (proper) rotation.

(my apologies in advance for this small diversion).
 
  • #10
Thank you to all for the help on this problem. I'm going to talk it over with my professor, so for now I'll put this thread on hold in my mind, and hopefully I'll post back soon once I have and understand the full solution.
 

FAQ: What is the proof for the residual sum of squares problem?

What is residual sum of squares?

The residual sum of squares (RSS) is a statistical measure used to evaluate the accuracy of a regression model. It measures the difference between the actual values and the predicted values of the dependent variable in a regression model.

What does a high RSS indicate?

A high RSS indicates that the regression model does not fit the data well. In other words, there is a large difference between the actual values and the predicted values, and the model is not able to explain the variability in the data.

How is RSS calculated?

RSS is calculated by summing the squared differences between the actual values and the predicted values of the dependent variable in a regression model. This is done for all data points in the dataset, and the resulting value is a measure of the overall error in the model.
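As a concrete illustration, here is a minimal Python/NumPy sketch (the data values and variable names are made up for the example):

```python
import numpy as np

# Hypothetical actual and predicted values for five data points.
actual = np.array([3.1, 4.0, 5.2, 6.8, 8.1])
predicted = np.array([3.0, 4.2, 5.0, 7.0, 8.0])

# RSS: sum of squared differences between actual and predicted values.
rss = np.sum((actual - predicted) ** 2)
print(rss)  # ≈ 0.14
```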

What is the goal of minimizing RSS?

The goal of minimizing RSS is to find the regression model that best fits the data. By minimizing the RSS, we are finding a model that explains the variability in the data and has the smallest difference between the actual values and the predicted values.

How is RSS used to evaluate the performance of a regression model?

RSS is used as a measure of the goodness of fit for a regression model. A lower RSS value indicates that the model is better at predicting the dependent variable, while a higher RSS value indicates a poorer fit. It is often used in conjunction with other metrics, such as R-squared, to evaluate the performance of a regression model.
