How to prove zero correlation between residuals and predictors?

In summary: in a least squares linear regression model, the covariance between the residuals and each of the predictors is always zero, and the same argument shows that the residuals are also uncorrelated with the fitted values.
  • #1
NotEuler
Hi,
I'm trying to figure out something I'm pretty sure is true, but don't know how to prove it. I couldn't find the answer with a google search, but hopefully someone here knows the answer!

So I have a linear least squares multiple regression model:
Y=a+bX1+cX2+e

where a is the intercept, X1 and X2 predictor/independent variables, and e denotes the residuals.
The model (i.e. the values of a, b and c) is fitted so that Ʃe^2 is minimized.

How do I prove that cov(e,X1) = cov(e,X2) = 0?

Thanks!
NotEuler
 
  • #2
Maybe I should clarify my question...

1) Assume I have a dataset of dependent variables Yi, and independent variables X1i and X2i.

2) I fit a linear regression model to that dataset: Y=a+bX1+cX2+e.

3) The model is fitted, i.e. the parameters a, b and c are determined, so that the sum of squares of the errors Ʃei^2 = Ʃ(Yi-a-bX1i-cX2i)^2 is minimized.

4) I then calculate the covariance between the residuals e from that same fitted model and either set of independent variables (the X1s or the X2s) from the original dataset.

5) I think both cov(e,X1) and cov(e,X2) will always equal zero, regardless of what the original dataset was, and regardless of whether the real dependences are linear or something else.
I also think this should hold for any number of independent variables.

6) I think that to prove this, I need to write the covariance as cov(e,X1) = cov(Y-a-bX1-cX2, X1) = cov(Y,X1)-cov(a,X1)-cov(bX1,X1)-cov(cX2,X1).
And then somehow use the consequences of step 3 to show that if the sum of squared errors is minimized, then this covariance is always zero.

Does this make any sense? I'm no expert on regressions or covariances, so this might be hard to follow. It's also possible I'm wrong, and cov(e,X1) is not always zero.
Either way, any hints on how to proceed would be much appreciated!

Cheers,
NotEuler
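As a quick numerical sanity check of the claim in step 5, here is a sketch in Python with numpy, using made-up data in which the true dependence on X2 is deliberately nonlinear (the dataset, seed and coefficients are arbitrary, purely for illustration):

import numpy as np

rng = np.random.default_rng(0)
n = 200
X1 = rng.normal(size=n)
X2 = rng.normal(size=n)
# The true relationship is deliberately not linear in X2
Y = 1.0 + 2.0 * X1 - 0.5 * X2**2 + rng.normal(size=n)

# Least squares fit of Y = a + b*X1 + c*X2 + e via the design matrix [1, X1, X2]
X = np.column_stack([np.ones(n), X1, X2])
coef, *_ = np.linalg.lstsq(X, Y, rcond=None)
e = Y - X @ coef             # residuals

print(np.cov(e, X1)[0, 1])   # ~0, up to floating-point error
print(np.cov(e, X2)[0, 1])   # ~0, up to floating-point error

Changing the seed or the true (even strongly nonlinear) relationship does not change the outcome, which is exactly what step 5 claims.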
 
  • #3
NotEuler said:
3) The model is fitted, i.e. the parameters a, b and c are determined, so that the sum of squares of the errors Ʃei^2 = Ʃ(Yi-a-bX1i-cX2i)^2 is minimized.

And then somehow use the consequences of step 3 to show that if the sum of squared errors is minimized, then this covariance is always zero.

The partial derivatives of the function in step 3 with respect to a, b and c would be zero at an extremum. Perhaps that helps.
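To see the hint in action, here is a small symbolic sketch in Python with sympy on a made-up five-point dataset: setting the partial derivatives of the sum of squared errors to zero yields exactly the conditions Ʃei = 0 and ƩX1i·ei = 0 that the proof in the next post relies on.

import sympy as sp

a, b, c = sp.symbols('a b c')

# Small made-up dataset of (X1, X2, Y) triples (integers, so the algebra stays exact)
data = [(1, 1, 2), (2, 1, 4), (3, 2, 5), (4, 2, 8), (5, 3, 9)]

# s(a, b, c) = sum of squared errors for the model Y = a + b*X1 + c*X2 + e
s = sum((Y - a - b*X1 - c*X2)**2 for X1, X2, Y in data)

# Setting the partial derivatives to zero gives the "normal equations"
eqs = [sp.Eq(sp.diff(s, v), 0) for v in (a, b, c)]
sol = sp.solve(eqs, (a, b, c))

# With the fitted parameters, the residuals satisfy sum(ei) = 0 and sum(X1i*ei) = 0,
# which are exactly the two facts used in post #4
e = [Y - sol[a] - sol[b]*X1 - sol[c]*X2 for X1, X2, Y in data]
print(sp.simplify(sum(e)))                                          # 0
print(sp.simplify(sum(X1*ei for (X1, X2, Y), ei in zip(data, e))))  # 0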
 
  • #4
Yes, that helps a lot! Here's a sketch of the proof, happy to hear if you see any mistakes. I've changed the notation slightly to show that it applies to a regression model with any number of predictors. I will denote means with ~ (i.e. the mean of the ei's is E(e) = e~).

1) Assume I have a dataset of dependent variables Yi, and independent variables X1i, X2i, X3i,... Xki.

2) I fit a linear regression model to that dataset: Y=a + bX1 + Z + e, where Z is a linear combination of all the independent variables from X2 onwards: Z=cX2+dX3+...
Z therefore does not involve a or b.

3) The model is fitted, i.e. the parameters a, b, c, d, ... are determined, so that the sum of squares of the errors s(a,b,c,d,...) = Ʃei^2 = Ʃ(Yi-a-bX1i-Zi)^2 is minimized.

4) To do this, I calculate the partial derivatives of s with respect to a, b, c, d, ... and set them equal to 0.
I find that
∂s/∂a = -2 Ʃ(Yi-a-bX1i-Zi). Therefore Ʃ(Yi-a-bX1i-Zi) = Ʃei = 0, and E[e]=e~= 0
∂s/∂b = -2 Ʃ X1i (Yi-a-bX1i-Zi). Therefore Ʃ X1i (Yi-a-bX1i-Zi) = Ʃ X1i ei= 0

5) Ʃ (ei-e~)(X1i-X1~) = Ʃ (eiX1i - eiX1~ - e~X1i + e~X1~)
= ƩeiX1i - X1~Ʃei - e~ƩX1i + n·e~·X1~ = 0 - X1~·0 - 0·ƩX1i + 0 = 0

Since the covariance is just this sum divided by n, Cov(e,X1) = 0, which is what I wanted to prove.

Now I could replace X1 with any of the other Xs that are combined in Z, and repeat the above analysis. Because the sum of squared errors is symmetric in the predictor variables, I would then find that cov(e,Xk) = 0 for any k.

Therefore the residuals are always uncorrelated with the predictors in a least squares linear regression model.
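The same result can be illustrated numerically: for any number of predictors, the least squares residuals satisfy X^T e = 0 (the first component is Ʃei = 0, the others are ƩXki·ei = 0), so cov(e, Xk) vanishes for every k. A sketch in Python with numpy and arbitrary made-up data:

import numpy as np

rng = np.random.default_rng(1)
n, k = 150, 4
Xpred = rng.normal(size=(n, k))            # predictors X1 ... Xk
Y = rng.normal(size=n)                     # arbitrary dependent variable

X = np.column_stack([np.ones(n), Xpred])   # intercept column plus predictors
coef, *_ = np.linalg.lstsq(X, Y, rcond=None)
e = Y - X @ coef

# The normal equations: the first entry is sum(ei), the rest are sum(Xki * ei)
print(X.T @ e)                                             # all ~0
print([np.cov(e, Xpred[:, j])[0, 1] for j in range(k)])    # all ~0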
 
  • #5
Now that I think about it, this result immediately implies that the residuals are also uncorrelated with the values predicted by the model (i.e. not the original Yi from the dataset, but the predicted values y = a + bX1 + Z).

This is because (following the notation above) cov(e,y) = cov(e, a+bX1+Z) = cov(e,a) + cov(e,bX1) + cov(e,Z) = 0 + 0 + 0 = 0: the covariance with the constant a is zero, cov(e,bX1) = b·cov(e,X1) = 0, and cov(e,Z) expands into the covariances with X2, X3, ..., which are all zero.
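A self-contained numerical check of this, again only a sketch in Python with numpy and made-up data:

import numpy as np

rng = np.random.default_rng(2)
n = 100
X1 = rng.normal(size=n)
X2 = rng.normal(size=n)
Y = 0.5 + X1 + 2.0 * X2 + rng.normal(size=n)

X = np.column_stack([np.ones(n), X1, X2])
coef, *_ = np.linalg.lstsq(X, Y, rcond=None)
e = Y - X @ coef
yhat = X @ coef                 # the predicted values a + b*X1 + c*X2

print(np.cov(e, yhat)[0, 1])    # ~0: residuals uncorrelated with fitted values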
 

FAQ: How to prove zero correlation between residuals and predictors?

How do you define correlation between residuals and predictors?

The correlation between residuals and predictors refers to the degree to which the residuals (the differences between the actual and predicted values) and the predictors (the independent variables used to make the predictions) are related. A zero correlation would mean that there is no relationship between the residuals and predictors.

Why is it important to prove zero correlation between residuals and predictors?

Proving zero correlation between residuals and predictors is important because it ensures that the model used to make predictions is accurate and reliable. If there is a correlation between the two, it could indicate that the model is not capturing all the relevant information and may need to be revised or improved.

What statistical tests can be used to prove zero correlation between residuals and predictors?

There are several statistics that can be used to check for correlation between residuals and predictors, including Pearson's correlation coefficient, Spearman's rank correlation coefficient, and Kendall's rank correlation coefficient (tau). These measure the strength and direction of the relationship between two variables, and a coefficient close to zero indicates no correlation.
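For example, a sketch in Python (assuming numpy and scipy are available, with made-up data): since the correlation between the residuals and an included predictor is essentially zero by construction, such checks are most informative when applied to transformed or omitted variables.

import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n = 300
X1 = rng.normal(size=n)
X2 = rng.normal(size=n)
Y = 1.0 + X1 + X2**2 + rng.normal(scale=0.5, size=n)   # true effect of X2 is quadratic

X = np.column_stack([np.ones(n), X1, X2])
coef, *_ = np.linalg.lstsq(X, Y, rcond=None)
e = Y - X @ coef

print(stats.pearsonr(e, X2))       # essentially 0 by construction (the thread's result)
print(stats.spearmanr(e, X2**2))   # clearly nonzero: the linear model misses the quadratic term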

Can visualizations be used to prove zero correlation between residuals and predictors?

Yes, visualizations such as scatter plots and residual plots can be used to visually assess the correlation between residuals and predictors. If the points on the plot are randomly distributed, it is an indication of zero correlation between the two variables.
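A minimal residual-plot sketch in Python, assuming matplotlib is available and using made-up data:

import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(4)
n = 200
X1 = rng.normal(size=n)
X2 = rng.normal(size=n)
Y = 2.0 - X1 + 0.5 * X2 + rng.normal(size=n)

X = np.column_stack([np.ones(n), X1, X2])
coef, *_ = np.linalg.lstsq(X, Y, rcond=None)
e = Y - X @ coef

# Residuals plotted against each predictor; a patternless cloud around zero
# is what "zero correlation" looks like in practice
fig, axes = plt.subplots(1, 2, figsize=(8, 3))
for ax, x, label in zip(axes, (X1, X2), ("X1", "X2")):
    ax.scatter(x, e, s=10)
    ax.axhline(0.0, linestyle="--")
    ax.set_xlabel(label)
    ax.set_ylabel("residual")
plt.tight_layout()
plt.show()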

What are some potential causes of non-zero correlation between residuals and predictors?

There are several potential causes of non-zero correlation between residuals and predictors, including omitted variables, incorrect model specification, and outliers. It is important to thoroughly evaluate the model and data to identify and address any potential causes of non-zero correlation.
