Collinearity between predictors: what happens under the hood

In summary: when predictors are correlated, a partial regression coefficient can no longer be read as the effect of changing one predictor while the others are held fixed, because in the data the correlated predictors move together; the thread works through what this means for the estimation and interpretation of the coefficients.
  • #1
fog37
TL;DR Summary
Understanding the idea of "keeping the other predictors fixed" to interpret partial coefficients
Hello,

In the presence of NO multicollinearity, with a linear regression model like ##Y=3 X_1+2 X_2##, the predictors ##X_1, X_2## are not pairwise correlated.
  • When ##X_1## changes by 1 unit, the dependent variable ##Y## changes by ##3## units, i.e. ##\Delta Y = 3##, while the other predictors are kept fixed/constant, i.e. they are not simultaneously changing with ##X_1## and contributing to the ##\Delta Y## of 3. By analogy, it is as if the predictors were "decoupled" gears.
  • However, when multicollinearity is present (##X_1## and ##X_2## are correlated), the change ##\Delta Y## associated with a 1-unit change in ##X_1## is no longer due solely to that unit change in ##X_1## with the other variables kept fixed/constant. The observed change is due to the explicit change in ##X_1## but also to the implicit change in ##X_2## (also by one unit?) induced by ##X_1##: changing ##X_1## automatically changes ##X_2##, which is not kept constant while ##X_1## changes.
I think my understanding is correct, but I don't fully understand how all this happens mechanically within the data. Does the idea of "while keeping the other variables fixed" really mean that the calculation of the coefficients ##\beta## involves the pairwise correlation ##r_{12}##, compromising the purity of the coefficient? I just don't see how, operationally, changing ##X_1## by one unit (i.e. setting ##X_1=1##) automatically, under the hood, activates a change in ##X_2## in the equation which silently contributes part of ##\Delta Y##.

It is like ##\Delta Y## = (##\Delta Y## due to ##X_1##) + (##\Delta Y## due to ##X_2##)
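
A tiny numerical sketch of that decomposition may help frame the question (the 0.5 dependence of ##X_2## on ##X_1## below is a made-up number, purely for illustration):

```python
# Hypothetical numbers: Y = 3*X1 + 2*X2, and suppose that in the data X2 tends
# to move by 0.5 units for every 1-unit move in X1 (made-up dependence).
beta1, beta2 = 3.0, 2.0
dX1 = 1.0
dX2_induced = 0.5 * dX1           # implicit change in X2 dragged along by X1

dY_from_X1 = beta1 * dX1          # the "pure" part, other variables fixed
dY_from_X2 = beta2 * dX2_induced  # the part that sneaks in through X2
print(dY_from_X1, dY_from_X2, dY_from_X1 + dY_from_X2)  # 3.0 1.0 4.0
```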

Thank you for any clarification.
 
  • #2
In the case of correlated independent variables, ##X_1## and ##X_2##, the coefficients of the linear regression are not necessarily unique. As an extreme example, consider the case where ##X_1= X_2##. The use of a second variable is completely redundant and linear regressions with both variables are possible with a whole set of coefficient combinations.

A step-by-step process alleviates the problem and gives statistical meaning to the coefficients. Suppose ##X_1= X_2## and the linear regression model ##Y = a_1 X_1 + \epsilon## gives the minimal sum of squared errors. Then there will be no correlation between the sample values ##x_{2,i}## and the residuals ##\epsilon_i##, because the term ##a_1 X_1## has already captured all the correlation that could be obtained by adding ##X_2## to the linear model.

In a less extreme example, where ##X_1## and ##X_2## are correlated but not equal, there may be some residual error from the ##Y = a_1 X_1 + \epsilon## model that can be reduced by adding an ##X_2## term to the regression. If the reduction is statistically significant, ##X_2## can be added. The term ##a_2 X_2## can then be thought of as accounting for/explaining/predicting the residual errors left over by the ##Y = a_1 X_1+ \epsilon## model.

This process is automated in the stepwise linear regression algorithm. The results should be examined for validity and not just applied blindly. The bidirectional elimination algorithm is the most sophisticated. Suppose that variable ##X_1## gives the best single-variable model but ##X_2## and ##X_3## are added in later steps because their reduction of the residual errors was statistically significant. It can happen that the model with only ##X_2## and ##X_3## explains so much of the ##Y## values that the ##X_1## term is no longer statistically significant. The bidirectional elimination algorithm would then go back and remove ##X_1## from the final regression result.
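
A minimal numerical sketch of this residual-fitting idea on simulated data (the 0.7 and 0.5 coefficients below are arbitrary choices for illustration, not from the thread):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Correlated predictors: X2 is partly driven by X1 (illustrative choice).
x1 = rng.normal(size=n)
x2 = 0.7 * x1 + 0.5 * rng.normal(size=n)
y = 3.0 * x1 + 2.0 * x2 + rng.normal(size=n)

# Step 1: fit Y on X1 alone (simple least squares via lstsq).
A1 = np.column_stack([np.ones(n), x1])
b1 = np.linalg.lstsq(A1, y, rcond=None)[0]
resid = y - A1 @ b1

# Step 2: the residuals are uncorrelated with X1 by construction,
# but still correlated with X2, so an X2 term can reduce them further.
print("corr(resid, x1):", np.corrcoef(resid, x1)[0, 1])  # ~0
print("corr(resid, x2):", np.corrcoef(resid, x2)[0, 1])  # noticeably nonzero

# Step 3: regress the residuals on X2 to see how much X2 adds.
A2 = np.column_stack([np.ones(n), x2])
b2 = np.linalg.lstsq(A2, resid, rcond=None)[0]
print("coefficient of x2 on the residuals:", b2[1])
```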
 
  • #3
Measuring collinearity is essentially the same thing as asking what the beta is when using ##X_1## to predict ##X_2## and getting a nonzero answer. In your example, suppose ##X_2=0.5X_1+\text{independent noise}##. Then how would you expect ##Y## to change if ##X_1## changes by 1 unit?

This is basically the same thing as the chain rule with multiple inputs, if that's something you're familiar with.
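
A short simulation along these lines (the coefficients 3 and 2 follow the thread's example; the noise level is an arbitrary choice): regressing ##Y## on ##X_1## alone recovers the "total" effect ##3 + 2\times 0.5 = 4##, while regressing on both predictors recovers the partial coefficients.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10_000

# X2 follows X1 with slope 0.5 plus independent noise, as in the example above.
x1 = rng.normal(size=n)
x2 = 0.5 * x1 + rng.normal(size=n)
y = 3.0 * x1 + 2.0 * x2 + 0.1 * rng.normal(size=n)

# Regressing Y on X1 alone gives the "total" effect of a 1-unit change in X1:
# the direct 3 plus the indirect 2 * 0.5 that comes through X2 (chain rule).
A = np.column_stack([np.ones(n), x1])
total = np.linalg.lstsq(A, y, rcond=None)[0][1]
print("total effect of X1:", total)  # close to 3 + 2*0.5 = 4

# Regressing Y on both predictors recovers the partial coefficients 3 and 2.
B = np.column_stack([np.ones(n), x1, x2])
partial = np.linalg.lstsq(B, y, rcond=None)[0][1:]
print("partial coefficients:", partial)  # close to [3, 2]
```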
 
  • #4
Office_Shredder said:
Measuring collinearity is essentially the same thing as asking what the beta is when using ##X_1## to predict ##X_2## and getting a nonzero answer. In your example, suppose ##X_2=0.5X_1+\text{independent noise}##. Then how would you expect ##Y## to change if ##X_1## changes by 1 unit?

This is basically the same thing as the chain rule with multiple inputs, if that's something you're familiar with.
The chain rule example clears things up well. For example, in the case of perfect collinearity, if ##X_2=-2X_1##, then

$$Y=3X_1 + 2X_2=3X_1-4X_1=-X_1$$

and ##\frac{\Delta Y}{\Delta X_1} = -1## instead of ##\frac{\Delta Y}{\Delta X_1} = 3##.
 
  • #5
Yep. There is one very important difference with the chain rule. When taking derivatives, things invert in the natural way: if ##\partial X_2 /\partial X_1=3##, then ##\partial X_1 /\partial X_2=1/3##. Betas don't work that way. If ##X_2=0.5 X_1+\text{noise}##, then the only thing you can say is ##X_1=\beta X_2+\text{noise}## where ##|\beta|\leq 2## (equality only if they are perfectly correlated).
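
A quick simulation of this asymmetry (the noise levels are arbitrary; np.polyfit is used only as a convenient way to get a simple-regression slope):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000

x1 = rng.normal(size=n)

for noise_sd in [0.0, 0.5, 2.0]:
    x2 = 0.5 * x1 + noise_sd * rng.normal(size=n)
    # Slope of X2 regressed on X1 is always about 0.5 ...
    b_21 = np.polyfit(x1, x2, 1)[0]
    # ... but the slope of X1 regressed on X2 is 2 only when there is no noise,
    # and shrinks toward 0 as the noise grows (it is not simply 1 / 0.5).
    b_12 = np.polyfit(x2, x1, 1)[0]
    print(f"noise_sd={noise_sd}: beta(X2~X1)={b_21:.2f}, beta(X1~X2)={b_12:.2f}")
```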
 
  • #6
Don't you consider interaction effects in your model, as in ##\beta_1 X_1 + \beta_2 X_2+ \beta_3 X_1*X_2##?

Which you would ultimately test by testing the hypothesis ##\beta_3=0##?
 
  • #7
WWGD said:
Don't you consider interaction effects in your model, as in ##\beta_1 X_1 + \beta_2 X_2+ \beta_3 X_1*X_2##?

Which you would ultimately test by testing the hypothesis ##\beta_3=0##?
You certainly can, but that can be done with the variable ##X_3 = X_1*X_2## in the usual way.
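
A minimal sketch of treating the interaction as just another column (the data and the 1.5 interaction coefficient are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 2_000

x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
# Simulated response with a true interaction term (coefficients chosen for illustration).
y = 3.0 * x1 + 2.0 * x2 + 1.5 * x1 * x2 + rng.normal(size=n)

# Treat the product as just another column X3 = X1 * X2 in the design matrix.
x3 = x1 * x2
X = np.column_stack([np.ones(n), x1, x2, x3])
beta = np.linalg.lstsq(X, y, rcond=None)[0]
print("estimated [intercept, b1, b2, b3]:", beta)  # b3 close to 1.5
```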
 
  • #8
Just a related question about fitting a multiple linear regression model to multivariate data: what steps can we take to figure out whether multiple linear regression is an adequate model for our data at all?
For simple linear regression, we can easily inspect the scatterplot between ##Y## and the single predictor ##X## to see if the cloud of data follows a linear trend. But in the case of multiple regressors ##X_1, X_2, X_3, X_4##, would we first plot individual scatterplots between ##Y## and ##X_1##, ##Y## and ##X_2##, ##Y## and ##X_3##, and ##Y## and ##X_4##?
And if the scatterplots all show a linear trend, do we then fit the data with a multiple linear regression equation (i.e. a hyperplane)? What if the data look linear in some scatterplots and not in others?

Thank you!
 
  • #9
The best thing is if you have knowledge of the subject matter and are comfortable with the form of your model. The regression algorithm in any good statistical package will indicate the statistical significance of each term in the model. You should not include terms in the model that do not both make sense in the subject matter and pass the test of statistical significance.
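
For instance, with a package such as statsmodels the fit summary reports a t statistic and p-value for every term (a sketch on simulated data; the variable setup is invented for illustration):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
n = 300

x1 = rng.normal(size=n)
x2 = 0.6 * x1 + 0.8 * rng.normal(size=n)   # correlated with x1
x3 = rng.normal(size=n)                    # pure noise, unrelated to y
y = 3.0 * x1 + 2.0 * x2 + rng.normal(size=n)

X = sm.add_constant(np.column_stack([x1, x2, x3]))
fit = sm.OLS(y, X).fit()

# The summary reports a t statistic and p-value for each term; the noise
# column x3 should come out as not statistically significant.
print(fit.summary())
print(fit.pvalues)
```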
 
  • #10
When you evaluate a regression model, you should keep one thing in mind. Suppose that two independent variables, ##X_1## and ##X_2##, both have positive correlations with ##Y##. It can easily happen that the best linear regression model ##Y = a_1 X_1 +a_2 X_2 +\epsilon## has ##a_1## a little high, with the excess corrected by a negative ##a_2##. That may be correct even though the sign of ##a_2## appears wrong. A close examination of the regression process will allow you to determine what happened.
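
A sketch of one way this sign pattern can arise (the numbers are invented): each predictor is positively correlated with ##Y## on its own, yet the multiple-regression coefficient on ##X_2## comes out negative.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 5_000

# Strongly correlated predictors (an illustrative construction).
x1 = rng.normal(size=n)
x2 = 0.9 * x1 + 0.3 * rng.normal(size=n)
# Simulated response: Y loads positively on X1 and slightly negatively on X2.
y = 2.0 * x1 - 0.5 * x2 + rng.normal(size=n)

# Each predictor is positively correlated with Y on its own ...
print("corr(x1, y):", np.corrcoef(x1, y)[0, 1])
print("corr(x2, y):", np.corrcoef(x2, y)[0, 1])

# ... yet the fitted multiple-regression coefficient on X2 is negative,
# because X2 is mostly "correcting" the part of Y already carried by X1.
X = np.column_stack([np.ones(n), x1, x2])
beta = np.linalg.lstsq(X, y, rcond=None)[0]
print("fitted [intercept, a1, a2]:", beta)
```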
 
  • #11
You also use the distribution of the coefficients to test the null hypothesis ##H_0: \beta_i=0## against ##H_A: \beta_i \neq 0##, and check the adjusted ##R^2## to see whether it increases or decreases as you add variables. There are also stepwise regression methods: forward selection and backward elimination.
https://en.wikipedia.org/wiki/Stepwise_regression
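
A small sketch of the adjusted ##R^2## check with statsmodels (the data are simulated and ##x_2## is a deliberately useless predictor):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(6)
n = 200

x1 = rng.normal(size=n)
x2 = rng.normal(size=n)          # irrelevant predictor
y = 3.0 * x1 + rng.normal(size=n)

m1 = sm.OLS(y, sm.add_constant(x1)).fit()
m2 = sm.OLS(y, sm.add_constant(np.column_stack([x1, x2]))).fit()

# Plain R^2 never decreases when a variable is added, but adjusted R^2
# penalizes the extra parameter and can drop for a useless one
# (it falls whenever the added term's |t| statistic is below 1).
print("R^2:          ", m1.rsquared, "->", m2.rsquared)
print("adjusted R^2: ", m1.rsquared_adj, "->", m2.rsquared_adj)
print("p-value for x2:", m2.pvalues[2])
```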
 
  • #12
FactChecker said:
In the case of correlated independent variables, ##X_1## and ##X_2##, the coefficients of the linear regression are not necessarily unique. As an extreme example, consider the case where ##X_1= X_2##. The use of a second variable is completely redundant and linear regressions with both variables are possible with a whole set of coefficient combinations.

A step-by-step process alleviates the problem and gives statistical meaning to the coefficients. Suppose ##X_1= X_2## and the linear regression model ##Y = a_1 X_1 + \epsilon## gives the minimal sum of squared errors. Then there will be no correlation between the sample values ##x_{2,i}## and the residuals ##\epsilon_i##, because the term ##a_1 X_1## has already captured all the correlation that could be obtained by adding ##X_2## to the linear model.

In a less extreme example, where ##X_1## and ##X_2## are correlated but not equal, there may be some residual error from the ##Y = a_1 X_1 + \epsilon## model that can be reduced by adding an ##X_2## term to the regression. If the reduction is statistically significant, ##X_2## can be added. The term ##a_2 X_2## can then be thought of as accounting for/explaining/predicting the residual errors left over by the ##Y = a_1 X_1+ \epsilon## model.

This process is automated in the stepwise linear regression algorithm. The results should be examined for validity and not just applied blindly. The bidirectional elimination algorithm is the most sophisticated. Suppose that variable ##X_1## gives the best single-variable model but ##X_2## and ##X_3## are added in later steps because their reduction of the residual errors was statistically significant. It can happen that the model with only ##X_2## and ##X_3## explains so much of the ##Y## values that the ##X_1## term is no longer statistically significant. The bidirectional elimination algorithm would then go back and remove ##X_1## from the final regression result.
It's important to note that stepwise regression methods are, in general, not good choices, and even though I teach courses that discuss them I strongly urge students not to use them in practice. A few reasons:
1. The R^2 values for models that come from them tend to be higher than they should be.
2. The F statistics often reported don't really have F distributions.
3. The standard errors of the parameter estimates are too small, so the confidence intervals around the parameter estimates are not accurate.
4. Because of the multiple tests in the process, the p-values are often too low and are difficult to correct.
5. The slope estimates are biased (this is probably not the strongest argument against them, since the notion of a slope estimate being unbiased simply means it is unbiased for the model you specify, and you have no idea whether that is the correct model).
6. These methods increase the problems caused when there is collinearity in the predictors.
For a good discussion of these issues, see Frank Harrell's Regression Modeling Strategies (2001).
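
As a rough illustration of point 1, here is a sketch with a hand-rolled forward-selection loop (not a library routine; the 0.15 entry threshold is an arbitrary choice): even when the response is pure noise, selection tends to produce a model whose R^2 and p-values look better than they should.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
n, p = 100, 20

# Pure noise: none of the 20 candidate predictors is truly related to y.
X = rng.normal(size=(n, p))
y = rng.normal(size=n)

# A crude forward-selection loop (illustrative only):
# repeatedly add the candidate with the smallest p-value while it is < 0.15.
selected = []
while True:
    best_p, best_j = 1.0, None
    for j in range(p):
        if j in selected:
            continue
        cols = sm.add_constant(X[:, selected + [j]])
        pval = sm.OLS(y, cols).fit().pvalues[-1]
        if pval < best_p:
            best_p, best_j = pval, j
    if best_j is None or best_p > 0.15:
        break
    selected.append(best_j)

final = sm.OLS(y, sm.add_constant(X[:, selected])).fit()
# Even though y is unrelated to every predictor, the selected model tends to
# report a nontrivial R^2 and small p-values for the chosen terms.
print("selected columns:", selected)
print("R^2 of selected model:", final.rsquared)
print("p-values:", final.pvalues[1:])
```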
 
  • #13
statdad said:
It's important to note that stepwise regression methods are, in general, not good choices, and even though I teach courses that discuss them I strongly urge students not to use them in practice. A few reasons:
1. The R^2 values for models that come from them tend to be higher than they should be.
2. The F statistics often reported don't really have F distributions.
3. The standard errors of the parameter estimates are too small, so the confidence intervals around the parameter estimates are not accurate.
4. Because of the multiple tests in the process, the p-values are often too low and are difficult to correct.
5. The slope estimates are biased (this is probably not the strongest argument against them, since the notion of a slope estimate being unbiased simply means it is unbiased for the model you specify, and you have no idea whether that is the correct model).
6. These methods increase the problems caused when there is collinearity in the predictors.
For a good discussion of these issues, see Frank Harrell's Regression Modeling Strategies (2001).
IMO, if the assumptions are met, the mathematics is correct and well established.
 
  • #14
FactChecker said:
IMO, if the assumptions are met, the mathematics is correct and well established.
I'm not sure what you mean here. The points I made (again, look at Harrell for a deeper discussion) are also mathematical points: they apply even if the assumptions are met.
Stepwise methods, by their nature, negate the benefits of the usual assumptions about least-squares regression.
 

FAQ: Collinearity between predictors: what happens under the hood

What is collinearity between predictors?

Collinearity between predictors, also known as multicollinearity, occurs when two or more predictor variables in a multiple regression model are highly correlated, meaning they contain similar information about the variance in the dependent variable. This can make it difficult to determine the individual effect of each predictor on the dependent variable.

Why is collinearity problematic in regression analysis?

Collinearity can be problematic because it makes the estimates of the regression coefficients unstable and their standard errors inflated. This means that the coefficients may not be statistically significant, even if they are theoretically important, and it can lead to incorrect conclusions about the relationships between variables.

How can you detect collinearity between predictors?

Collinearity can be detected using several methods, including examining correlation matrices, Variance Inflation Factor (VIF), and tolerance values. A high correlation coefficient (e.g., above 0.8 or 0.9) between two predictors indicates collinearity. A VIF value greater than 10 or a tolerance value less than 0.1 also suggests significant collinearity.
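
A sketch of both checks on simulated data (the near-collinear construction of x2 is invented for illustration), using the variance_inflation_factor helper from statsmodels:

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(8)
n = 500

x1 = rng.normal(size=n)
x2 = 0.95 * x1 + 0.2 * rng.normal(size=n)   # nearly collinear with x1
x3 = rng.normal(size=n)                      # independent predictor

# Correlation matrix of the predictors: the x1-x2 entry is close to 1.
X = np.column_stack([x1, x2, x3])
print(np.corrcoef(X, rowvar=False))

# VIF for each predictor (computed against the other predictors + intercept);
# values well above 10 flag problematic collinearity.
exog = sm.add_constant(X)
for i, name in enumerate(["x1", "x2", "x3"], start=1):
    print(name, "VIF:", variance_inflation_factor(exog, i))
```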

What strategies can be used to address collinearity?

Several strategies can be employed to address collinearity, including removing one of the highly correlated predictors, combining the correlated predictors into a single predictor, using principal component analysis (PCA) to reduce dimensionality, or applying regularization techniques such as ridge regression or LASSO that can handle collinearity by adding a penalty to the regression coefficients.
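
A minimal sketch comparing ordinary least squares with ridge and LASSO on nearly collinear simulated predictors, using scikit-learn (the penalty strengths are arbitrary and would normally be chosen by cross-validation):

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge, Lasso

rng = np.random.default_rng(9)
n = 200

x1 = rng.normal(size=n)
x2 = 0.95 * x1 + 0.1 * rng.normal(size=n)    # nearly collinear with x1
y = 3.0 * x1 + 2.0 * x2 + rng.normal(size=n)
X = np.column_stack([x1, x2])

# Ordinary least squares: coefficients can be erratic under near-collinearity.
print("OLS:  ", LinearRegression().fit(X, y).coef_)

# Ridge shrinks the coefficients toward each other; LASSO may zero one out.
# (alpha values are arbitrary illustrations, not tuned.)
print("Ridge:", Ridge(alpha=10.0).fit(X, y).coef_)
print("Lasso:", Lasso(alpha=0.5).fit(X, y).coef_)
```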

What happens to the regression coefficients when collinearity is present?

When collinearity is present, the regression coefficients can become highly sensitive to changes in the model. Small changes in the data can lead to large variations in the coefficients, making them unreliable. Additionally, the coefficients may have large standard errors, leading to wide confidence intervals and making it difficult to determine the true effect of each predictor.
