Expected coefficient change from simple to multiple linear regression

In summary: structural collinearity refers to a situation where the correlation among the explanatory variables arises from a fundamental relationship between the variables themselves (for example between ##X## and ##X^2##), rather than from the variables being driven by some other common variable. It can be a problem because it makes the individual coefficient estimates unstable and hard to interpret.
  • #1
fog37
TL;DR Summary
Understand the expected coefficient change (magnitude and sign) when moving from simple to multiple linear regression
Hello forum,

I have created some linear regression models based on a simple dataset with 4 variables (columns). The first two models each involve a single predictor: $$Y=\beta_1 X_1+\beta_0$$ and $$Y=\beta_2 X_2+ \beta_0$$
The 3rd model is a multiple linear regression model involving all 3 predictors: $$Y= \beta_3 X_3 + \beta_2 X_2 + \beta_1 X_1 + \beta_0$$
I believe that the coefficients ##\beta_1## and ##\beta_2## for the predictors ##X_1## and ##X_2## change in magnitude when the two predictors are included together in the multiple regression model (model 3), correct? What about the sign of those coefficients? Should the sign stay the same, or can it change?

I would think that the sign should remain the same, indicating that ##Y## and ##X_1## (or ##X_2##) vary in the same direction in both the simple and the multiple linear regression models...

Now, if multicollinearity is present, the coefficient of each predictor could certainly change in magnitude, and possibly in sign, relative to its simple-regression counterpart, and not in an easily interpretable way, because of the inter-variable correlation...
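To illustrate the question with a quick sketch (Python with numpy; the data and coefficient values are invented for the example), here is a case where ##X_2## is strongly correlated with ##X_1##: the coefficient on ##X_1## is positive in the simple regression but turns negative once ##X_2## enters the model.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Two correlated predictors: x2 is largely driven by x1 (made-up data).
x1 = rng.normal(size=n)
x2 = 0.9 * x1 + 0.3 * rng.normal(size=n)

# "True" model: y depends strongly (positively) on x2 and weakly (negatively) on x1.
y = -0.2 * x1 + 1.0 * x2 + rng.normal(scale=0.5, size=n)

def ols(y, *cols):
    """Least-squares fit with an intercept; returns [b0, b1, ...]."""
    X = np.column_stack([np.ones_like(y), *cols])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

print("simple   y ~ x1     :", ols(y, x1))      # slope on x1 comes out positive (~0.7)
print("multiple y ~ x1 + x2:", ols(y, x1, x2))  # slope on x1 comes out negative (~-0.2)
```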

Thanks
 
  • #2
I agree with you but with a couple of caveats:
  1. In real-world models, multicollinearity (correlation between the explanatory variables ##X_1,X_2,X_3##) is usually present, which undermines the expectation stated in your second-last paragraph.
  2. Even without genuine multicollinearity, random idiosyncratic variation in the sample can create the appearance of multicollinearity, in which case we can still get sign changes in the coefficients. This will not usually happen, but it will sometimes; the larger the data set, the less often it will.
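A minimal simulation sketch of point 2 (Python/numpy; sample size, coefficients and number of repetitions are arbitrary choices for illustration): the predictors are independent in the population, yet in small samples the coefficient on ##X_1## occasionally comes out with a different sign in the multiple regression than in the simple one.

```python
import numpy as np

rng = np.random.default_rng(1)
n, trials = 30, 2000   # small samples, many repetitions (illustrative values)
sign_flips = 0

for _ in range(trials):
    # x1 and x2 are independent in the population: no genuine multicollinearity.
    x1 = rng.normal(size=n)
    x2 = rng.normal(size=n)
    y = 0.1 * x1 + 1.0 * x2 + rng.normal(size=n)

    # Simple-regression slope of y on x1 (np.polyfit returns [slope, intercept]).
    b_simple = np.polyfit(x1, y, 1)[0]

    # Multiple-regression coefficient on x1.
    X = np.column_stack([np.ones(n), x1, x2])
    b_multiple = np.linalg.lstsq(X, y, rcond=None)[0][1]

    sign_flips += int(np.sign(b_simple) != np.sign(b_multiple))

print(f"x1 coefficient changed sign in {sign_flips} of {trials} small samples")
```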
 
  • #3
andrewkirk said:
I agree with you but with a couple of caveats:
  1. In real-world models, multicollinearity (correlation between the explanatory variables ##X_1,X_2,X_3##) is usually present, which undermines the expectation stated in your second-last paragraph.
  2. Even without genuine multicollinearity, random idiosyncratic variation in the sample can create the appearance of multicollinearity, in which case we can still get sign changes in the coefficients. This will not usually happen, but it will sometimes; the larger the data set, the less often it will.
Thanks for the quick and interesting reply. I am indeed surprised to learn that, even without any multicollinearity, a change in coefficient sign is possible when the same variables of interest are present in both a simple and a multiple regression model...

Regarding multicollinearity, my understanding is that it affects the coefficients' values in strange ways. I recently learned that, in the case of a model with a linear term ##X## and a quadratic term ##X^2##, like $$Y=\beta_0+\beta_1 X+\beta_2 X^2,$$ it seems that multicollinearity would not necessarily be a problem even though ##X## and ##X^2## are dependent (just not linearly dependent). Isn't the fact that a change in one variable causes a change in the other variable the very definition of multicollinearity?
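As a concrete sketch of that point (my own illustration in Python/numpy, not from the thread): when ##X## takes only positive values, ##X## and ##X^2## are strongly linearly correlated over the sampled range, so a quadratic model does show structural collinearity; centering ##X## before squaring removes most of it.

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(1, 10, size=1000)   # positive-valued predictor (hypothetical range)

# The raw quadratic term is almost a linear function of x over this range.
print("corr(x, x^2)          :", np.corrcoef(x, x ** 2)[0, 1])      # close to 1

# Centering x before squaring breaks most of the linear association.
xc = x - x.mean()
print("corr(x - m, (x - m)^2):", np.corrcoef(xc, xc ** 2)[0, 1])    # near 0
```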
 
  • #4
fog37 said:
Isn't the fact that a change in one variable causes a change in the other variable the very definition of multicollinearity?
No, they just have to be correlated. Causation is not part of the definition (e.g. see here). A common situation is where the correlation arises from each of the explanatory variables being driven ("caused") by another variable that may not be part of the set of explanatory variables. For example, in a regression that had population crime levels and sickness levels as explanatory variables, we would likely find that those two are correlated because both are driven by a third variable, average population wealth, which may not be among the explanatory variables.
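A tiny simulation sketch of that scenario (Python/numpy; the variable names and numbers are invented): crime and sickness are each driven by wealth, so they end up correlated with each other even though neither drives the other.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 1000

# A common driver that is not among the regressors.
wealth = rng.normal(size=n)

# Each explanatory variable is partly "caused" by wealth, plus its own noise.
crime = -0.8 * wealth + rng.normal(scale=0.5, size=n)
sickness = -0.7 * wealth + rng.normal(scale=0.5, size=n)

# The two regressors are correlated purely through the omitted common cause.
print("corr(crime, sickness) =", np.corrcoef(crime, sickness)[0, 1])  # around 0.7
```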
 
  • #5
andrewkirk said:
No, they just have to be correlated. Causation is not part of the definition (e.g. see here). A common situation is where the correlation arises from each of the explanatory variables being driven ("caused") by another variable that may not be part of the set of explanatory variables. For example, in a regression that had population crime levels and sickness levels as explanatory variables, we would likely find that those two are correlated because both are driven by a third variable, average population wealth, which may not be among the explanatory variables.
Sure, sorry, I used "cause" inadvertently. But just the fact that ##X## and ##X^2## are deterministically related would make me think that structural collinearity would emerge between them...
 

FAQ: Expected coefficient change from simple to multiple linear regression

1. What is the expected coefficient change when moving from simple to multiple linear regression?

The expected coefficient change when moving from simple to multiple linear regression can vary significantly depending on the relationships between the predictor variables and the response variable. In simple linear regression, the coefficient represents the effect of a single predictor on the response. In multiple linear regression, the coefficients are adjusted to account for the presence of other predictors, which can lead to increases or decreases in the coefficients compared to their simple regression counterparts.
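For the two-predictor case the change can be written down explicitly (the standard omitted-variable relationship; it is not stated in the thread itself). If the multiple regression is $$Y=\beta_0+\beta_1 X_1+\beta_2 X_2+\varepsilon$$ and the auxiliary regression of ##X_2## on ##X_1## has slope ##\delta_1##, then the slope in the simple regression of ##Y## on ##X_1## is $$\tilde\beta_1=\beta_1+\beta_2\,\delta_1.$$ Adding ##X_2## therefore shifts the coefficient on ##X_1## by ##\beta_2\delta_1##, and its sign flips whenever that term is opposite in sign to ##\beta_1## and larger in magnitude.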

2. Why do coefficients change when adding more predictors?

Coefficients change when adding more predictors because multiple linear regression estimates the unique contribution of each predictor while controlling for the influence of the others. This means that the coefficients reflect the conditional relationship between each predictor and the response variable, which can lead to changes in magnitude and direction compared to the simple regression case where only one predictor is considered.
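What "controlling for the other predictors" means can be made concrete with the Frisch–Waugh–Lovell idea (a sketch in Python/numpy with made-up data): the multiple-regression coefficient on ##X_1## equals the slope obtained after residualising both ##Y## and ##X_1## on the remaining predictors.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 500
x1 = rng.normal(size=n)
x2 = 0.6 * x1 + rng.normal(size=n)
y = 2.0 * x1 - 1.5 * x2 + rng.normal(size=n)

def residuals(v, *cols):
    """Residuals of v after regressing it on an intercept and the given columns."""
    X = np.column_stack([np.ones(n), *cols])
    beta, *_ = np.linalg.lstsq(X, v, rcond=None)
    return v - X @ beta

# Coefficient on x1 from the full multiple regression.
X_full = np.column_stack([np.ones(n), x1, x2])
b1_full = np.linalg.lstsq(X_full, y, rcond=None)[0][1]

# Frisch-Waugh-Lovell: regress the residualised y on the residualised x1.
y_r, x1_r = residuals(y, x2), residuals(x1, x2)
b1_fwl = np.polyfit(x1_r, y_r, 1)[0]

print(b1_full, b1_fwl)   # the two estimates agree (up to floating-point rounding)
```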

3. Can the coefficient of a predictor become negative in multiple regression when it was positive in simple regression?

Yes, the coefficient of a predictor can become negative in multiple regression even if it was positive in simple regression. This can occur when other predictors included in the model have a strong correlation with the original predictor and the response variable. The multiple regression model adjusts for these correlations, which can result in a change in the sign of the coefficient.

4. How does multicollinearity affect coefficient estimates in multiple regression?

Multicollinearity occurs when two or more predictor variables are highly correlated with each other, which can lead to unstable coefficient estimates in multiple regression. In the presence of multicollinearity, the coefficients may have inflated standard errors, making them less reliable and more sensitive to changes in the model. This can result in large shifts in coefficient values and can complicate the interpretation of the model.
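One common way to quantify this is the variance inflation factor, ##VIF_j = 1/(1-R_j^2)##, where ##R_j^2## is the R-squared from regressing predictor ##j## on the other predictors. A minimal sketch with invented data (Python/numpy):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 500
x1 = rng.normal(size=n)
x2 = 0.95 * x1 + 0.1 * rng.normal(size=n)   # nearly collinear with x1
x3 = rng.normal(size=n)                     # unrelated to the others

def vif(target, *others):
    """Variance inflation factor: 1 / (1 - R^2) of target regressed on the others."""
    X = np.column_stack([np.ones(n), *others])
    fitted = X @ np.linalg.lstsq(X, target, rcond=None)[0]
    r2 = 1 - np.sum((target - fitted) ** 2) / np.sum((target - target.mean()) ** 2)
    return 1 / (1 - r2)

print("VIF of x1:", vif(x1, x2, x3))   # large: x1 is nearly a linear function of x2
print("VIF of x3:", vif(x3, x1, x2))   # close to 1: x3 is unrelated to x1 and x2
```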

5. How can one assess the impact of adding predictors on coefficient estimates?

One can assess the impact of adding predictors on coefficient estimates by comparing the coefficients and their significance levels before and after adding the new predictors. This can be done using metrics such as the adjusted R-squared, which accounts for the number of predictors in the model, and by examining changes in p-values and confidence intervals for the coefficients. Additionally, statistical tests like the F-test can be used to determine if the addition of predictors significantly improves the model.
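A sketch of the nested-model F-test mentioned above, computed by hand on invented data (Python with numpy and scipy), comparing ##Y \sim X_1## against ##Y \sim X_1 + X_2##:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1.0 * x1 + 0.5 * x2 + rng.normal(size=n)

def rss(y, X):
    """Residual sum of squares of an ordinary least-squares fit."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return np.sum((y - X @ beta) ** 2)

X_small = np.column_stack([np.ones(n), x1])       # restricted model: y ~ x1
X_big = np.column_stack([np.ones(n), x1, x2])     # full model:       y ~ x1 + x2

rss_small, rss_big = rss(y, X_small), rss(y, X_big)
q = X_big.shape[1] - X_small.shape[1]             # number of added predictors
df = n - X_big.shape[1]                           # residual degrees of freedom

F = ((rss_small - rss_big) / q) / (rss_big / df)
p = stats.f.sf(F, q, df)                          # upper-tail p-value of the F statistic
print(f"F = {F:.2f}, p = {p:.4g}")                # small p: x2 adds explanatory power
```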
