- #1
fog37
- TL;DR Summary
- Understand how coefficients can change sign in multiple linear regression when correlated predictors are included
Hello Forum,
I have read about an interesting example of multiple linear regression (https://online.stat.psu.edu/stat501/lesson/12/12.3). There are two highly correlated predictors, ##X_1## (territory population) and ##X_2## (per capita income), with Sales as the ##Y## variable. My understanding is that if a model includes correlated predictors, the regression coefficient for one of the predictors can flip in sign compared to when the model has only that one predictor. Further, the lecture notes state that "...even predictors that are not included in the model, but are highly correlated with the predictors in our model, can have an impact!..."
One would expect that as the territory population ##X_1## increases, so would the territory sales ##Y## (positive regression coefficient). However, the regression analysis gives a negative estimated coefficient for territory population: as the population of the territory increases, the territory sales decrease. The explanation given is that the larger the territory, the larger the competitor's market penetration, which keeps sales down, even though the model does not have data on market competition. How does that happen? How can the regression results be affected by a variable (competitor market penetration) that is not included in the model?
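One standard way to write this down is the omitted-variable bias formula. As a sketch only, assume (my notation, not from the lecture notes) that the data really follow a model with population ##X_1## and an unobserved competitor penetration ##Z##, with an error ##\varepsilon## that is uncorrelated with both:

$$Y = \beta_0 + \beta_1 X_1 + \beta_2 Z + \varepsilon$$

If ##Z## is left out and ##Y## is regressed on ##X_1## alone, the fitted slope does not estimate ##\beta_1##. In large samples it tends to

$$\hat{\beta}_1^{\text{short}} \;\to\; \beta_1 + \beta_2\,\frac{\operatorname{Cov}(X_1, Z)}{\operatorname{Var}(X_1)}$$

So with ##\beta_1 > 0##, ##\beta_2 < 0## and ##\operatorname{Cov}(X_1, Z) > 0##, the second term can outweigh the first and the estimated slope comes out negative. When ##\operatorname{Cov}(X_1, Z) = 0## the extra term vanishes, which is the uncorrelated case.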
How can including a certain predictor in the linear regression model cause the regression coefficient of another predictor to flip in sign and change in magnitude when the predictors are correlated (something that is not possible when the predictors are perfectly uncorrelated)?
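To make the question concrete, here is a small simulation sketch in Python with NumPy. Everything in it is made up for illustration and is not the data from the lecture notes: `x1` plays the role of territory population, `z` is a hypothetical competitor-penetration variable correlated with `x1`, and the "true" coefficients (+2 and -5) are assumptions I chose so that the flip shows up. The helper `ols` is just a thin wrapper around `np.linalg.lstsq`.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Hypothetical data: z is strongly correlated with x1 by construction.
x1 = rng.normal(size=n)                       # stand-in for territory population
z = 0.8 * x1 + 0.3 * rng.normal(size=n)       # stand-in for competitor penetration
y = 2.0 * x1 - 5.0 * z + rng.normal(size=n)   # assumed true effects: +2 and -5


def ols(y, *cols):
    """Least-squares fit with an intercept; returns [intercept, slopes...]."""
    X = np.column_stack([np.ones(len(y)), *cols])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


b_short = ols(y, x1)      # model without z
b_long = ols(y, x1, z)    # model with z included
delta = ols(z, x1)[1]     # slope of z regressed on x1

print("slope of x1, z omitted: ", b_short[1])
print("slope of x1, z included:", b_long[1])
print("long slope + (z coef) * delta:", b_long[1] + b_long[2] * delta)
```

In this setup the slope of `x1` is negative when `z` is omitted and positive when `z` is included, and the last printed value reproduces the short-model slope: the coefficient from the smaller model equals the coefficient from the larger model plus the `z` coefficient times the slope of `z` on `x1`. If `delta` were zero (uncorrelated predictors), adding or dropping `z` would leave the slope of `x1` unchanged.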
I am confused about how correlated predictors affect each other's regression coefficients, making the individual effect of each predictor ambiguous...
Thank you!