Bivariate correlation does not always catch multicollinearity

  • #1
fog37
Hello,

While studying multicollinearity, I learned that if there are more than 2 predictors ##X##, for example 3 predictors ##X_1, X_2, X_3##, it is possible for all of the pairwise correlations to be low in value but for multicollinearity to still be an issue... Would that mean that some "triple" correlation, i.e. the average of the products ##X_1 X_2 X_3##, has a high value (higher than 0.7)? Is that correct?

Would you have a simple example of how three variables may be correlated collectively even if their pairwise correlations are low?

Thank you!
 
  • #2
In a visual sense, using Venn diagrams, how can the predictors be correlated all together if they are not pairwise correlated at all? The figures below show moderate multicollinearity and strong multicollinearity. I don't see how the ##X## circles cannot overlap and still cause multicollinearity...

[Attached figure: Venn diagrams illustrating moderate vs. strong multicollinearity]
 
  • #3
It may depend on how low you demand the individual pairwise correlations to be. Suppose that ##X_1## and ##X_2## are independent, identically distributed random variables and that ##Y = X_1+X_2##. Then I think it is clear that the correlation of ##Y## with any one ##X_i## may be smaller than the threshold even though ##Y## is a deterministic function of ##X_1, X_2##.
In fact, it gets easier when ##Y## is a function of more independent ##X_i## variables. Any one ##X_i## might have a low correlation with ##Y## but the combination of all the ##X_i##s might completely determine ##Y##. Suppose ##Y = X_1+X_2+...+X_{100}##, where the ##X_i##s are pairwise independent.
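A small simulation sketch of this point (a NumPy construction of my own, with an illustrative choice of ##k = 100##): the sum is an exact linear function of the ##X_i##, yet its correlation with any one of them is only about ##1/\sqrt{k}##.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 100_000, 100

# k independent standard-normal predictors; Y is their exact sum, so the
# set {Y, X_1, ..., X_k} is perfectly collinear.
X = rng.standard_normal((n, k))
Y = X.sum(axis=1)

# Yet each pairwise correlation corr(Y, X_i) is only about 1/sqrt(k) = 0.1.
r = np.corrcoef(Y, X[:, 0])[0, 1]
print(round(r, 2))
```

So a pairwise-correlation screen with any reasonable threshold would pass every pair, while the variables are in fact perfectly collinear.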
 
  • #4
FactChecker said:
It may depend on how low you demand the individual pairwise correlations to be. Suppose that ##X_1## and ##X_2## are independent, identically distributed random variables and that ##Y = X_1+X_2##. Then I think it is clear that the correlation of ##Y## with any one ##X_i## may be smaller than the threshold even though ##Y## is a deterministic function of ##X_1, X_2##.
In fact, it gets easier when ##Y## is a function of more independent ##X_i## variables. Any one ##X_i## might have a low correlation with ##Y## but the combination of all the ##X_i##s might completely determine ##Y##. Suppose ##Y = X_1+X_2+...+X_{100}##, where the ##X_i## are pairwise independent.
Multicollinearity is when the predictors are correlated in such a way that the estimated coefficient for a predictor, which should indicate the change in ##Y## per unit change in that ##X##, is not what it really is: because ##X_1## and ##X_2## are correlated, when ##X_1## changes by one unit we cannot hold ##X_2## fixed, and it changes too...

Let's say ##Y=b_1 X_1 + b_2 X_2 + b_3 X_3##... and the predictors ##X_i## are pairwise nearly uncorrelated, with low correlation coefficients ##r_{12} = r_{13} = r_{23} \approx 0.2##. That is not automatic proof of the absence of multicollinearity...

Could some collective correlation ##r_{123} \approx 0.8## still hold? How can the variables collectively be more correlated than they are pairwise? I am struggling to see that, especially visually with the Venn diagram, where each smaller circle represents the variance of an ##X## and the larger circle the variance of ##Y##...
 
  • #5
Oh, maybe I get it now... It could be that ##Y=\beta_1 X_1 +\beta_2 X_2 + \beta_3 X_3## and the three regressors are pairwise uncorrelated with each other, BUT the correlation between ##X_1## and, for example, the variable given by the sum ##X_2+X_3## is nonzero and high in value. The same goes for the correlation between ##X_2## and ##X_1+X_3##, etc.

I think that is what the variance inflation factor (VIF) checks: these combined correlations, which cannot be visualized with the Venn diagrams of the individual predictors and the response variable, rather than just the pairwise correlations...
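A sketch of how VIF catches this (a hypothetical NumPy construction, not from the thread): regress each predictor on all the others and take ##VIF_j = 1/(1-R_j^2)##. Below, ten predictors have every pairwise correlation no higher than about ##1/3##, yet every VIF is large.

```python
import numpy as np

def vif(X):
    """VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing
    column j on all the remaining columns (plus an intercept)."""
    n, k = X.shape
    out = []
    for j in range(k):
        y = X[:, j]
        A = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(A, y, rcond=None)
        r2 = 1 - (y - A @ beta).var() / y.var()
        out.append(1.0 / (1.0 - r2))
    return np.array(out)

rng = np.random.default_rng(1)
n = 50_000
Z = rng.standard_normal((n, 9))
# Tenth predictor is nearly the sum of the other nine: joint collinearity,
# but every pairwise correlation is only about 1/3.
X = np.column_stack([Z, Z.sum(axis=1) + 0.1 * rng.standard_normal(n)])
print(vif(X).round(0))  # every VIF far above the usual threshold of 5-10
```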
 
  • #6
fog37 said:
Oh, maybe I get it now... It could be that ##Y=\beta_1 X_1 +\beta_2 X_2 + \beta_3 X_3## and the three regressors are pairwise uncorrelated with each other, BUT the correlation between ##X_1## and, for example, the variable given by the sum ##X_2+X_3## is nonzero and high in value.
Not if the ##X_i##s are independent. Then ##X_1## would be uncorrelated with ##X_2+X_3##.

I probably should leave this for others since I am not an expert. But if ##Y = X_1+X_2##, where the ##X##s are independent, then ##X_1## and ##X_2## are each partial predictors of ##Y##. ##X_1## and ##X_2## are independent of each other, and ##Y## is somewhat correlated with each individual ##X_i## but completely determined by the pair. The more ##X_i##s there are in the sum, the weaker the correlation between ##Y## and each individual ##X_i##.
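A quick numerical check of both claims (a NumPy sketch of my own, not from the thread): independent predictors are also uncorrelated with sums of the others, and ##corr(Y, X_1)## falls off like ##1/\sqrt{k}## as more terms enter the sum.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000
X1, X2, X3 = rng.standard_normal((3, n))

# Independent X's: X1 is uncorrelated with the sum X2 + X3 as well.
r0 = np.corrcoef(X1, X2 + X3)[0, 1]
print(round(r0, 3))  # approximately 0

# corr(Y, X_1) shrinks like 1/sqrt(k) as Y sums more independent terms.
for k in (2, 10, 100):
    X = rng.standard_normal((n, k))
    r = np.corrcoef(X.sum(axis=1), X[:, 0])[0, 1]
    print(k, round(r, 2))  # about 0.71, then 0.32, then 0.10
```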
 

Related to Bivariate correlation does not always catch multicollinearity

What is bivariate correlation?

Bivariate correlation is a statistical method used to measure the strength and direction of the relationship between two variables. It quantifies how changes in one variable are associated with changes in another. The most common measure of bivariate correlation is the Pearson correlation coefficient, which ranges from -1 to 1, where values close to 1 or -1 indicate a strong relationship, and values near 0 indicate a weak relationship.
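For concreteness, the Pearson coefficient is the covariance of the two variables scaled by the product of their standard deviations; a minimal sketch:

```python
import numpy as np

def pearson(x, y):
    """Pearson r: covariance of x and y divided by the product of
    their standard deviations; always lies in [-1, 1]."""
    xc = np.asarray(x, float) - np.mean(x)
    yc = np.asarray(y, float) - np.mean(y)
    return (xc @ yc) / np.sqrt((xc @ xc) * (yc @ yc))

x = np.array([1.0, 2.0, 3.0, 4.0])
print(pearson(x, 2 * x + 1))   # exact increasing linear relation: r = 1
print(pearson(x, -x))          # exact decreasing linear relation: r = -1
```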

What is multicollinearity?

Multicollinearity occurs when two or more predictor variables in a multiple regression model are highly correlated, meaning that one can be linearly predicted from the others with a substantial degree of accuracy. This correlation among independent variables can lead to problems in estimating the coefficients of the regression model accurately, as it becomes difficult to discern the individual effects of each predictor on the dependent variable.

Why doesn't bivariate correlation always catch multicollinearity?

Bivariate correlation measures the relationship between two variables at a time, ignoring the influence of all other variables. Multicollinearity, however, can involve more than two variables that are inter-related in ways pairwise correlation coefficients do not capture. As a result, even if every pairwise correlation is low, multicollinearity can still be present when a linear combination of several predictors is highly correlated with another predictor.

How can you detect multicollinearity if bivariate correlations do not indicate it?

To detect multicollinearity beyond bivariate correlations, statisticians use techniques such as calculating the Variance Inflation Factor (VIF) for each predictor in a regression model. A high VIF indicates that the predictor has a lot of redundancy with other predictors, suggesting multicollinearity. Additionally, examining the condition index or eigenvalues from a principal component analysis can also help identify multicollinearity in the data.
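A sketch of the eigenvalue diagnostic mentioned above (my own NumPy construction): a near-zero eigenvalue of the predictors' correlation matrix signals a near-exact linear dependence, so the condition index ##\sqrt{\lambda_{max}/\lambda_{min}}## is large even though no pairwise correlation exceeds about ##1/3##.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 50_000
Z = rng.standard_normal((n, 9))
# Tenth predictor nearly equals the sum of the first nine.
X = np.column_stack([Z, Z.sum(axis=1) + 0.1 * rng.standard_normal(n)])

R = np.corrcoef(X, rowvar=False)
eig = np.linalg.eigvalsh(R)                    # eigenvalues, ascending
cond_index = np.sqrt(eig.max() / eig.min())
print(round(np.abs(R - np.eye(10)).max(), 2))  # largest |pairwise corr|, ~0.33
print(round(cond_index, 1))                    # far above the common cutoff of 30
```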

What are the consequences of multicollinearity in regression analysis?

Multicollinearity can lead to several issues in regression analysis, including inflated standard errors for the coefficients, which results in wider confidence intervals and less reliable statistical tests. This can make it difficult to determine which variables are statistically significant predictors of the dependent variable. Additionally, multicollinearity can cause the coefficients' estimates to become unstable, where small changes in the data can lead to large changes in the estimates, reducing the model's reliability and interpretability.
