Using correlation coefficients as x in a regression?

In summary: Example: suppose you are studying the suicide rate in a county. You might find that the suicide rate increases when unemployment in the county increases. The rolling correlation, X, between unemployment and suicide rates indicates how closely the two move together, so a natural model would be a linear regression of an outcome Y on X. What is being assumed and asserted in this situation? In summary, using correlation coefficients as x in a regression might seem "wrong", but it does not violate any regression assumption.
  • #1
╔(σ_σ)╝

I was reading an article in the Wall Street Journal, and the author was using a rolling correlation coefficient, computed on a set of variables, as his predictor variable in a linear regression.

Basically it was a univariate linear regression, y = mx + b, where x was the Pearson correlation coefficient calculated over a 30-day window on two random variables.
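A minimal sketch of how such a predictor could be computed, assuming daily data held in plain Python lists and a trailing window of 30 observations. The helper names `pearson` and `rolling_corr` and the synthetic series `c` and `d` are illustrative assumptions, not the article's data or code.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def rolling_corr(c, d, window=30):
    """Correlation of c and d over each trailing window of `window` points.

    Returns len(c) - window + 1 values; the first window - 1 points
    have no complete window, so they produce no value.
    """
    return [
        pearson(c[i - window + 1 : i + 1], d[i - window + 1 : i + 1])
        for i in range(window - 1, len(c))
    ]


# Hypothetical daily data standing in for the article's two variables.
random.seed(0)
days = 100
c = [random.gauss(0, 1) for _ in range(days)]
d = [ci + random.gauss(0, 1) for ci in c]  # d partly tracks c

x = rolling_corr(c, d)  # the predictor series for y = m*x + b
print(len(x))  # 71 = 100 - 30 + 1
```

Consecutive windows share 29 of their 30 points, so neighbouring x values are strongly dependent; if the regression residuals are also autocorrelated, the usual least-squares standard errors can be misleading.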

This seems "wrong", but I am not sure that it is. I don't know of any regression assumption this violates, but at the same time it just doesn't seem like you can do this sort of thing.

What do you think?
 
  • #2
I think this isn't a mathematical question until we know what is assumed and what is asserted. For example, what are the assumptions of "regression"? What consequences follow from those assumptions? There are several types of regression.
 
  • #3
There is nothing mathematically wrong with this model, per se. It's easy to construct an example. Suppose we have two random variables, C and D, and X is the rolling correlation between C and D. We can define a random variable Y as Y = aX + b for real constants a and b. Then the linear regression would be a perfect model for Y.
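A minimal sketch of this construction, assuming independent Gaussian series as stand-ins for C and D, a 30-point window, and arbitrary constants a = 3 and b = -0.5 (all chosen only for illustration). Because Y is an exact linear function of X, ordinary least squares recovers a and b.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def rolling_corr(c, d, window=30):
    """Correlation of c and d over each trailing window of `window` points."""
    return [
        pearson(c[i - window + 1 : i + 1], d[i - window + 1 : i + 1])
        for i in range(window - 1, len(c))
    ]


def fit_line(x, y):
    """Ordinary least squares fit of y = m*x + b; returns (m, b)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    m = sxy / sxx
    return m, mean_y - m * mean_x


random.seed(1)
C = [random.gauss(0, 1) for _ in range(200)]
D = [random.gauss(0, 1) for _ in range(200)]
X = rolling_corr(C, D)        # X: rolling correlation of C and D
a, b = 3.0, -0.5              # arbitrary constants for illustration
Y = [a * xi + b for xi in X]  # Y = aX + b exactly

m, intercept = fit_line(X, Y)
print(m, intercept)  # matches a and b up to floating-point rounding
```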

I thought of a type of situation that would naturally lead to a model like yours. When the subject of interest, Y, is related to how well two other variables, C and D, are in or out of balance, this type of model might easily occur. Example: suppose you are studying predators and prey (C and D) and trying to predict the rate, Y, at which the predators kill that prey. When they are out of balance, with too many predators or not enough, the kill rate Y will be smaller than when they are in balance. The rolling correlation, X, between predator and prey numbers indicates how well they stay in balance. So a natural model would be a linear regression between X and Y.

Other examples arise where the rate of something, Y, increases when two other things, C and D, get out of balance.
 

FAQ: Using correlation coefficients as x in a regression?

What is a correlation coefficient?

A correlation coefficient is a statistical measure that quantifies the strength and direction of the relationship between two variables. The most common one, Pearson's r, ranges from -1 to +1, with a value of -1 indicating a perfect negative linear correlation, a value of +1 indicating a perfect positive linear correlation, and a value of 0 indicating no linear correlation.
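Pearson's r is the covariance divided by the product of the standard deviations: r = Σ(x − x̄)(y − ȳ) / √(Σ(x − x̄)² · Σ(y − ȳ)²). A minimal sketch of that formula in plain Python (the function name `pearson` is illustrative), with small hand-checkable inputs. The last case shows that r = 0 does not mean "no relationship": y is fully determined by x there, just not linearly.

```python
import math


def pearson(x, y):
    """Pearson's r: covariance divided by the product of standard deviations."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


xs = [1, 2, 3, 4, 5]
print(pearson(xs, [2, 4, 6, 8, 10]))  # 1.0  (perfect positive)
print(pearson(xs, [10, 8, 6, 4, 2]))  # -1.0 (perfect negative)
print(pearson(xs, [4, 1, 0, 1, 4]))   # 0.0  (y = (x - 3)**2: dependent, not linear)
```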

How is a correlation coefficient used in regression analysis?

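In ordinary regression analysis the correlation coefficient is not the independent variable (x-value); it summarizes how strongly x and y are linearly related. In simple linear regression the fitted slope equals r × (s_y / s_x), where s_x and s_y are the sample standard deviations of x and y, and r² equals the coefficient of determination R². Using a correlation coefficient as the x-value, as in the thread above, is a special case in which a derived quantity such as a rolling correlation serves as the predictor. The correlation coefficient does not by itself identify outliers or influential data points; residual and leverage diagnostics are used for that.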
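In simple linear regression with an intercept, the least-squares slope equals r × (s_y / s_x) and R² equals r². A small numeric check of both identities, assuming made-up data points chosen only for illustration; `statistics.stdev` and `statistics.mean` are from Python's standard library.

```python
import math
import statistics


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ols_fit(x, y):
    """Least-squares slope and intercept for y = m*x + b."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    m = sxy / sxx
    return m, mean_y - m * mean_x


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.2, 1.9, 3.2, 3.8, 5.1]  # hypothetical data

r = pearson(x, y)
m, b = ols_fit(x, y)

# Slope recovered from r and the ratio of standard deviations.
slope_from_r = r * statistics.stdev(y) / statistics.stdev(x)

# R^2 = 1 - SSE/SST for the fitted line.
sse = sum((yi - (m * xi + b)) ** 2 for xi, yi in zip(x, y))
sst = sum((yi - statistics.mean(y)) ** 2 for yi in y)
r_squared = 1 - sse / sst

print(m, slope_from_r)    # the same value, up to rounding
print(r_squared, r ** 2)  # the same value, up to rounding
```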

What is the difference between a correlation coefficient and a regression coefficient?

A correlation coefficient measures the strength and direction of the linear relationship between two variables and has no units, while a regression coefficient gives the expected change in the dependent variable for each one-unit change in the independent variable, so it depends on the units of both variables. In other words, the regression coefficient describes how steeply y changes with x, while the correlation coefficient describes how tightly the two variables follow a linear pattern.
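One way to see the difference is to change the units of x. In this sketch (hypothetical data, with x expressed first in metres and then in centimetres) the regression slope shrinks by a factor of 100, while the correlation coefficient stays the same.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ols_slope(x, y):
    """Least-squares slope of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


x_m = [1.0, 2.0, 3.0, 4.0, 5.0]   # x in metres (hypothetical)
x_cm = [100 * v for v in x_m]     # the same x in centimetres
y = [2.0, 4.1, 5.9, 8.2, 9.8]     # hypothetical y values

print(ols_slope(x_m, y), ols_slope(x_cm, y))  # second is 1/100 of the first
print(pearson(x_m, y), pearson(x_cm, y))      # equal up to rounding
```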

Can a correlation coefficient be used to determine causation?

No, a correlation coefficient only measures the strength and direction of the relationship between two variables. It does not imply causation, as there may be other factors at play that are influencing the relationship between the two variables. To determine causation, further research and experimentation are needed.
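A small simulation of one such "other factor": a hidden common cause z drives both x and y, and neither affects the other. The noise scale 0.5, the sample size, and the seed are arbitrary assumptions. In theory the correlation between x and y is 1 / (1 + 0.5²) = 0.8, yet it vanishes once z is subtracted out.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


random.seed(42)
n = 10_000
z = [random.gauss(0, 1) for _ in range(n)]   # hidden common cause
x = [zi + random.gauss(0, 0.5) for zi in z]  # x depends only on z
y = [zi + random.gauss(0, 0.5) for zi in z]  # y depends only on z, never on x

r_xy = pearson(x, y)  # strong correlation with no causal link between x and y
r_resid = pearson(
    [xi - zi for xi, zi in zip(x, z)],
    [yi - zi for yi, zi in zip(y, z)],
)  # close to zero once the common cause is removed
print(round(r_xy, 2), round(r_resid, 2))
```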

How do outliers affect the correlation coefficient in regression analysis?

Outliers can have a significant impact on the correlation coefficient in regression analysis. If there are extreme values that do not follow the general trend of the data, the correlation coefficient may be skewed and not accurately represent the relationship between the two variables. It is important to identify and address outliers in order to obtain a reliable correlation coefficient.
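A hand-checkable illustration, using made-up points: ten points lying exactly on y = 2x give r = 1, and adding a single point far below the trend cuts r to 0.5.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


x = list(range(1, 11))
y = [2 * v for v in x]        # points exactly on y = 2x
print(pearson(x, y))          # 1.0

x_out = x + [11]
y_out = y + [0]               # one point far below the trend
print(pearson(x_out, y_out))  # 0.5
```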
