Line of regression substitution

In summary, the conversation discusses the concept of linear regression and how it can be used to estimate values of y from known values of x. However, it is not valid to simply invert the fitted equation and substitute y to estimate x, because the product of the slopes of the two regression lines equals the R-squared value, which is 1 only when the variables are perfectly correlated. For non-linear relations, the relationship between the two variables may only be invertible locally.
  • #1
Einstein44
Homework Statement
This is relatively straight forward, but I somehow forgot why this is:
Why is it that you cannot substitute y to find x? I remember that this was the case, but I can't seem to remember why it is.
Relevant Equations
$$y=ax+b$$
This is the equation you get for a line of regression of a data set using the GDC...
I am not exactly sure in what context this is, as I cannot remember much about this and I couldn't find anything on the internet that mentioned this. I just hope someone understands what I mean :)
 
  • #2
I don't understand your question. In general, the point of a linear regression is that you can substitute in a value of x to get a good guess for y. You won't get exactly the right answer, simply because you usually assume there's some noise in your prediction. Is that what you mean?
 
  • #3
Office_Shredder said:
I don't understand your question. In general, the point of a linear regression is that you can substitute in a value of x to get a good guess for y. You won't get exactly the right answer, simply because you usually assume there's some noise in your prediction. Is that what you mean?
Never mind, I believe I phrased this wrong. I meant: why can't you substitute y to estimate x? I remember the prof saying that you can substitute x to estimate y, but not the other way around, and I forgot the reason and couldn't find anything about this on the internet, so I thought maybe someone knows what I mean.
 
  • #4
Oh yeah. I think the way to see this is to consider two linear regressions (I'm going to assume the constant term comes out zero for both):

##y=\beta_x x##
##x= \beta_y y##.

It's tempting to think that ##\beta_x \beta_y = 1##. But it isn't: in general, the product of the betas is the ##R^2## value of the linear regression, which equals 1 only when the two variables are perfectly correlated. As a simple example, suppose x and y are totally uncorrelated. Then ##\beta_x = \beta_y = 0##. If they are only slightly correlated, ##\beta_x## and ##\beta_y## will both be small and close to zero, so trying to invert your linear regression will give you a very bad estimate.
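
Here's a quick numerical sanity check of that claim (a minimal sketch in Python, not from the thread): simulate noisy linear data, fit both slopes, and compare their product with ##R^2##.

```python
# Sketch: check numerically that the product of the two regression slopes is R^2.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=1000)
y = 0.5 * x + rng.normal(size=1000)      # noisy linear relation

# Center both variables so the constant terms come out zero, as in the post.
x = x - x.mean()
y = y - y.mean()

beta_x = (x @ y) / (x @ x)               # least-squares slope for y ≈ beta_x * x
beta_y = (x @ y) / (y @ y)               # least-squares slope for x ≈ beta_y * y

r = np.corrcoef(x, y)[0, 1]
print(beta_x * beta_y, r**2)             # the two values agree up to rounding
```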
 
  • #5
The regression shown was calculated to minimize the sum-squared-errors of the y estimates versus the y sample values. Those errors are the distances parallel to the Y-axis. If you want to estimate x, you would want a regression line that minimizes the sum-squared-errors of the x estimates versus the x sample values. Those errors are the distances parallel to the X-axis. So the minimization would be different.
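
To make that concrete (a sketch using the usual centered sums ##S_{xx}=\sum_i (x_i-\bar x)^2##, ##S_{yy}=\sum_i (y_i-\bar y)^2##, ##S_{xy}=\sum_i (x_i-\bar x)(y_i-\bar y)##): minimizing the vertical errors gives the slope ##S_{xy}/S_{xx}## for y as a function of x, while minimizing the horizontal errors gives the slope ##S_{xy}/S_{yy}## for x as a function of y. Their product is
$$\frac{S_{xy}}{S_{xx}} \cdot \frac{S_{xy}}{S_{yy}} = \frac{S_{xy}^2}{S_{xx}\,S_{yy}} = R^2 ,$$
so the two fitted lines coincide, and inverting one to get the other is legitimate, only when ##R^2 = 1##.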
 
  • #6
As Office_Shredder said, the product of the slopes is ##R^2##, where ##R## is the correlation coefficient.

The slope of the regression of y on x is ##R \frac{s_y}{s_x}## and the slope of the regression of x on y is ##R \frac{s_x}{s_y}## (barring cases where either standard deviation is ##0##, which means the data are constant), so the product of the slopes is

$$R \frac{s_y}{s_x} \cdot R \frac{s_x}{s_y} = R^2.$$

Notice that for nonlinear relations, the relation between the two variables may be invertible only locally, e.g., for the elastic energy stored in a spring, ##y = \tfrac{1}{2}kx^2##, which is invertible only for ##x \ge 0## (or only for ##x \le 0##).
 

FAQ: Line of regression substitution

What is "Line of Regression Substitution"?

"Line of Regression Substitution" is a statistical method used to estimate missing data points in a dataset by replacing them with values predicted by a linear regression model.

How is the "Line of Regression Substitution" calculated?

The "Line of Regression Substitution" is calculated by fitting a linear regression model to the available data points and then using the resulting equation to predict the missing values.

What are the assumptions of using "Line of Regression Substitution"?

The main assumptions of using "Line of Regression Substitution" are that the data follows a linear trend and that the missing values are randomly distributed and not related to the other variables in the dataset.

What are the advantages of using "Line of Regression Substitution"?

One advantage of using "Line of Regression Substitution" is that it allows for the estimation of missing values without altering the overall distribution of the data. It also takes into account the relationship between the variables in the dataset.

What are the limitations of using "Line of Regression Substitution"?

One limitation of using "Line of Regression Substitution" is that it assumes a linear relationship between the variables, which may not always be the case. It also does not work well with datasets that have a small number of data points or a large number of missing values.
