Regression line with zero slope and average as best prediction

Post #1 by fog37
TL;DR Summary: Regression line with zero slope and average as best prediction
Hello,

I was considering some made-up data ##(X,Y)## and its best-fit regression line. The outcome variable ##Y## is the number of likes and ##X## is the number of comments on a website.

We have 100 data points, spread in such a way that the best-fit line has zero slope. This implies that there is no linear relationship between the variables ##X## and ##Y##. It also means that the average of ##Y## would be the best prediction for ##Y## regardless of the value of ##X##: whatever the value of ##X## is, the best prediction for ##Y## is equal to the average, a constant value.

My question: here we are talking about taking the arithmetic average of ALL the ##Y## values from all different ##X## values, correct?
What about the average of the ##Y## values for the same ##X## value (assuming there is more than just one ##Y## value for each ##X## value)? These two averages should always be numerically close, correct?

Thank you!
 
Post #2
If ##Y## does not depend on the value of ##X##, I could not infer anything about a particular ##Y## from its ##X##. That changes if there is a nonlinear relationship, which a straight line cannot pick up.
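To illustrate the nonlinear caveat, here is a minimal Python sketch with made-up data (an assumption, not from the thread) in which ##Y = (X-3)^2## exactly. The helper `ols_fit` is a hypothetical name that applies the standard least-squares formulas. The fitted line comes out flat at the mean of ##Y## even though ##X## determines ##Y## completely.

```python
# Made-up data (hypothetical): Y = (X - 3)^2 exactly, so Y depends on X
# completely, but not in a straight-line way.

def ols_fit(xs, ys):
    """Return (intercept, slope) of the ordinary least-squares line."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

xs = [1, 2, 3, 4, 5]
ys = [(x - 3) ** 2 for x in xs]  # [4, 1, 0, 1, 4], a U shape

intercept, slope = ols_fit(xs, ys)
print(f"slope = {slope}, intercept = {intercept}")  # slope = 0.0, intercept = 2.0
```

A zero slope therefore rules out a linear trend only; it does not by itself show that ##Y## is unrelated to ##X##.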
 
Post #3
fog37 said:
TL;DR Summary: Regression line with zero slope and average as best prediction

Hello,

I was considering some made-up data ##(X,Y)## and its best-fit regression line.
Be careful with "best" here. It should mean the best that can be done with solid statistical significance. If you "throw everything at the wall to see what sticks", you can often get very good fits to the data that have no statistical significance at all. You want to be able to convince people, even very skeptical ones, that every term in your model probably belongs there. A good linear regression application should only include terms that show a statistically significant reason to be included.
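One way to ask whether a slope has a statistically significant reason to be included is a permutation test, swapped in here as an illustration rather than anything proposed in the post. The sketch below uses hypothetical names (`slope`, `permutation_p_value`), an arbitrary choice of 2000 shuffles, and simulated data. It shuffles ##Y## to destroy any link to ##X## and counts how often a shuffled slope is at least as large as the observed one.

```python
import random

def slope(xs, ys):
    """Least-squares slope of ys on xs."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx

def permutation_p_value(xs, ys, n_perm=2000, seed=0):
    """Two-sided permutation p-value for H0: no linear association.

    Shuffling ys breaks any link to xs; we count how often a shuffled
    slope is at least as large in magnitude as the observed one.
    """
    rng = random.Random(seed)
    observed = abs(slope(xs, ys))
    shuffled = list(ys)
    count = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(slope(xs, shuffled)) >= observed - 1e-12:  # tolerance for float ties
            count += 1
    # The +1 terms count the observed arrangement itself (a common convention).
    return (count + 1) / (n_perm + 1)

rng = random.Random(42)
xs = [rng.uniform(0, 50) for _ in range(100)]      # hypothetical "comments"
noise_only = [rng.gauss(20, 5) for _ in xs]        # Y unrelated to X
trend = [0.5 * x + rng.gauss(0, 5) for x in xs]    # Y with a real linear trend

print(permutation_p_value(xs, noise_only))  # typically not small
print(permutation_p_value(xs, trend))       # very small
```

A small p-value suggests the slope term belongs in the model; a large one means the data give no good reason to keep it.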
fog37 said:
The outcome variable ##Y## is the number of likes and ##X## is the number of comments on a website.

We have 100 data points, spread in such a way that the best-fit line has zero slope. This implies that there is no linear relationship between the variables ##X## and ##Y##. It also means that the average of ##Y## would be the best prediction for ##Y## regardless of the value of ##X##: whatever the value of ##X## is, the best prediction for ##Y## is equal to the average, a constant value.

My question: here we are talking about taking the arithmetic average of ALL the ##Y## values from all different ##X## values, correct?
Yes. All of the ##Y## values are involved in the linear regression calculations, and when the slope is zero the fitted line is just the overall mean of ##Y##.
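A minimal sketch with made-up data (three ##X## values with two ##Y## values each, chosen so that the slope is exactly zero) shows that every ##Y## value enters the fit and that the line predicts the overall mean at every ##X##. The per-##X## averages are printed for comparison; the data and the `predict` helper are hypothetical.

```python
from collections import defaultdict

# Hypothetical data with repeated X values (two Y values per X).
xs = [1, 1, 2, 2, 3, 3]
ys = [5, 7, 2, 4, 4, 8]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n  # arithmetic mean of ALL the Y values
slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
    (x - x_bar) ** 2 for x in xs
)
intercept = y_bar - slope * x_bar

def predict(x):
    """Prediction from the fitted least-squares line."""
    return intercept + slope * x

# Group Y by X to compare the per-X averages with the overall average.
groups = defaultdict(list)
for x, y in zip(xs, ys):
    groups[x].append(y)
per_x_mean = {x: sum(v) / len(v) for x, v in sorted(groups.items())}

print(slope, y_bar)                      # 0.0 5.0
print([predict(x) for x in (1, 2, 3)])  # [5.0, 5.0, 5.0]
print(per_x_mean)                        # {1: 6.0, 2: 3.0, 3: 6.0}
```

Note that the per-##X## averages (6, 3, 6) are not all close to the overall mean of 5 even though the slope is exactly zero.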
fog37 said:
What about the average of the ##Y## values for the same ##X## value (assuming there is more than just one ##Y## value for each ##X## value)? These two averages should always be numerically close, correct?
No. That is too strong a statement. In the 100 data points that you collected for your sample, there might be values of ##X## where the sample happened, by chance, to be off. In fact, with only 100 samples, if you collected data at 10 ##X## values (about 10 points per value), you can expect the ##Y## averages of some of those ##X## value sets to be further off than others.
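A quick simulation makes this concrete. The sketch assumes (hypothetically) 10 ##X## values with 10 points each and ##Y## drawn independently of ##X## from a normal distribution with mean 20 and standard deviation 5, so the true mean of ##Y## is the same at every ##X##. The printed differences show how far the per-##X## sample averages wander from the overall average anyway.

```python
import random
import statistics

rng = random.Random(1)

# Hypothetical setup: 10 distinct X values, 10 observations each (100 points),
# with Y drawn independently of X from the same distribution (mean 20, sd 5).
x_values = list(range(1, 11))
data = {x: [rng.gauss(20, 5) for _ in range(10)] for x in x_values}

all_y = [y for ys in data.values() for y in ys]
overall_mean = statistics.mean(all_y)

# Each per-X mean rests on only 10 points, so it wanders around the
# overall mean with a standard deviation of roughly 5 / sqrt(10) ≈ 1.6.
for x in x_values:
    group_mean = statistics.mean(data[x])
    print(f"X={x:2d}  mean Y={group_mean:6.2f}  diff={group_mean - overall_mean:+.2f}")
print(f"overall mean Y = {overall_mean:.2f}")
```

Rerunning with different seeds changes which ##X## values look "off", which is the point: the spread comes from sampling, not from any dependence on ##X##.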
 

FAQ: Regression line with zero slope and average as best prediction

What is a regression line with zero slope?

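A regression line with zero slope is a horizontal line on a graph at the mean (average) value of the dependent variable. This is because the least-squares line always passes through the point formed by the two means, so a zero slope puts it at the height of the dependent variable's mean. It indicates that there is no linear relationship between the independent and dependent variables: changes in the independent variable are not associated with any linear change in the dependent variable. A nonlinear relationship may still exist.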
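The horizontal-line-at-the-mean behavior follows from the least-squares equations. Writing the fitted line as ##\hat{y} = a + b\,x## (notation introduced here only for this derivation) and minimizing the sum of squared residuals ##S(a,b)## gives:

$$S(a,b)=\sum_{i=1}^{n}\left(y_i-a-b\,x_i\right)^2$$
$$\frac{\partial S}{\partial a}=-2\sum_{i=1}^{n}\left(y_i-a-b\,x_i\right)=0\quad\Longrightarrow\quad a=\bar{y}-b\,\bar{x}$$
$$\frac{\partial S}{\partial b}=0\quad\Longrightarrow\quad b=\frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sum_{i=1}^{n}(x_i-\bar{x})^2}$$
$$b=0\quad\Longrightarrow\quad a=\bar{y},\qquad \hat{y}=\bar{y}\ \text{for every } x$$

The second line says the fitted line always passes through ##(\bar{x},\bar{y})##; with a zero slope it can only be the horizontal line at ##\bar{y}##, the mean of all the ##y_i##.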

When is the average considered the best prediction in regression analysis?

The average is considered the best prediction, in the least-squares sense, when the slope of the regression line is zero. This situation arises when the independent variable provides no useful linear information for predicting the dependent variable, which makes the mean of the dependent variable the constant prediction with the smallest sum of squared errors.
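Taking "best" to mean the smallest sum of squared errors (the least-squares criterion), the claim can be checked directly. For any constant prediction ##c##, expanding around the mean ##\bar{y}## and using ##\sum_{i}(y_i-\bar{y})=0## gives

$$\sum_{i=1}^{n}(y_i-c)^2=\sum_{i=1}^{n}(y_i-\bar{y})^2+n\,(\bar{y}-c)^2$$

The second term is never negative and is zero only when ##c=\bar{y}##, so the mean is the constant with the smallest squared error. Under a different criterion, such as the sum of absolute errors, the median would play this role instead.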

How do you interpret a regression line with zero slope in terms of correlation?

A regression line with zero slope indicates that there is no linear correlation between the independent and dependent variables. The Pearson correlation coefficient ##r## in this case is zero, because the slope equals ##r\,s_Y/s_X## (the correlation times the ratio of the standard deviations), signifying no linear relationship between the variables.
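The identity ##b = r\,s_Y/s_X## means the slope and the correlation are zero together (as long as ##s_Y > 0##), since both share the numerator ##\sum_i(x_i-\bar{x})(y_i-\bar{y})##. The sketch below checks this numerically on made-up data; the helpers `pearson_r` and `ols_slope` are hypothetical names.

```python
import math
import statistics

def pearson_r(xs, ys):
    """Pearson correlation coefficient from centered sums of products."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def ols_slope(xs, ys):
    """Least-squares slope; shares the numerator sxy with pearson_r."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx

# Made-up data with a positive trend: slope equals r * s_Y / s_X.
xs, ys = [1, 2, 3, 4, 5], [2, 4, 5, 4, 5]
b = ols_slope(xs, ys)
r = pearson_r(xs, ys)
print(math.isclose(b, r * statistics.stdev(ys) / statistics.stdev(xs)))  # True

# Made-up U-shaped data: zero slope and zero correlation together.
u_ys = [4, 1, 0, 1, 4]
print(ols_slope(xs, u_ys), pearson_r(xs, u_ys))  # 0.0 0.0
```

The U-shaped case is a reminder that ##r = 0## measures only linear association.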

What are the implications of using the average as the best prediction?

Using the average as the best prediction implies that the independent variable does not explain any of the variability in the dependent variable through a linear fit (the coefficient of determination, ##R^2##, is zero). Predictions will always be the mean of the dependent variable, regardless of the value of the independent variable. This simplifies the model but also indicates that the variable being used for prediction is not useful as a linear predictor.
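"Does not explain any variability" corresponds to ##R^2 = 0##. In this sketch (hypothetical helper `r_squared`, made-up data), ##R^2## compares the squared error left by the fitted line with the squared error left by the mean-only prediction, which acts as the baseline model.

```python
def r_squared(xs, ys):
    """Coefficient of determination for a simple least-squares line."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    intercept = y_bar - slope * x_bar
    # Squared error left over by the fitted line.
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    # Squared error left over by the mean-only (baseline) prediction.
    sst = sum((y - y_bar) ** 2 for y in ys)
    return 1 - sse / sst

xs = [1, 2, 3, 4, 5]
print(r_squared(xs, [4, 1, 0, 1, 4]))  # 0.0: the line does no better than the mean
print(r_squared(xs, [2, 4, 5, 4, 5]))  # about 0.6, which equals r**2 for these data
```

With a zero slope the fitted line and the mean-only baseline make identical predictions, so the two error sums coincide and ##R^2## is exactly zero.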

Can a regression line with zero slope still be useful?

While a regression line with zero slope might suggest that the independent variable is not useful for predicting the dependent variable, it can still be useful for understanding the data. It highlights that the dependent variable does not change linearly with the independent variable, which can be important information in certain contexts. Additionally, it sets a baseline for comparing other models that might include different or additional predictors.
