ych22
Suppose we are running an experiment to fit a regression model with independent variable X and dependent variable Y.
Suppose also that we can choose between the design points X = 1, 4, 10, 11, 14 and X = 6, 7, 8, 9, 10. Both sets of X's have mean 8, but the first set has a much higher variance than the second.
Which set of X's is better for:
A) Determining whether a regression relation exists.
B) Estimating the mean response of Y at X=8.
I think the former set is better for determining whether a regression relation exists, because the F* statistic in ANOVA is MSR/MSE, with [tex]E[MSE] = \sigma^2[/tex] and [tex]E[MSR] = \sigma^2 + \beta_1^2 \sum (X_i - \overline{X})^2[/tex]. When there is no relation between X and Y (that is, [tex]\beta_1 = 0[/tex]), the choice of X's obviously does not matter. When a relation does exist, however, E[MSR] is larger for the set with the higher variance of X's, so F* is expected to be larger and we are more likely to conclude that the relation exists.
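To make the comparison concrete, here is a minimal sketch that evaluates the sum of squared deviations and the E[MSR] expression for both sets of X's. The values of beta1 and sigma2 are hypothetical, chosen only for illustration, and the helper names are not from any library.

```python
# Two candidate sets of X values from the question.
design_a = [1, 4, 10, 11, 14]
design_b = [6, 7, 8, 9, 10]

def sxx(xs):
    """Sum of squared deviations of the X's about their mean."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs)

def expected_msr(xs, beta1, sigma2):
    """E[MSR] = sigma^2 + beta1^2 * sum((X_i - Xbar)^2) in simple linear regression."""
    return sigma2 + beta1 ** 2 * sxx(xs)

# Hypothetical parameter values (assumptions, not given in the question).
beta1, sigma2 = 0.5, 1.0

sxx_a, sxx_b = sxx(design_a), sxx(design_b)
emsr_a = expected_msr(design_a, beta1, sigma2)
emsr_b = expected_msr(design_b, beta1, sigma2)

print("Sxx:", sxx_a, sxx_b)        # 114.0 and 10.0
print("E[MSR]:", emsr_a, emsr_b)   # 29.5 and 3.5 for these assumed values
```

Since E[MSE] is sigma2 for either set, the wider set gives the larger E[MSR] whenever beta1 is nonzero, which matches the argument above.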
However, I am not sure which set of X's is better for estimating the mean response of Y at X=8, although the first set should be better for estimating the variability in the response of Y at X=8...
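One way to explore part B is a sketch based on the standard simple linear regression result that the variance of the estimated mean response at a point [tex]X_h[/tex] is [tex]\sigma^2 \left[ \frac{1}{n} + \frac{(X_h - \overline{X})^2}{\sum (X_i - \overline{X})^2} \right][/tex]. This assumes independent errors with constant variance, and the function name below is hypothetical. The code computes only the bracketed factor, so no value of [tex]\sigma^2[/tex] has to be assumed.

```python
# Two candidate sets of X values from the question.
design_a = [1, 4, 10, 11, 14]
design_b = [6, 7, 8, 9, 10]

def mean_response_variance_factor(xs, x_h):
    """Return 1/n + (x_h - xbar)^2 / Sxx, the multiplier of sigma^2
    in the variance of the estimated mean response at x_h."""
    n = len(xs)
    xbar = sum(xs) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    return 1 / n + (x_h - xbar) ** 2 / sxx

# At X = 8, the common mean of both sets, the second term vanishes.
factor_a_at_mean = mean_response_variance_factor(design_a, 8)
factor_b_at_mean = mean_response_variance_factor(design_b, 8)

# Away from the mean (X = 12 is an arbitrary illustrative point),
# the spread of the X's starts to matter.
factor_a_away = mean_response_variance_factor(design_a, 12)
factor_b_away = mean_response_variance_factor(design_b, 12)

print(factor_a_at_mean, factor_b_at_mean)
print(factor_a_away, factor_b_away)
```

Under these assumptions the two sets give the same factor, 1/n, at X=8, because 8 is the mean of both sets. The spread of the X's only changes the variance of the estimated mean response at points away from the mean.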