Confidence Interval for Child's Weight Based on Television Watching

In summary, we discussed a study that looked at the number of hours of television watched per week and the weight of children, and calculated a simple linear regression equation for the data. We performed significance tests for the slope of the regression line and for the criterion F, and determined confidence intervals for the average weight of children who watch different amounts of television. Along the way we discovered a mistake in the initial calculations, found that the slope was significantly different from zero, and defined the criterion F as a test of whether two variances are different.
  • #1
mathmari
Hey! :eek:

In a study of $15$ children at the age of $10$ years, the number of hours of television watched per week and the number of pounds above or below the ideal body weight were determined (high positive values = overweight).

  1. Determine the simple linear regression equation by considering the weights above the ideal body weight as a dependent variable.
  2. Perform a significance test for the slope of the regression line at significance level $\alpha = 5\%$ (using p-values).
  3. Perform a significance test of the criterion F at significance level $\alpha = 0.05$ (using p-values).
  4. Determine the confidence interval for the average weight in pounds for a child who watches television for $36$ hours a week and for a child who watches television for $30$ hours a week. Which confidence interval is greater and why?
I have done the following:

  1. At the beginning I calculated the following:

    View attachment 9480

    Using this information we get:
    \begin{align*}&\nu =15 \\ &\overline{X}=\frac{\sum X}{\nu}=\frac{472}{15}=31.47 \\ &\overline{Y}=\frac{\sum Y}{\nu}=\frac{86}{15}=5.73 \\ &\hat{\beta}=\frac{\nu \sum \left (XY\right )-\left (\sum X\right )\left (\sum Y\right )}{\nu\sum X^2-\left (\sum X\right )^2}=\frac{15 \cdot 3356-472\cdot 86}{15\cdot 15524-472^2}=\frac{50340-40592}{232860-222784}=\frac{9748}{10076}=0.97 \\ & \hat{\alpha}=\overline{Y}-\hat{\beta}\cdot \overline{X}=5.73-0.97\cdot 31.47=5.73-30.5259=-24.80\end{align*}

    Therefore the linear regression equation, with the pounds above the ideal body weight as the dependent variable, is: \begin{equation*}\hat{Y}=0.97X-24.80\end{equation*} (A quick R check of these coefficients, using only the sums above, is sketched at the end of this post.)

    The graph looks as follows:

    View attachment 9482
  2. We want to test the null hypothesis that the slope of the regression line is $0$.

    I found some notes and according to these I did the following:

    View attachment 9481

    Since p-value < α (or |t| > t-crit) we reject the null hypothesis, and so we can’t conclude that the population slope is zero.

    Is this correct? (Wondering)

    But according to these calculations we get another slope than I got in the first question, or not? Here we have $b=0.91$ and in the first question I got $\hat{\beta}=0.97$.
    So have I done something wrong in the calculation of the linear regression equation? (Wondering)
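Here is the R check of the coefficients mentioned in part 1, just a sketch that uses only the sums from the attached table (the raw data are only in my Excel sheet, and the variable names are just for illustration):

[CODE]
# Quick check of the hand calculation, using only the sums from the table
n <- 15
sumX <- 472; sumY <- 86; sumXY <- 3356; sumX2 <- 15524

beta_hat  <- (n * sumXY - sumX * sumY) / (n * sumX2 - sumX^2)  # about 0.97
alpha_hat <- sumY / n - beta_hat * sumX / n                    # about -24.71
c(alpha = alpha_hat, beta = beta_hat)
[/CODE]

(The intercept comes out as about $-24.71$ here instead of $-24.80$, because above I rounded $\hat{\beta}$ to $0.97$ before computing $\hat{\alpha}$.)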
 

Attachments

  • reg_table.JPG
  • t_test.JPG
  • dias.JPG
  • #2
mathmari said:
[*] We want to test the null hypothesis that the slope of the regression line is $0$.

Hey mathmari!

Let's rephrase that... we want to test the alternative hypothesis that the slope of the regression line is not $0$. (Nerd)

mathmari said:
I found some notes and according to these I did the following:

Since p-value < α (or |t| > t-crit) we reject the null hypothesis, and so we can’t conclude that the population slope is zero.

Is this correct?

Since we have a 2-sided test we need to compare the p-value with α/2.
If it is below - and see below for an apparent calculation mistake - then we conclude that the slope is significantly different from zero.
Or put otherwise, that there is a significant linear correlation between X and Y.
Note that we can never conclude that the population slope is 0. At best we do not have sufficient information to conclude that it is different. (Nerd)

mathmari said:
But according to these calculations we get another slope than I got in the first question, or not? Here we have $b=0.91$ and in the first question I got $\hat{\beta}=0.97$.
So have I done something wrong in the calculation of the linear regression equation?

Looks as if there is a mistake.
I get different values for s_X and s_Y. I have s_X=6.92 and s_Y=7.648.
Perhaps the Excel range was not set correctly? (Wondering)
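For reference, $s_X$ can be checked from the sums in post #1 alone; a sketch in R ($s_Y$ would also need $\sum Y^2$, which isn't listed there):

[CODE]
# Sample standard deviation of X from the sums in post #1
n <- 15; sumX <- 472; sumX2 <- 15524
Sxx <- sumX2 - sumX^2 / n   # sum of squared deviations of X from its mean
sqrt(Sxx / (n - 1))         # about 6.93, matching the 6.92 above up to rounding
[/CODE]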
 
  • #3
Klaas van Aarsen said:
Since we have a 2-sided test we need to compare the p-value with α/2.
If it is below - and see below for an apparent calculation mistake - then we conclude that the slope is significantly different from zero.
Or put otherwise, that there is a significant linear correlation between X and Y.
Note that we can never conclude that the population slope is 0. At best we do not have sufficient information to conclude that it is different. (Nerd)

So do we have the following? (Wondering)

Since p-value < α/2 (or |t| > t-crit) we reject the null hypothesis, and so we conclude that the slope is significantly different from zero.
Klaas van Aarsen said:
Looks as if there is a mistake.
I get different values for s_X and s_Y. I have s_X=6.92 and s_Y=7.648.
Perhaps the range was not set correctly? (Wondering)

Ah yes, I found my mistake in the Excel commands.

Now I get:

View attachment 9484

So now it is the same slope as I found in the first question! (Whew)
 

Attachments

  • tvh_kg.JPG
  • #4
mathmari said:
So do we have the following?

Since p-value < α/2 (or |t| > t-crit) we reject the null hypothesis, and so we conclude that the slope is significantly different from zero.

I've just noticed that you've used [M]=TDIST(x, df, tails=2)[/M] to calculate the p-value. If I'm not mistaken it means that the factor 2 has already been taken care of so that we can compare the p-value and α directly. :eek:

And yes, we conclude that the slope is significantly different from zero. (Nod)
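In R terms it would look something like this (a sketch; the t-value of 2.5 is just a placeholder, since the actual one is in your sheet, and the function name is made up):

[CODE]
# Excel's TDIST(x, df, tails = 2) already includes the factor 2 for a two-sided test
two_sided_p <- function(t0, df) 2 * pt(abs(t0), df, lower.tail = FALSE)
two_sided_p(2.5, 13)  # same value as TDIST(2.5, 13, 2) in Excel
[/CODE]

So the p-value it produces can be compared with $\alpha$ directly.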

mathmari said:
Ah yes, I found my mistake in the Excel commands.
Now I get:
So now it is the same slope as I found in the first question! (Whew)

Good! (Handshake)
 
  • #5
Klaas van Aarsen said:
I've just noticed that you've used [M]=TDIST(x, df, tails=2)[/M] to calculate the p-value. If I'm not mistaken it means that the factor 2 has already been taken care of so that we can compare the p-value and α directly. :eek:

And yes, we conclude that the slope is significantly different from zero. (Nod)
Good! (Handshake)
Great!

Could you give me a hint for the question 3? What exactly is the criterion F? (Wondering)
 
  • #6
mathmari said:
Great!

Could you give me a hint for the question 3? What exactly is the criterion F?

You have just executed a t-test to test whether the slope is different from 0.
As I understand it, we can also do an F-test for the same thing.
An F-test tests whether 2 variances are different. The F-value is the ratio between those 2 variances. (Thinking)
 
  • #7
Klaas van Aarsen said:
You have just executed a t-test to test whether the slope is different from 0.
As I understand it, we can also do an F-test for the same thing.
An F-test tests whether 2 variances are different. The F-value is the ratio between those 2 variances. (Thinking)

I used the "F-Test for the variances of two samples" tool in Excel and got the following:

View attachment 9485
Is this correct, i.e. did I give the correct inputs? (Wondering)
 

Attachments

  • f_test.JPG
  • #8
mathmari said:
I used the "F-Test for the variances of two samples" tool in Excel and got the following:

Is this correct, i.e. did I give the correct inputs?

I don't think so.
It appears you have compared the variances of the inputs and the outputs.
But that does not really say whether they are correlated or not does it? (Worried)

Perhaps we should search for what kind of F-test we can do within the context of a linear regression.
It should compare the 'explained' variance with the 'unexplained' variance. (Thinking)
 
  • #9
Klaas van Aarsen said:
Perhaps we should search for what kind of F-test we can do within the context of a linear regression.
It should compare the 'explained' variance with the 'unexplained' variance. (Thinking)
The explained variance is the sum of the squared differences between each predicted Y-value and the mean of Y.

The unexplained variance is the sum of the squared differences between the Y-value of each ordered pair and the corresponding predicted Y-value.

Right? (Wondering) Is the F-value the fraction of these two values?

If yes, then we have the following:
\begin{equation*}F=\frac{\text{explained variance}}{\text{unexplained variance}}=\frac{632.0347}{190.2276}=3.32252\end{equation*}

(Wondering)
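In R notation, what I mean is something like this (a sketch; x and y stand for the data columns from my Excel sheet, which I haven't reproduced here, and the function name is only for illustration):

[CODE]
# Explained and unexplained sums of squares for a fitted line yhat = a + b*x
explained_unexplained <- function(x, y, a, b) {
  yhat <- a + b * x                  # predicted Y-values
  SSM  <- sum((yhat - mean(y))^2)    # 'explained' sum of squares
  SSE  <- sum((y - yhat)^2)          # 'unexplained' sum of squares
  c(SSM = SSM, SSE = SSE, ratio = SSM / SSE)
}
# For my data (with the a and b from the first question) I get the values
# SSM = 632.0347 and SSE = 190.2276 shown in the table attached in the next post.
[/CODE]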
 
  • #10

Attachments

  • f_value.JPG
  • #11
mathmari said:
The explained variance is the sum of the squared differences between each predicted Y-value and the mean of Y.

The unexplained variance is the sum of the squared differences between the Y-value of each ordered pair and the corresponding predicted Y-value.

Right?

Those are the sums of squares, typically abbreviated as SSM and SSE.
To find the variances we still need to divide by the corresponding degrees of freedom (DFM and DFE), don't we? (Wondering)

mathmari said:
Is the F-value the fraction of these two values?

If yes, then we have the following:
\begin{equation*}F=\frac{\text{explained variance}}{\text{unexplained variance}}=\frac{632.0347}{190.2276}=3.32252\end{equation*}

Yes, the F-value is that fraction.
But I think the numbers for the variances are not correct yet. (Worried)
 
  • #12
Klaas van Aarsen said:
Those are the sums of squares, typically abbreviated as SSM and SSE.
To find the variances we still need to divide by the corresponding degrees of freedom (DFM and DFE), don't we? (Wondering)

Yes, the F-value is that fraction.
But I think the numbers for the variances are not correct yet. (Worried)

Oh ok!

So we have that DFM = p - 1, where p is the number of regression parameters, which is 2 in this case, and so we get DFM = 2-1=1, or not?

We also have that DFE = n - p, where n is the number of observations, and so we get DFE = 15 - 2 =13, or not?

(Wondering)
 
  • #13
mathmari said:
Oh ok!

So we have that DFM = p - 1, where p is the number of regression parameters, which is 2 in this case, and so we get DFM = 2-1=1, or not?

We also have that DFE = n - p, where n is the number of observations, and so we get DFE = 15 - 2 =13, or not?

Yep. (Nod)
 
  • #14
Klaas van Aarsen said:
Yep. (Nod)

So using the table of post #10 we get
\begin{align*}&SSM=632.0347 \\ &DFM=2-1=1 \\ &SSE=190.2276 \\ &DFE=15-2=13 \\ &MSM=\frac{SSM}{DFM}=\frac{632.0347}{1}=632.0347 \\ &MSE=\frac{SSE}{DFE}=\frac{190.2276}{13}=14.6329 \\ &F=\frac{MSM}{MSE}=\frac{632.0347}{14.6329}=43.1927\end{align*}

Now we have to find the confidence interval for the test statistic with $\alpha=0.05$, right? We look in the F-table at the $0.05$ entry for $1$ df in the numerator and $13$ df in the denominator.

Using R and evaluating qf(0.95, 1, 13) we get 4.667193.

Is everything correct so far?

How is the confidence interval defined with these data? (Wondering)
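For reference, in R the same steps look like this (a sketch, taking the SSM and SSE from the table of post #10 as given):

[CODE]
# F-statistic for the regression and the critical value at alpha = 0.05
SSM <- 632.0347; SSE <- 190.2276
DFM <- 1;        DFE <- 13

MSM  <- SSM / DFM        # 632.0347
MSE  <- SSE / DFE        # about 14.633
Fval <- MSM / MSE        # about 43.19
qf(0.95, DFM, DFE)       # critical F-value, about 4.667
[/CODE]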
 
  • #15
mathmari said:
So using the table of post #10 we get
\begin{align*}&SSM=632.0347 \\ &DFM=2-1=1 \\ &SSE=190.2276 \\ &DFE=15-2=13 \\ &MSM=\frac{SSM}{DFM}=\frac{632.0347}{1}=632.0347 \\ &MSE=\frac{SSE}{DFE}=\frac{190.2276}{13}=14.6329 \\ &F=\frac{MSM}{MSE}=\frac{632.0347}{14.6329}=43.1927\end{align*}

Now we have to find the confidence interval for the test statistic with $\alpha=0.05$, right? We look in the F-table at the $0.05$ entry for $1$ df in the numerator and $13$ df in the denominator.

Using R and evaluating qf(0.95, 1, 13) we get 4.667193.

Is everything correct so far?

I have found the F-value 42.967. That is more or less the same F-value. Good.
The difference is probably caused by early rounding.

And you have found a critical F-value.
But shouldn't we find a p-value to compare with $\alpha$? And draw a conclusion? (Wondering)

mathmari said:
How is the confidence interval defined with these data?

For the F-test you mean?
The F-test is a 1-sided test in this case, and generally a confidence interval belongs to a 2-sided test.
So I don't think we should calculate a confidence interval in this case. (Thinking)
 
  • #16
Klaas van Aarsen said:
I have found the F-value 42.967. That is more or less the same F-value. Good.
The difference is probably caused by early rounding.

And you have found a critical F-value.
But shouldn't we find a p-value to compare with $\alpha$? And draw a conclusion? (Wondering)

So shouldn't I have calculated that F-value? How do we calculate the p-value? (Wondering)
 
  • #17
mathmari said:
So shouldn't I have calculated that F-value? How do we calculate the p-value?

You found a formula in R to calculate the critical F-value from $\alpha$.
Isn't there a similar formula to calculate the p-value from the F-value? (Wondering)
 
  • #18
Klaas van Aarsen said:
You found a formula in R to calculate the critical F-value from $\alpha$.
Isn't there a similar formula to calculate the p-value from the F-value? (Wondering)

Using the function pf(42.967, 1, 13, lower.tail=F) we get 1.839458e-05.

Is the function correct? (Wondering)
 
  • #19
mathmari said:
Using the function pf(42.967, 1, 13, lower.tail=F) we get 1.839458e-05.

Is the function correct?

Yep.
Previously you used the t-test to find the p-value for the slope. Now we used the F-test. The result should be the same, shouldn't it? Is it? (Wondering)
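A sketch in R of why they should agree (using the F-value of 42.967 from above): an F-distribution with $(1, 13)$ degrees of freedom is the distribution of a squared t with $13$ degrees of freedom, so the F-value is the square of the t-value and the tail probabilities match.

[CODE]
Fval <- 42.967
pf(Fval, 1, 13, lower.tail = FALSE)         # about 1.84e-05
2 * pt(sqrt(Fval), 13, lower.tail = FALSE)  # the same two-sided p-value, with t = sqrt(F)
[/CODE]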
 
  • #20
Klaas van Aarsen said:
Yep.
Previously you used the t-test to find the p-value for the slope. Now we used the F-test. The result should be the same, shouldn't it? Is it? (Wondering)

Ah yes, they are the same!

So we now compare the p-value with $\alpha$, or not? So do we have the following? (Wondering)

Since p-value < α we reject the null hypothesis, and so we conclude that the slope is significantly different from zero.

As for question 4, how is the confidence interval defined, i.e. which formula do we use? (Wondering)
 
  • #21
mathmari said:
Ah yes, they are the same!

So we now compare the p-value with $\alpha$, or not? So do we have the following? (Wondering)

Since p-value < α we reject the null hypothesis, and so we conclude that the slope is significantly different from zero.

Yep. (Nod)

mathmari said:
As for question 4, how is the confidence interval defined, i.e. which formula do we use?

We are looking for the confidence interval of a point estimate in a simple linear regression.
I found a formula for it in a few different references, and Wikipedia gives a confidence band formula for the same thing. (Thinking)
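If it helps, the confidence interval for the mean response at a point $x_0$ should come out in the form
\begin{equation*}\hat{Y}_0 \pm t_{\alpha/2,\,n-2}\; s\,\sqrt{\frac{1}{n}+\frac{(x_0-\overline{X})^2}{\sum_i (X_i-\overline{X})^2}},\qquad s=\sqrt{\frac{SSE}{n-2}},\end{equation*}
where $\hat{Y}_0$ is the predicted value at $x_0$. (Thinking)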
 
  • #22
Klaas van Aarsen said:
We are looking for the confidence interval of a point estimate in a simple linear regression.
I found a formula for it in a few different references, and Wikipedia gives a confidence band formula for the same thing. (Thinking)

So do we have the following? (Wondering)

View attachment 9499

That would mean that the confidence interval is $[7.541854251, \ 12.69633551]$.

Is that correct? (Wondering)
 

Attachments

  • con_int.JPG
  • #23
mathmari said:
So do we have the following? (Wondering)
That would mean that the confidence interval is $[7.541854251, \ 12.69633551]$.

Is that correct? (Wondering)

I didn't check the numbers, but the approach seems to be correct.
Still, didn't the question ask for a child who watches television for 30 hours a week as well? And the corresponding confidence interval? (Wondering)
 
  • #24
Klaas van Aarsen said:
I didn't check the numbers, but the approach seems to be correct.
Still, didn't the question ask for a child who watches television for 30 hours a week as well? And the corresponding confidence interval? (Wondering)

For that we do the same, just replacing the 36 hours by 30 hours, or not? (Wondering)
 
  • #25
mathmari said:
For that we do the same, just replacing the 36 hours by 30 hours, or not?

I guess so, assuming your previous approach was correct which seems plausible. (Thinking)
 
  • #26
Klaas van Aarsen said:
I guess so, assuming your previous approach was correct which seems plausible. (Thinking)

Applying the same method as before, I get that the confidence interval for the case of 30 hours is $[2.130030132, \ 6.498790828]$.

mathmari said:
Determine the confidence interval for the average weight in pounds for a child who watches television for $36$ hours a week and for a child who watches television for $30$ hours a week. Which confidence interval is greater and why?

By "greater", do they mean larger values rather than a bigger width?

If yes, the greater confidence interval is the first one, for the case of 36 hours. How do we justify that? (Wondering)
 
  • #27
mathmari said:
Applying the same method as before, I get that the confidence interval for the case of 30 hours is $[2.130030132, \ 6.498790828]$.

Looking at your graph in post #1, that looks about right. (Nod)

mathmari said:
By "greater", do they mean larger values rather than a bigger width?

If yes, the greater confidence interval is the first one, for the case of 36 hours. How do we justify that?

I believe they mean a greater range of the confidence interval. The range is the upper bound minus the lower bound.
Either way, that is also the confidence interval for 36 hours.

The wiki article explains that the range of the confidence interval has 2 parts:
  1. The error due to uncertainty in the estimated slope ($\hat\beta_1$) and y-intercept ($\hat\beta_0$). This error is smallest close to the center and grows larger away from it.
  2. The error due to scattering from unexplained sources, which is assumed to be normally distributed with equal variance everywhere.
They also show a picture of a confidence band with this hyperbolic shape (the data in it are unrelated to this problem):
View attachment 9502
As you can see, the band is narrowest at the mean X-value, and grows wider in both positive and negative directions.

And indeed, 30 hours is closer to the mean X-value of 31.47 than 36 hours. (Thinking)
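For completeness, here is a small R sketch (the variable and function names are just illustrative) that rebuilds both intervals from the sums in post #1, the SSE in post #10 and the mean-response formula from post #21, and makes the difference in width visible:

[CODE]
# Confidence intervals for the mean response at x0 = 36 and x0 = 30 hours
n <- 15
sumX <- 472; sumY <- 86; sumXY <- 3356; sumX2 <- 15524
SSE  <- 190.2276

xbar <- sumX / n
Sxx  <- sumX2 - sumX^2 / n                  # sum of squared deviations of X
b    <- (sumXY - sumX * sumY / n) / Sxx     # slope, about 0.97
a    <- sumY / n - b * xbar                 # intercept, about -24.7
s    <- sqrt(SSE / (n - 2))                 # residual standard error
tcrit <- qt(0.975, df = n - 2)

ci_mean <- function(x0) {
  yhat <- a + b * x0
  half <- tcrit * s * sqrt(1 / n + (x0 - xbar)^2 / Sxx)
  c(lower = yhat - half, upper = yhat + half)
}

ci_mean(36)  # roughly [7.54, 12.70]
ci_mean(30)  # roughly [2.13, 6.50], narrower because 30 is closer to xbar = 31.47
[/CODE]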
 

Attachments

  • 300px-Okuns_law_with_confidence_bands.svg.png

FAQ: Confidence Interval for Child's Weight Based on Television Watching

What is a significance test?

A significance test is a statistical method used to determine whether the results of a study or experiment are due to chance or if they are truly significant. It helps researchers make conclusions about a population based on a sample of data.

Why is a significance test important in scientific research?

Significance tests help scientists determine if their findings are statistically significant, meaning that the results are not likely to have occurred by chance. This is important because it allows researchers to confidently draw conclusions and make generalizations about a larger population.

What are the steps involved in performing a significance test?

The first step is to choose a null hypothesis, which states that there is no significant difference between groups or variables. Then, select an appropriate test statistic and calculate its value based on the data. Next, determine the probability of obtaining the observed results if the null hypothesis is true. Finally, compare the calculated probability to a predetermined significance level to determine if the results are statistically significant.
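As a generic illustration of these steps in R (simulated data, unrelated to the study discussed above):

[CODE]
# 1. Null hypothesis: the population mean is 0.
set.seed(1)
x <- rnorm(15, mean = 0.5)     # a small simulated sample
# 2./3. Test statistic and the probability of a result at least this extreme under H0:
res <- t.test(x, mu = 0)
res$statistic                  # the t test statistic
res$p.value                    # the p-value
# 4. Compare with the chosen significance level:
res$p.value < 0.05             # TRUE means the result is statistically significant at 5%
[/CODE]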

What is the difference between a one-tailed and two-tailed significance test?

A one-tailed significance test is used when the researcher has a specific hypothesis about the direction of the effect; in this case the rejection region lies entirely on one side of the distribution. A two-tailed significance test is used when the researcher does not have a specific hypothesis about the direction of the effect; in this case the rejection region is split between both tails of the distribution.

What are some common mistakes to avoid when performing a significance test?

Some common mistakes to avoid when performing a significance test include using an incorrect test statistic, using an inappropriate significance level, and misinterpreting the results. It is important to carefully select the appropriate test and significance level based on the research question and to correctly interpret the results in the context of the study.
