Performance of students - Hypothesis testing

In summary, a teacher wants to determine whether the order of exam tasks has an impact on student performance. Two versions of the exam are created and randomly distributed to two groups of students. The null hypothesis is that the expected scores for the two versions are equal. At the 5% significance level, an F-test rejects equality of variances, so a Welch t-test is applied, and the null hypothesis of equal means is not rejected. For the 1% significance level, the F-test rejects equal variances once the sample size satisfies $n-1\ge 44$; the Welch test then rejects the null hypothesis of equal means only for samples of size $n\ge 3953$, while for smaller $n$ the pooled t-test never rejects.
  • #1
mathmari
Hey! :eek:

A teacher wants to find out if the order of the exam tasks has an impact on the performance of the students. Therefore, he creates two versions ($ X $ and $ Y $) of an exam in which the exam tasks are arranged differently. The versions are distributed randomly so that $ n $ students receive version $ X $ and $ m = n $ students receive version $ Y $. We denote the expected score on version $ X $ by $ \mu_X $ and the expected score on version $ Y $ by $ \mu_Y $. The variances are denoted $ \sigma_X^2 $ and $ \sigma_Y^2 $; the scores are assumed to be normally distributed. (a) Formulate a suitable null hypothesis for the teacher's question.
(b) Consider that $n = 30, \overline{X} = 79, \overline{Y}= 74, S_X' = 14, S_Y' = 20$. Check the null hypothesis of (a) with significance level $\alpha=5\%$.
(c) Consider that $\overline{X} = 79, \overline{Y}= 80, S_X' = 14, S_Y' = 20$. For which sample size $n$ can we reject the null hypothesis with significance level $\alpha=1\%$ ? I have done the following:

(a) The null hypothesis is $H_0: \mu_X=\mu_Y$, right? (Wondering) (b) Since we don't know whether the variances are equal, we first test $\sigma_X=\sigma_Y$ with an F-test.
  • If $\sigma_X=\sigma_Y$ then we apply a two-sample t-test.
  • If $\sigma_X\neq\sigma_Y$ then we apply a Welch test.

The test is the following:

The null hypothesis and the alternative hypothesis is $H_0:\sigma_Y^2=\sigma_X^2$ and $H_1:\sigma_Y^2>\sigma_X^2$, respectively.

The test statistic is \begin{equation*}F=\frac{{S_Y'}^2}{{S_X'}^2}=\frac{20^2}{14^2}=\frac{400}{196}\approx 2.0408\end{equation*}
$F$ is F-distributed with degrees of freedom $\nu_Y=n_Y-1=30-1=29$ and $\nu_X=n_X-1=30-1=29$.

We have that $1-\alpha=95\%$.

The null hypothesis will be rejected if $F>F_{1-\alpha}(\nu_Y, \nu_X)=F_{0.95}(29, 29)$.

It holds that $F_{0.95}(29, 29)\approx 1.86$.
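As a numerical cross-check (a sketch in Python with scipy, which is assumed available; the thread itself works with tables and R):

```python
from scipy.stats import f

# Observed F statistic: larger sample variance over the smaller one
F = 20**2 / 14**2  # = 400/196, approx. 2.0408

# Upper 5% quantile of the F distribution with (29, 29) degrees of freedom
crit = f.ppf(0.95, 29, 29)

print(F, crit, F > crit)
```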

Since $F\approx 2.0408>1.86\approx F_{0.95}(29, 29)$, we reject the null hypothesis of equal variances. So, we apply a Welch test. The null hypothesis is $H_0: \mu_X-\mu_Y=0$ and the alternative hypothesis is $H_1:\mu_X-\mu_Y\neq 0$.

The test statistic $T$ for the t-Test with unknown variances \begin{equation*}T=\frac{\overline{X}-\overline{Y}-0}{\sqrt{\frac{{S_X'}^2}{n_X}+\frac{{S_Y'}^2}{n_Y}}}=\frac{79-74}{\sqrt{\frac{14^2}{30}+\frac{20^2}{30}}}=\frac{5}{\sqrt{\frac{196}{30}+\frac{400}{30}}}=\frac{5}{\sqrt{\frac{298}{15}}}\approx 1.1218\end{equation*}

The null hypothesis will be rejected if $|T|>t_{k;1-\alpha/2}$.

The number of degrees of freedom is\begin{align*}k&=\frac{\left (\frac{{S_X'}^2}{n_X}+\frac{{S_Y'}^2}{n_Y}\right )^2}{\frac{1}{n_X-1}\left (\frac{{S_X'}^2}{n_X}\right )^2+\frac{1}{n_Y-1}\left (\frac{{S_Y'}^2}{n_Y}\right )^2}=\frac{\left (\frac{14^2}{30}+\frac{20^2}{30}\right )^2}{\frac{1}{30-1}\left (\frac{14^2}{30}\right )^2+\frac{1}{30-1}\left (\frac{20^2}{30}\right )^2}=\frac{\left (\frac{196}{30}+\frac{400}{30}\right )^2}{\frac{1}{29}\left (\frac{196}{30}\right )^2+\frac{1}{29}\left (\frac{400}{30}\right )^2} \\ & =\frac{\left (\frac{596}{30}\right )^2}{\frac{1}{29}\left (\frac{38416}{900}+\frac{160000}{900}\right )}=\frac{\frac{355216}{900}}{\frac{1}{29}\cdot \frac{198416}{900}}=\frac{355216\cdot 29}{ 198416}=\frac{10301264}{ 198416}\approx 51.9175\end{align*} so $k=52$.

So we get the critical value $t_{k;1-\alpha/2}=t_{52;0.975}\approx 2.01$.

Since $|T|=1.1218<2.01\approx t_{52;0.975}$ we don't reject the null hypothesis. Is everything correct? (Wondering)
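The whole of part (b) can be reproduced from the summary statistics alone, e.g. with scipy's `ttest_ind_from_stats` (a sketch, assuming scipy is available; not part of the original thread):

```python
from scipy.stats import ttest_ind_from_stats

# Welch's t-test (equal_var=False) directly from the summary statistics of (b)
res = ttest_ind_from_stats(mean1=79, std1=14, nobs1=30,
                           mean2=74, std2=20, nobs2=30,
                           equal_var=False)
print(res.statistic, res.pvalue)
```

The statistic matches the hand computation $T\approx 1.1218$, and the p-value is above $0.05$, confirming that $H_0$ is not rejected.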


(c) Do we have to do the same as in (b) just with unknown n? (Wondering)
 
  • #2
mathmari said:
The number of degrees of freedom is $k\approx 51.9175$ so $k=52$.

Hey mathmari!

Just a nitpick. Generally we round degrees of freedom down, so I believe we should pick $k=51$.
That's because we want to be sure with a confidence of 'at least' $1-\alpha$ before we reject the null hypothesis.
In case of doubt, we can't.
So we should round to the safe side. (Nerd)
mathmari said:
So we get the critical value $t_{k;1-\alpha/2}=t_{52;0.975}\approx 2.01$.

Since $|T|=1.1218<2.01\approx t_{52;0.975}$ we don't reject the null hypothesis. Is everything correct? (Wondering)


(c) Do we have to do the same as in (b) just with unknown n? (Wondering)

Yep. Yep. (Nod)
 
  • #3
I like Serena said:
Just a nitpick. Generally we round degrees of freedom down, so I believe we should pick $k=51$.
That's because we want to be sure with a confidence of 'at least' $1-\alpha$ before we reject the null hypothesis.
In case of doubt, we can't.
So we should round to the safe side. (Nerd)

Ah ok, I understand! (Nerd)
I like Serena said:
Yep. Yep. (Nod)

We have the following at (c) :

We check again with an F-test whether the variances are equal. Or is this not necessary, and does the same hold as in (b)? (Wondering)

The F-test would be the following:

The null hypothesis is $H_0:\sigma_Y^2=\sigma_X^2$ and the alternative hypothesis is $H_1:\sigma_Y^2>\sigma_X^2$.

The test statistic is \begin{equation*}F=\frac{{S_Y'}^2}{{S_X'}^2}=\frac{20^2}{14^2}=\frac{400}{196}\approx 2.0408\end{equation*}
$F$ is F-distributed with degrees of freedom $\nu_Y=\nu_X=n-1$.

We have that $1-\alpha=99\%$.

The null hypothesis will be rejected if $F>F_{1-\alpha}(\nu_Y, \nu_X)=F_{0.99}(n-1, n-1)$.

How can we determine $F_{0.99}(n-1, n-1)$ without knowing $n$ ? (Wondering)
 
  • #4
We already know that we'll need a bigger $n$ than we had for (b) don't we?

Let's inspect the F-table with the smaller $\alpha$ and with the same $F$-value (since the variances are the same).
What happens if we increase the degrees of freedom of both the numerator and the denominator?
Is there a possibility that we can assume equal variances after all? (Wondering)
 
  • #5
I like Serena said:
We already know that we'll need a bigger $n$ than we had for (b) don't we?

Let's inspect the F-table with the smaller $\alpha$ and with the same $F$-value (since the variances are the same).
What happens if we increase the degrees of freedom of both the numerator and the denominator?
Is there a possibility that we can assume equal variances after all? (Wondering)

So we have to check, for each $n$, the values in this table. For $n-1\geq 30$ do we not get values smaller than $F=2.0408$, so that the null hypothesis is rejected, or not?

So we have to apply again a Welch-test.

Or am I wrong? (Wondering)
 
  • #6
mathmari said:
So we have to check, for each $n$, the values in this table. For $n-1\geq 30$ do we not get values smaller than $F=2.0408$, so that the null hypothesis is rejected, or not?

So we have to apply again a Welch-test.

Or am I wrong? (Wondering)

There seem to be mistakes in that table. For instance $F_{0.99}(31,31)=1.98$ is lower than the values to the left and right of it, which is not possible. (Worried)
I think we should use another table.

In R we can do:
Code:
> qf(0.99, 43:45, 43:45)
[1] 2.056934 2.039508 2.022824
So for $n-1\ge 44$ the critical $F$-values are below our $F=2.0408$, so we will have to reject the null hypothesis for those $n$, and apply the Welch-test. (Thinking)
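The same quantiles can be cross-checked in Python (a sketch, assuming scipy is available; `f.ppf` is the counterpart of R's `qf`):

```python
from scipy.stats import f

# Equivalent of R's qf(0.99, 43:45, 43:45)
for df in (43, 44, 45):
    print(df, f.ppf(0.99, df, df))
```

The critical value drops below the observed $F\approx 2.0408$ exactly at $df = n-1 = 44$, matching the R output.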
 
  • #7
I like Serena said:
There seem to be mistakes in that table. For instance $F_{0.99}(31,31)=1.98$ is lower than the values to the left and right of it, which is not possible. (Worried)
I think we should use another table.

Ah ok!
I like Serena said:
In R we can do:
Code:
> qf(0.99, 43:45, 43:45)
[1] 2.056934 2.039508 2.022824
So for $n-1\ge 44$ the critical $F$-values are below our $F=2.0408$, so we will have to reject the null hypothesis for those $n$, and apply the Welch-test. (Thinking)
We want to find for which $n$ the null hypothesis of (a) can be rejected.
So do we have to consider cases and find $n$ both if $\sigma_X=\sigma_Y$, i.e. with a two-sample t-test, and if $\sigma_X\neq\sigma_Y$, i.e. with a Welch test?

(Wondering)
 
  • #8
mathmari said:
We want to find for which $n$ the null hypothesis of (a) can be rejected.
So do we have to consider cases and find $n$ both if $\sigma_X=\sigma_Y$, i.e. with a two-sample t-test, and if $\sigma_X\neq\sigma_Y$, i.e. with a Welch test?

Yep. We can do that.
So for $n-1 < 44$ we should assume equal variances, find the critical $n$, and verify that it indeed satisfies $n-1 < 44$.
And for $n-1 \ge 44$ we should assume unequal variances, find the critical $n$, and verify that it indeed satisfies $n-1 \ge 44$. (Thinking)
 
  • #9
For $n-1\ge 44$ we apply the Welch-Test.

The null hypothesis is $H_0: \mu_X-\mu_Y=0$ and the alternative hypothesis is $H_1:\mu_X-\mu_Y\neq 0$.

The test statistic $T$ for the t-test with unknown variances is \begin{equation*}T=\frac{\overline{X}-\overline{Y}-0}{\sqrt{\frac{{S_X'}^2}{n_X}+\frac{{S_Y'}^2}{n_Y}}}=\frac{79-80}{\sqrt{\frac{14^2}{n}+\frac{20^2}{n}}}=\frac{-1}{\sqrt{\frac{596}{n}}}=-\frac{\sqrt{n}}{2\sqrt{149}}\end{equation*} so that \begin{equation*}|T|=\frac{\sqrt{n}}{2\sqrt{149}}\geq \frac{\sqrt{45}}{2\sqrt{149}}\approx 0.2748\end{equation*}

The null hypothesis will be rejected if $|T|>t_{k;1-\alpha/2}$.

The number of degrees of freedom is \begin{align*}k&=\frac{\left (\frac{{S_X'}^2}{n_X}+\frac{{S_Y'}^2}{n_Y}\right )^2}{\frac{1}{n_X-1}\left (\frac{{S_X'}^2}{n_X}\right )^2+\frac{1}{n_Y-1}\left (\frac{{S_Y'}^2}{n_Y}\right )^2}=\frac{\left (\frac{14^2}{n}+\frac{20^2}{n}\right )^2}{\frac{1}{n-1}\left (\frac{14^2}{n}\right )^2+\frac{1}{n-1}\left (\frac{20^2}{n}\right )^2}=\frac{\left (\frac{196}{n}+\frac{400}{n}\right )^2}{\frac{1}{n-1}\left (\frac{196}{n}\right )^2+\frac{1}{n-1}\left (\frac{400}{n}\right )^2} \\ & =\frac{\left (\frac{596}{n}\right )^2}{\frac{1}{n-1}\left (\frac{38416}{n^2}+\frac{160000}{n^2}\right )}=\frac{\frac{355216}{n^2}}{\frac{1}{n-1}\cdot \frac{198416}{n^2}}=\frac{355216\cdot (n-1)}{ 198416}\geq \frac{355216\cdot 44}{ 198416}=78.7714\end{align*} so $k=78$.
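Since $n_X=n_Y=n$ here, $k$ is linear in $n-1$; a quick numerical check of this closed form (a sketch in plain Python, not part of the thread):

```python
# Welch-Satterthwaite degrees of freedom for two groups of equal size n
def welch_df(s2x, s2y, n):
    num = (s2x / n + s2y / n) ** 2
    den = (s2x / n) ** 2 / (n - 1) + (s2y / n) ** 2 / (n - 1)
    return num / den

# With s'_X^2 = 196 and s'_Y^2 = 400 this reduces to 355216*(n-1)/198416
print(welch_df(196, 400, 45))
```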

The critical value is therefore $t_{k;1-\alpha/2}=t_{78;0.995}\approx 2.64$.

How can we compare $|T|\geq 0.2748$ and $t_{78;0.995}\approx 2.64$ when we only have inequalities? (Wondering)
 
  • #10
They are not inequalities if we pick a specific n.
In fact we have found that for n=45, we cannot reject H0.
We will need a bigger n.
How about n=100? Or n=1000? (Wondering)
 
  • #11
I like Serena said:
They are not inequalities if we pick a specific n.
In fact we have found that for n=45, we cannot reject H0.
We will need a bigger n.
How about n=100? Or n=1000? (Wondering)

Ahh.. We reject the null hypothesis if $$|T|>t_{78;0.995}\Rightarrow \frac{\sqrt{n}}{2\sqrt{149}}>2.64\Rightarrow n>4153.9$$ right? (Wondering)
 
  • #12
mathmari said:
Ahh.. We reject the null hypothesis if $$|T|>t_{78;0.995}\Rightarrow \frac{\sqrt{n}}{2\sqrt{149}}>2.64\Rightarrow n>4153.9$$ right? (Wondering)

If n is bigger, doesn't the degrees of freedom k also become bigger? (Wondering)
Then the critical t-value becomes smaller until it approaches the critical z-value. Doesn't it?
 
  • #13
I like Serena said:
If n is bigger, doesn't the degrees of freedom k also become bigger? (Wondering)
Then the critical t-value becomes smaller until it approaches the critical z-value. Doesn't it?

So, for a large number of degrees of freedom the critical t-value approximates the critical z-value, and so $t_{k;0.995}\approx z_{0.995}= 2.575$ ? (Wondering)
 
  • #14
mathmari said:
So, for a large number of degrees of freedom the critical t-value approximates the critical z-value, and so $t_{k;0.995}\approx z_{0.995}= 2.575$ ? (Wondering)

Ah. You already had the critical z-value! (Blush)
Then it's all correct.
 
  • #15
For $n-1\ge 44$ we apply the Welch-test.

The null hypothesis is $H_0: \mu_X-\mu_Y=0$ and the alternative hypothesis is $H_1: \mu_X-\mu_Y\neq 0$.

The test statistic $T$ for the t-test is \begin{equation*}T=\frac{\overline{X}-\overline{Y}-0}{\sqrt{\frac{S_X'^2}{n_X}+\frac{S_Y'^2}{n_Y}}}=\frac{79-80}{\sqrt{\frac{14^2}{n}+\frac{20^2}{n}}}=\frac{-1}{\sqrt{\frac{596}{n}}}=-\frac{\sqrt{n}}{\sqrt{596}}\approx -0.04096\sqrt{n}\end{equation*}

The null hypothesis will be rejected if $|T|>t_{k;1-\alpha/2}=t_{k;0.995}$.

For $n\geq 30$ the t-distribution can be approximated by the normal distribution.

Since this holds in this case, $n\geq 45$, we have that $t_{k;0.995}\approx z_{0.995}=2.575$.

Therefore, for the null hypothesis to be rejected the following must hold: \begin{equation*}|T|>t_{k;0.995}\Rightarrow 0.04096\sqrt{n}>2.575 \Rightarrow n>3952.16\end{equation*}
So, the null hypothesis will be rejected for a sample of size $n\geq 3953$. Is everything correct? (Wondering)
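The boundary can be double-checked numerically (a sketch in Python with scipy, not part of the thread). Note the thread uses the rounded table value $2.575$; the exact quantile $z_{0.995}\approx 2.5758$ moves the boundary slightly, so the exact result differs from $3953$ by a couple of units:

```python
from math import ceil
from scipy.stats import norm

# Normal approximation as in the thread: reject H0 when |T| = sqrt(n/596) > z_{0.995}
z = norm.ppf(0.995)           # approx. 2.5758 (the table value 2.575 is rounded)
n_min = ceil(596 * z ** 2)    # smallest integer n with sqrt(n/596) > z
print(z, n_min)
```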
Let's consider the case $n-1<44$. Here we apply a two-sample t-test.

The test statistic is $T=\frac{\overline{X}-\overline{Y}}{S\cdot \sqrt{\frac{1}{n_X}+\frac{1}{n_Y}}}$ with $S=\sqrt{\frac{(n_X-1){S_X'}^2+(n_Y-1){S_Y'}^2}{n_X+n_Y-2}}$, right?

So, we have that $S=\sqrt{\frac{(n-1)14^2+(n-1)20^2}{n+n-2}}=\sqrt{\frac{(n-1)196+(n-1)400}{2n-2}}=\sqrt{\frac{596(n-1)}{2(n-1)}}=\sqrt{\frac{596}{2}}\approx 17.2627$.

Therefore we get $T=\frac{79-80}{17.2627\cdot \sqrt{\frac{1}{n}+\frac{1}{n}}}=\frac{-\sqrt{n}}{17.2627\cdot \sqrt{2}}\approx -0.0409615 \sqrt{n}$.

How can we determine $t_{k;0.995}$ here? We cannot approximate the t-distribution by a normal distribution for $n<30$.

(Wondering)
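The equal-variance case can be settled by brute force (a sketch in Python with scipy, not part of the thread): check $|T|=\sqrt{n/596}$ against $t_{0.995;2n-2}$ for every admissible $n$.

```python
from math import sqrt
from scipy.stats import t

# Equal-variance regime n-1 < 44: compare |T| = sqrt(n/596)
# with the two-sided 1% critical value t_{0.995, 2n-2}
rejected = [n for n in range(2, 45)
            if sqrt(n) / sqrt(596) > t.ppf(0.995, 2 * n - 2)]
print(rejected)  # empty list: H0 is never rejected in this range
```

The largest statistic in the range is about $0.272$ (at $n=44$), far below every critical value, which are all above $2.63$.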
 
  • #16
mathmari said:
For $n-1\ge 44$ we apply the Welch-test.
...
So, the null hypothesis will be rejected for a sample of size $n\geq 3953$.

Is everything correct?

It looks correct to me. (Nod)

mathmari said:
Let's consider the case $n-1<44$. Here we apply a two-sample t-test.

The test statistic is $T=\frac{\overline{X}-\overline{Y}}{S\cdot \sqrt{\frac{1}{n_X}+\frac{1}{n_Y}}}$ with $S=\sqrt{\frac{(n_X-1){S_X'}^2+(n_Y-1){S_Y'}^2}{n_X+n_Y-2}}$, right?

So, we have that $S=\sqrt{\frac{(n-1)14^2+(n-1)20^2}{n+n-2}}=\sqrt{\frac{(n-1)196+(n-1)400}{2n-2}}=\sqrt{\frac{596(n-1)}{2(n-1)}}=\sqrt{\frac{596}{2}}\approx 17.2627$.

Therefore we get $T=\frac{79-80}{17.2627\cdot \sqrt{\frac{1}{n}+\frac{1}{n}}}=\frac{-\sqrt{n}}{17.2627\cdot \sqrt{2}}\approx -0.0409615 \sqrt{n}$.

How can we determine $t_{k;0.995}$ here? We cannot approximate the t-distribution by a normal distribution for $n<30$.

(Wondering)

How about checking every $n$ between $1$ and $44$?
Maybe we can find a pattern so that we have to check fewer values. (Wondering)
 
  • #17
I like Serena said:
How about checking every $n$ between $1$ and $44$?
Maybe we can find a pattern so that we have to check fewer values. (Wondering)

As I now read, the null hypothesis is rejected if $|T|>t_{1-\alpha/2;n_X+n_Y-2}$, isn't it?

Then we have the following:
$$|T|>t_{1-\alpha/2;n_X+n_Y-2}\Rightarrow |T|>t_{0.995;n+n-2}\Rightarrow 0.0409615 \sqrt{n}>t_{0.995;2n-2}$$

To check that for every $n$ between $1$ and $44$ (i.e. for every $2n-2$ between $0$ and $86$) in R, do we write [m]qt(0.01, 0 : 86)[/m] ? If yes, we get only negative values, and that would mean that the above holds for every $n$.

(Wondering)
 
  • #18
mathmari said:
As I now read, the null hypothesis is rejected if $|T|>t_{1-\alpha/2;n_X+n_Y-2}$, isn't it?

Then we have the following:
$$|T|>t_{1-\alpha/2;n_X+n_Y-2}\Rightarrow |T|>t_{0.995;n+n-2}\Rightarrow 0.0409615 \sqrt{n}>t_{0.995;2n-2}$$

To check that for every $n$ between $1$ and $44$ (i.e. for every $2n-2$ between $0$ and $86$) in R, do we write [m]qt(0.01, 0 : 86)[/m] ? If yes, we get only negative values, and that would mean that the above holds for every $n$.

(Wondering)

Shouldn't we check [m]qt(0.995, 0 : 86)[/m]? (Wondering)
 
  • #19
I like Serena said:
Shouldn't we check [m]qt(0.995, 0 : 86)[/m]? (Wondering)

Oh yes (Blush)

So, on the left side of the inequality the largest number that we get, i.e. for $n=44$, is about $0.271708$. On the right side every number is greater than $2.634212$.
So the inequality doesn't hold for any $n$.

Is this correct? (Wondering)
 
  • #20
mathmari said:
Oh yes (Blush)

So, on the left side of the inequality the largest number that we get, i.e. for $n=44$, is about $0.271708$. On the right side every number is greater than $2.634212$.
So the inequality doesn't hold for any $n$.

Is this correct? (Wondering)

Looks correct to me. (Nod)
 
  • #21
I like Serena said:
Looks correct to me. (Nod)

Great! Thank you so much! (Yes)
 

FAQ: Performance of students - Hypothesis testing

What is hypothesis testing?

Hypothesis testing is a statistical method used to determine whether a hypothesis about a population is supported by the data collected from a sample. It involves setting up a null hypothesis and an alternative hypothesis, collecting data, and using statistical tests to determine the likelihood of the data supporting the alternative hypothesis over the null hypothesis.

How does hypothesis testing relate to performance of students?

Hypothesis testing can be used to determine whether there is a significant difference in the performance of students between different groups or conditions. For example, it can be used to compare the performance of students who received a new teaching method versus those who did not.

What is a p-value and why is it important in hypothesis testing?

A p-value is the probability of obtaining results at least as extreme as those observed, assuming the null hypothesis is true. It is important in hypothesis testing because it quantifies the strength of the evidence against the null hypothesis. A p-value below a chosen significance level (often 0.05) means the observed data would be unlikely under the null hypothesis, so the null hypothesis is rejected and the result is deemed statistically significant.
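For instance, the p-value for the Welch statistic from part (b) of the thread can be computed directly (a sketch in Python with scipy; the numbers $T\approx 1.1218$ and $k=52$ are taken from the thread):

```python
from scipy.stats import t

# Two-sided p-value for the thread's Welch statistic T with k degrees of freedom
T, k = 1.1218, 52
p = 2 * t.sf(T, k)   # survival function gives the upper-tail probability
print(p)
```

The p-value is roughly 0.27, well above 0.05, which is why the null hypothesis was not rejected in part (b).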

How do you choose the appropriate statistical test for hypothesis testing?

The choice of statistical test depends on the type of data and the research question being investigated. Some common statistical tests used in hypothesis testing for student performance include t-tests, ANOVA, and regression analysis. It is important to consult with a statistician or refer to statistical textbooks to determine the most appropriate test for a particular research question.

What are some potential limitations of using hypothesis testing to evaluate student performance?

One limitation of using hypothesis testing is that it can only determine whether there is a statistically significant difference between groups or conditions. It cannot prove causation or explain the reasons for the observed differences in performance. Additionally, the results of hypothesis testing may be influenced by various factors such as sample size, type I and type II errors, and the choice of statistical test used.
