How to compare 2 means?

  • #1
Agent Smith
TL;DR Summary: Hypothesis testing by comparing 2 means.
We have a question where we're testing the hypothesis that a certain diet (call it diet A) causes weight loss. We get ##n_1## people who we put on diet A (treatment group) and another ##n_2## people who we keep on a normal diet (control group). We find that the mean weight loss in the treatment group is ##m_T## with a standard deviation ##s_T##. The mean weight loss in the control group is ##m_C## and standard deviation is ##s_C##.

##H_0##: Diet A is not effective i.e. ##m_T = m_C##
##H_1##: Diet A is effective i.e. ##m_T > m_C##

Assume all conditions for inference have been met

We "combine the distributions" (I don't know the appropriate word) and compute ##m_T - m_C##. This is the difference in the means of weight loss for the treatment and control groups. Correct?

We compute the standard deviation for ##m_T - m_C## like so: ##\sigma_{T - C} = \sqrt{\frac{s_T^2}{n_1} + \frac{s_C ^2}{n_2}}##. This is the standard deviation of the sampling distribution of the difference in mean weight loss for the treatment and control groups. Correct?


We compute the z/t score ##z = \frac{\left(m_T - m_C\right) - 0}{\sigma_{T - C}}##

We then look up the p-value from a z/t table.
If the p-value ##\leq \alpha## then we reject ##H_0## and accept ##H_1##; if the p-value ##> \alpha##, we fail to reject ##H_0##.
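For my own reference, here is a minimal sketch of that calculation in Python. The group sizes, means, and standard deviations are invented purely for illustration; scipy's `norm` is used only for the normal-tail lookup:

```python
# Sketch of the two-sample z-test described above; all numbers are made up.
from math import sqrt
from scipy.stats import norm

n1, n2 = 100, 100        # treatment and control group sizes
m_T, m_C = 4.2, 3.1      # mean weight loss in each group
s_T, s_C = 2.5, 2.3      # sample standard deviations

se = sqrt(s_T**2 / n1 + s_C**2 / n2)   # SD of the sampling distribution of m_T - m_C
z = ((m_T - m_C) - 0) / se             # test statistic under H0: equal means
p = norm.sf(z)                         # one-sided p-value for H1: m_T > m_C

alpha = 0.05
print(f"z = {z:.2f}, p = {p:.4f}, reject H0: {p <= alpha}")
```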
 
  • #2
In general, if ##p<\alpha## then you reject ##H_0##. Full stop. Rejecting ##H_0## is not the same as accepting ##H_1##. That is especially the case when ##H_0## and ##H_1## are not mutually exclusive and collectively exhaustive as is the case here.

In this case it is possible that the diet is anti-effective, meaning ##m_T<m_C## (let's call this ##H_2##). Now, you may look at your data and see that your sample ##\bar m_T>\bar m_C##, and you might incorrectly reason that your sample ##\bar m_T>\bar m_C## combined with ##p<\alpha## should entitle you to not just reject ##H_0## but also to accept ##H_1##. But the problem is that you have not accounted for the possibility that ##H_2## is correct (##m_T<m_C##) but a sample randomly has ##\bar m_T>\bar m_C##.

To justify that, you would actually have to calculate the probabilities of the data given ##H_1## and ##H_2##.
 
  • #3
@Dale If ##m_T > m_C##, I don't think we have to worry about ##m_T < m_C##, no? I'm not sure.

Also what about my questions (in bold)? Are my conclusions correct?
 
  • #4
Agent Smith said:
@Dale If ##m_T > m_C##, I don't think we have to worry about ##m_T < m_C##, no? I'm not sure.
Even if the population ##m_T>m_C## it is possible that the sample ##\bar m_T<\bar m_C##. Since all you observe is the sample, you cannot rule the other possibility out.

Agent Smith said:
TL;DR Summary: Hypothesis testing by comparing 2 means.

This is the standard deviation of the sampling distribution of the difference in mean weight loss for the treatment and control groups. Correct?
I believe so, but would have to look it up to be sure. There may be a correction for ##N-1## somewhere.
 
  • #5
Dale said:
Even if the population ##m_T>m_C## it is possible that the sample ##\bar m_T<\bar m_C##. Since all you observe is the sample, you cannot rule the other possibility out.
How would you guard against such errors?
 
  • #6
Agent Smith said:
How would you guard against such errors?
By computing probabilities related to ##H_1## directly, not just ##H_0##.

A big problem with traditional null hypothesis significance testing is that you compute only ##P(data|H_0)##. That makes it difficult to make assertions about any ##H_A##.
 
  • #7
Agent Smith said:
TL;DR Summary: Hypothesis testing by comparing 2 means.

We "combine the distributions" (I don't know the appropriate word) and compute mT−mC. This is the difference in the means of weight loss for the treatment and control groups. Correct?
I don't know what you mean by "combining the distributions". You have to keep the T and C data sets separate to calculate mT and mC. The sentence in bold is correct.
Agent Smith said:
TL;DR Summary: Hypothesis testing by comparing 2 means.

This is the standard deviation of the sampling distribution of the difference in mean weight loss for the treatment and control groups. Correct?
This is the estimate of the SD of the difference of the sample means. We don't know the true SD, and we have only done 1 experiment, so there is only one data point for difference of means, so we can't compute a sample SD.

Then comes the interesting question - how many degrees of freedom are associated with this estimate? This matters if you are doing a t-test; not if you are doing a z-test with large sample sizes. You may want to read up on "homoscedastic and heteroscedastic t-tests".
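For illustration, here is how the heteroscedastic (Welch) version can be run directly from summary statistics in scipy. The numbers are invented, and I halve the two-sided p-value rather than rely on a one-sided option:

```python
# Welch's (heteroscedastic) two-sample t-test from summary statistics; numbers invented.
from scipy.stats import ttest_ind_from_stats

t, p_two_sided = ttest_ind_from_stats(
    mean1=4.2, std1=2.5, nobs1=100,   # treatment: m_T, s_T, n_1
    mean2=3.1, std2=2.3, nobs2=100,   # control:   m_C, s_C, n_2
    equal_var=False,                  # Welch: do not pool the variances
)
# equal_var=False uses the Welch-Satterthwaite degrees of freedom;
# equal_var=True would give the pooled (homoscedastic) Student t-test.
p_one_sided = p_two_sided / 2         # valid here because the observed difference is positive
print(t, p_one_sided)
```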
 
  • #8
mjc123 said:
I don't know what you mean by "combining the distributions". You have to keep the T and C data sets separate to calculate mT and mC. The sentence in bold is correct.
I assumed that, for given means ##\mu_T## and ##\mu_C## of 2 different distributions(?) or data sets, ##\mu_T - \mu_C## represents a different data set with its own distribution. What exactly is a distribution?

mjc123 said:
This is the estimate of the SD of the difference of the sample means. We don't know the true SD, and we have only done 1 experiment, so there is only one data point for difference of means, so we can't compute a sample SD.
I guess we're trying to work with the sampling distribution of the sample means, where a sample mean = difference in the means (##\mu_T - \mu_C##). Correct?
##\mu_T - \mu_C## is also a mean, the mean of the difference between treatment and control measurements (weight differences). Correct?

Dale said:
By computing probabilities related to ##H_1## directly, not just ##H_0##.

A big problem with traditional null hypothesis significance testing is that you compute only ##P(data|H_0)##. That makes it difficult to make assertions about any ##H_A##
It's unclear but if the experiment results show that ##\mu_T > \mu_C##, my hypothesis would be that diet A causes weight loss. Then I would test this hypothesis, as described in the OP, no? If the associated p-value <= alpha then I would be required to reject ##H_0: \mu_T = \mu_C##. Isn't that how it works?

My ##H_a: \mu_T > \mu_C##. 🤔
 
  • #9
Agent Smith said:
It's unclear but if the experiment results show that ##\mu_T > \mu_C##, my hypothesis would be that diet A causes weight loss. Then I would test this hypothesis, as described in the OP, no? If the associated p-value <= alpha then I would be required to reject ##H_0: \mu_T = \mu_C##. Isn't that how it works?

My ##H_a: \mu_T > \mu_C##. 🤔
Shouldn’t ##H_a: \mu_T < \mu_C##?

So, if ##\mu## indicates the population mean and ##m## indicates the sample mean, then what you want to show is a claim about ##\mu_T## and ##\mu_C##, while all you can observe are the sample values ##m_T## and ##m_C##.

Agent Smith said:
Then I would test this hypothesis, as described in the OP, no?
The OP describes a standard test of ##H_0##. The OP does not describe any test of ##H_a##. That is precisely why the correct statement is that you “reject ##H_0##” and not that you “accept ##H_a##”.

Agent Smith said:
If the associated p-value <= alpha then I would be required to reject ##H_0: \mu_T = \mu_C##. Isn't that how it works?
Yes.
 
  • #10
Agent Smith said:
I assumed that, for given means ##\mu_T## and ##\mu_C## of 2 different distributions(?) or data sets, ##\mu_T - \mu_C## represents a different data set with its own distribution. What exactly is a distribution?
I think you're talking about comparing the means rather than combining the distributions. When you "compute ##m_T - m_C##", you are not doing anything with distributions, just comparing two single values.

And let's get our notation clear. If ##\mu## denotes an underlying population mean, and ##m## a measured sample mean, then the ##\mu## values are definite but unknown. The ##m## values are measured, but we don't know how close they are to the ##\mu##s. ##\mu_T - \mu_C## does not have a distribution. ##m_T - m_C## has a distribution with mean ##\mu_T - \mu_C##.
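A quick simulation (all numbers invented) may make that concrete: the population difference ##\mu_T - \mu_C## is one fixed number, while the measured ##m_T - m_C## varies from experiment to experiment, with a distribution centred on that fixed number:

```python
# Simulate many repeated experiments to see the sampling distribution of m_T - m_C.
import numpy as np

rng = np.random.default_rng(0)
mu_T, mu_C = 4.0, 3.0        # fixed (in practice unknown) population means
sigma, n = 2.5, 50           # common population SD and per-group sample size

diffs = [rng.normal(mu_T, sigma, n).mean() - rng.normal(mu_C, sigma, n).mean()
         for _ in range(10_000)]

print(np.mean(diffs))        # close to mu_T - mu_C = 1.0
print(np.std(diffs))         # close to sqrt(sigma**2/n + sigma**2/n)
```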

Agent Smith said:
I guess we're trying to work with the sampling distribution of the sample means, where a sample mean = difference in the means (##\mu_T - \mu_C##). Correct?
No. The sample mean is the mean of the measurements for a sample, e.g. ##m_T## or ##m_C##.

Agent Smith said:
##\mu_T - \mu_C## is also a mean, the mean of the difference between treatment and control measurements (weight differences). Correct?
It is the difference between the underlying population means of T and C. It is also the mean of the distribution of the measured variable ##m_T - m_C##.
 
  • #11
mjc123 said:
the ##\mu## values are definite but unknown. ... ##\mu_T - \mu_C## does not have a distribution.
In frequentist statistics (which is what the OP is studying). In Bayesian statistics ##\mu## is a random variable with a probability distribution.
 
  • #12
@mjc123 and @Dale thank you for your responses. I learned about "combining distributions/random variables", for example the distribution of the difference in finishing times of 2 teams in the Olympics; I guess we can use that to compare the performance of the 2 teams. That's ##\mu_{X - Y}##, and there is also a standard deviation that goes with it: ##\sigma_{X - Y} = \sqrt{\sigma_X ^2 + \sigma_Y ^2}##. What is this exactly? I recall solving one such problem.
 
  • #13
If ##X## and ##Y## are random variables then any function of ##X## and ##Y## is also a random variable, such as ##X-Y##. As random variables ##X##, ##Y##, and ##X-Y## all have distributions. Usually you would speak of “combining distributions” in this sense.

Assuming that they are all well behaved random variables then they have expected values ##\mu_X##, ##\mu_Y##, and ##\mu_{X-Y}## respectively. In frequentist statistics, none of those are random variables. (They are random variables in Bayesian statistics, but you are studying frequentist statistics). Because they are not random variables you cannot combine distributions of ##\mu_X## or ##\mu_Y##.

Finally, you can repeatedly sample ##X## and ##Y## and get sample means ##\bar x## and ##\bar y##. These are also random variables with associated distributions. And as before any function of them, such as ##\bar x-\bar y## is also a random variable with its own distribution. You could speak of ##\bar x - \bar y## as “combining distributions”, but that would be less common to say.

Tests on ##X-Y## are called paired tests, and tests on ##\bar x- \bar y## are called unpaired tests. Generally paired tests are more powerful.
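A small sketch of that distinction on simulated data (all numbers invented): the paired test works on the per-person differences ##X-Y##, while the unpaired test only compares the two group means ##\bar x - \bar y##:

```python
# Paired vs unpaired t-tests on simulated before/after weights; numbers invented.
import numpy as np
from scipy.stats import ttest_rel, ttest_ind

rng = np.random.default_rng(1)
before = rng.normal(80, 10, 30)                   # weights before the diet
after = before - rng.normal(1.0, 2.0, 30)         # same people after the diet

print(ttest_rel(before, after))                   # paired test: uses X - Y person by person
print(ttest_ind(before, after, equal_var=False))  # unpaired test: only compares group means
```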
 
  • #14
@Dale oh, these were not taught to me at my level (Grade XII). The lessons are not as detailed, but the next stage is college-level statistics. I hope most of what you guys and gals are expressing here will be covered at that stage.

I guess we wouldn't be "combining" distributions. Is "comparing" a better word?

Bits and pieces I can recall (I have a bad memory) are about combining means and computing standard deviations, e.g. the average weight of a candle is ##\mu_c## with a standard deviation of ##\sigma_c##, and the average weight of a candle stand is ##\mu_s## with a standard deviation of ##\sigma_s##. It seems that people buy them both, and shipping charges depend on weight: say you have to pay extra if the weight exceeds ##w##. We're then asked to compute the probability that the weight of a candle and candle-stand pair exceeds ##w##. Here, if memory serves, we're to compute ##\mu_c + \mu_s## and ##\sigma_{c + s} = \sqrt {\sigma_c ^2 + \sigma_s ^2}##. It seems I've forgotten the topic (it comes right before transformations in statistics).
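If I have it right, that calculation looks something like this (all the weights and the threshold are invented, the two weights are assumed independent, and the combined weight is assumed approximately normal):

```python
# Candle + candle-stand shipping problem; every number here is made up.
from math import sqrt
from scipy.stats import norm

mu_c, sigma_c = 500.0, 20.0   # candle: mean weight and SD (grams)
mu_s, sigma_s = 800.0, 50.0   # candle stand: mean weight and SD (grams)
w = 1400.0                    # pay extra shipping above this weight

mu_total = mu_c + mu_s                        # mean of the combined weight
sigma_total = sqrt(sigma_c**2 + sigma_s**2)   # SD, assuming independence

print(norm.sf(w, loc=mu_total, scale=sigma_total))   # P(combined weight > w)
```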
 
  • #15
Dale said:
Even if the population mT>mC it is possible that the sample m¯T<m¯C. Since all you observe is the sample, you cannot rule the other possibility out.
How would we safeguard against this error? It is true that even if the population ##\mu_T > \mu_C##, we can get a sample with ##m_T < m_C##.
 
  • #16
Agent Smith said:
How would we safeguard against this error? It is true that even if the population ##\mu_T > \mu_C##, we can get a sample with ##m_T < m_C##.
You have to do a much more difficult calculation. The usual calculation is to calculate ##P(data|\mu_T=\mu_C)##. This is relatively easy because the hypothesis is a single point. Instead, what would need to be calculated to safeguard against the error would be ##P(data|\mu_T>\mu_C)##. This is a much more challenging computation.
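To illustrate why it is harder (numbers invented): the alternative ##\mu_T>\mu_C## is a composite hypothesis, so the probability of the data depends on the unknown effect size ##\delta=\mu_T-\mu_C##. Each single ##\delta## is easy, but summarising over all ##\delta>0## requires extra assumptions, such as a prior over ##\delta##:

```python
# P(difference at least as large as observed | mu_T - mu_C = delta) for several deltas.
from math import sqrt
from scipy.stats import norm

n1 = n2 = 100
s_T, s_C = 2.5, 2.3
observed_diff = 1.1                       # illustrative value of m_T - m_C
se = sqrt(s_T**2 / n1 + s_C**2 / n2)

for delta in (0.0, 0.5, 1.0, 1.5):
    p = norm.sf(observed_diff, loc=delta, scale=se)
    print(f"delta = {delta:.1f}: P = {p:.3f}")
```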
 
  • #17
@Dale, definitely not getting it here. The exercise question (please refer to the OP for basic info on the question) gives us the mean weight loss + standard deviation for the control group (##\mu_C, \sigma_C##) and the mean weight loss + standard deviation for the treatment group (##\mu_T, \sigma_T##). The numbers are such that ##\mu_T > \mu_C##. We are to test the hypothesis that diet A causes weight loss (better than the placebo?). The teacher walks us through the solution, but he never mentions that it is possible that ##\mu_T < \mu_C##. I have no idea why he doesn't do that. What could be the reason he ignored this possibility? It seems important.
 
  • #18
Agent Smith said:
What could be the reason he ignored this possibility? It seems important.
It is important (in my opinion), but I cannot guess about why he ignored it. You will have to ask him about that.

Agent Smith said:
We are to test the hypothesis that diet A causes weight loss (better than the placebo?).
The usual way this is done (I assume the way that you are being taught) is to test the hypothesis that ##\mu_T=\mu_C##. This is tested by looking at the data and calculating ##P(data|\mu_T=\mu_C)##. Notice that the calculation has only the hypothesis ##\mu_T=\mu_C##, no other hypothesis is part of the calculation.

If you find that the above probability is low, then you have evidence to reject the hypothesis ##\mu_T=\mu_C##.

That is all you have evidence for. You have only made a test about the hypothesis ##\mu_T=\mu_C##, so on the basis of that test you cannot make any claims about any other hypothesis. The only statement you are justified in making is "we reject the null hypothesis ##\mu_T=\mu_C##". You are specifically not justified (based on that test) in making the claim "we accept the alternate hypothesis ##\mu_T>\mu_C##" because none of your calculations relate to that hypothesis.
 
  • #19
There is an "Algebra" of Random Variable, in that the (pointwise) sum of an RV is an RV with mean, standard deviation defined as well.
 
  • #20
Thanks.
 
  • #21
As an add-on: nowadays the "trend", if you will, is to go beyond deciding whether or not to reject the null and to also consider the effect size in some of its forms.
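For example, one common standardised effect size, Cohen's d, can be computed directly from the summary statistics (the numbers below are invented for illustration):

```python
# Cohen's d from summary statistics; all numbers invented.
from math import sqrt

n1, n2 = 100, 100
m_T, m_C = 4.2, 3.1
s_T, s_C = 2.5, 2.3

s_pooled = sqrt(((n1 - 1) * s_T**2 + (n2 - 1) * s_C**2) / (n1 + n2 - 2))
d = (m_T - m_C) / s_pooled
print(d)   # ~0.46: roughly a "medium" effect by the usual rules of thumb, separate from the p-value
```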
 
