Is a Two-Tailed Test Appropriate for Analyzing Plant Position and Seed Count?

tzx9633 · Dec 23, 2017

Homework Statement

In the photo above,
H0 = the position of the plant doesn't affect the number of seeds in the pods.
H1 = the position of the plant affects the number of seeds in the pods.

The Attempt at a Solution

This is a two-tailed test, am I right? Referring to the normal distribution table, z_α = z_0.05 = 1.645 and z_α/2 = z_0.025 = 1.960 ... In the example, it's clear that the author uses a one-tailed test. I think that's wrong. He should use z_α/2 = z_0.025 = 1.960. Am I right? Correct me if I am wrong.

Homework Equations

Mark44 · Dec 23, 2017

If the author intended you to use a one-tailed test, it's not very clear from the problem description.

tzx9633 · Dec 23, 2017

Mark44 said:

If the author intended you to use a one-tailed test, it's not very clear from the problem description.

Then is it a one-tailed or a two-tailed test, in your opinion? Because H1 = the position of the plant affects the number of seeds in the pods, I assume it's a two-tailed test.

FactChecker · Dec 23, 2017

I think you are right. The one tail test would be appropriate for these hypotheses:
H1 = Being on top increases the number of seeds
H0 = Being on top does not increase the number of seeds

The two tail test is appropriate for the stated hypotheses of the book, regarding any difference -- increase or decrease. The Z value of the book is for the 90% two-tail confidence level (5% in each tail, which is the same critical value as a 95% one-tail test). (see the second table, "Critical values", in http://pegasus.cc.ucf.edu/~pepe/Tables )
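The critical values quoted in this thread can be checked without a printed table. The following is a minimal sketch using only the Python standard library; the helper name `critical_z` is made up for illustration and does not come from the book.

```python
from statistics import NormalDist


def critical_z(alpha: float, tails: int) -> float:
    """Positive critical value of the standard normal for a test at
    significance level alpha with 1 or 2 tails."""
    if tails not in (1, 2):
        raise ValueError("tails must be 1 or 2")
    # A two-tailed test splits alpha evenly between the two tails.
    return NormalDist().inv_cdf(1 - alpha / tails)


one_tailed = critical_z(0.05, tails=1)
two_tailed = critical_z(0.05, tails=2)
print(round(one_tailed, 3))  # 1.645
print(round(two_tailed, 3))  # 1.96
```

So at α = 0.05, the value 1.645 belongs to a one-tailed test and 1.960 to a two-tailed test, which matches the numbers in the opening post.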

tzx9633 · Dec 23, 2017

FactChecker said:

I think you are right. The one tail test would be appropriate for these hypotheses:
H1 = Being on top increases the number of seeds
H0 = Being on top does not increase the number of seeds

The two tail test is appropriate for the stated hypotheses of the book, regarding any difference -- increase or decrease. The Z value of the book is for the 90% two-tail confidence level (5% in each tail, which is the same critical value as a 95% one-tail test). (see the second table, "Critical values", in http://pegasus.cc.ucf.edu/~pepe/Tables )

Here are my notes:

• pairs of observations are independent and
– the sample size is large, or it is small and the data are normal: use the t-test.
– the sample size is small and the data are not normal: use the Wilcoxon rank sum (Mann-Whitney U) test.
• pairs of observations are dependent and
– the sample size is large, or it is small and the data are normal: use the paired t-test.
– the sample size is small and the data are not normal: use the Wilcoxon signed rank test.

In the previous example, it's a dependent test (testing whether the position on the plant affects the number of seeds in the pods or not), am I right? For a dependent test, there are only 2 choices, right? Namely Mann-Whitney and the t-test, am I right? Why does the author use a z-test in the first example?

Ray Vickson · Dec 23, 2017

tzx9633 said:

Here are my notes:

• pairs of observations are independent and
– the sample size is large, or it is small and the data are normal: use the t-test.
– the sample size is small and the data are not normal: use the Wilcoxon rank sum (Mann-Whitney U) test.
• pairs of observations are dependent and
– the sample size is large, or it is small and the data are normal: use the paired t-test.
– the sample size is small and the data are not normal: use the Wilcoxon signed rank test.

In the previous example, it's a dependent test (testing whether the position on the plant affects the number of seeds in the pods or not), am I right? For a dependent test, there are only 2 choices, right? Namely Mann-Whitney and the t-test, am I right? Why does the author use a z-test in the first example?

There are many things wrong with this example.
(1) The data make little sense. How can the number of seeds be 5.2 in the top pod and 3.7 in the bottom pod of the exact same plant? Don't seeds come in integer numbers, 0,1,2,...? How can you have an "average number of seeds" for a single plant? I suppose there could be N1 pods on top and N2 pods on the bottom, with the "averages" being the average number per pod among the N1 top pods, etc. However, in that case, why bother with averages? Just look at the total number of seeds on top and on the bottom. It looks to me like a highly artificial problem using some arbitrary numbers, designed to give the illusion of a real problem. However, YOU are stuck with doing the example, whether it makes sense or not!

(2) How can the seed numbers in the top or bottom, or their difference, be Binomial(10, 0.5)? Here, the '10' looks like the number of plants tested, but the number of seeds in an individual plant will not depend on how many plants we choose to examine. The number of seeds per plant will be determined by the biology of the plant itself, and possibly by environmental factors, etc. Possibly, saying that the number of seeds is random with distribution ##\text{Binom}(N,p)## could be a good approximation, but there is no way to say a priori that ##N=10## and ##p = 1/2##. I guess it is possible that the person setting the problem really meant to say that "observation shows that the distribution of seed numbers is approximately binomial with parameters ##N=10## and ##p = 0.5##", but if that is the case, that is the way they should have said it. Otherwise, they are likely to maximize the confusion of the student.

(3) Since the top/bottom numbers are paired (to a single plant), using a paired-sample test (such as a paired-sample t-test) might be appropriate. Certainly for a single plant the top and bottom numbers are dependent, but with careful experimental design or appropriate data-gathering, the numbers between different plants might, possibly, be independent. Even if a paired-sample t-test is not appropriate (because of non-normality of the data), it would make sense to use a non-parametric test for ##H_0: \mu=0## vs ##H_1: \mu \neq 0## for the mean ##\mu## of a sample ##X_1, X_2, \ldots, X_{10}##. Here ##X_i = \text{top number} - \text{bottom number}## for plant ##i##. And, a two-sided test would be the way to go.

I looked only at your first posted image; as I said before, I won't look at posted images of solutions, only at typed work.
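Point (3) reduces the paired data to one difference per plant and tests whether the mean difference is zero. A minimal sketch of the paired t statistic follows. The numbers in `top` and `bottom` are hypothetical, made up for illustration, and are not the textbook data; the critical value 2.262 is the tabulated two-sided 5% value of Student's t with 9 degrees of freedom.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical per-plant averages (NOT the textbook data).
top = [5.2, 4.8, 6.1, 5.5, 4.9, 5.0, 6.3, 5.7, 4.4, 5.9]
bottom = [3.7, 4.9, 5.0, 5.1, 4.2, 5.3, 5.8, 4.6, 4.5, 5.1]

# One difference per plant: X_i = top number - bottom number.
diffs = [t - b for t, b in zip(top, bottom)]
n = len(diffs)

# Paired t statistic for H0: mu = 0, using the sample standard deviation.
t_stat = mean(diffs) / (stdev(diffs) / sqrt(n))

# Two-sided test at alpha = 0.05 with n - 1 = 9 degrees of freedom.
T_CRIT = 2.262
reject_h0 = abs(t_stat) > T_CRIT
print(f"t = {t_stat:.3f}, reject H0: {reject_h0}")
```

The test is two-sided because the comparison uses `abs(t_stat)`: a large difference in either direction counts against H0.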

FactChecker · Dec 23, 2017

tzx9633 said:

In the previous example, it's a dependent test (testing whether the position on the plant affects the number of seeds in the pods or not), am I right? For a dependent test, there are only 2 choices, right? Namely Mann-Whitney and the t-test, am I right? Why does the author use a z-test in the first example?

It would require a separate statistical test to determine if the paired results are dependent or not. If an unpaired test is used where a paired test is possible, a lot of information is lost and the unpaired test may be much weaker. So I agree that a paired test would be better. There is no reason to assume that the paired results are independent of each other. The book answer is a paired test. Each plant result is a comparison of its pair of top versus bottom and turned into a single binomial result (top greater => +, top smaller => -). The total of the binomial results is then approximated by the normal distribution. This is the usual thing to do for a large binomial sample.
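The procedure described here (reduce each plant to a + or - sign, then approximate the count of + signs by a normal distribution) can be sketched as follows. The differences are hypothetical, not the textbook data, and the function name `sign_test_z` is made up for illustration. No continuity correction is applied; whether the book uses one is not known from this thread.

```python
from math import sqrt
from statistics import NormalDist


def sign_test_z(diffs):
    """Two-sided sign test via the normal approximation.

    Zero differences (ties) are dropped. Returns the number of plus
    signs, the number of non-tied pairs, the z statistic and the
    two-sided p-value.
    """
    nonzero = [d for d in diffs if d != 0]
    n = len(nonzero)
    plus = sum(1 for d in nonzero if d > 0)
    # Under H0 the number of plus signs is Binomial(n, 0.5),
    # with mean n/2 and standard deviation sqrt(n)/2.
    z = (plus - n / 2) / (sqrt(n) / 2)
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return plus, n, z, p_value


# Hypothetical top-minus-bottom differences for 10 plants.
diffs = [1.5, -0.1, 1.1, 0.4, 0.7, -0.3, 0.5, 1.1, -0.1, 0.8]
plus, n, z, p_value = sign_test_z(diffs)
print(f"plus = {plus}, n = {n}, z = {z:.3f}, p = {p_value:.3f}")
```

For a two-sided test at α = 0.05, H0 is rejected when |z| > 1.960, or equivalently when the two-sided p-value is below 0.05.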

tzx9633 · Dec 23, 2017

FactChecker said:

It would require a separate statistical test to determine if the paired results are dependent or not. If an unpaired test is used where a paired test is possible, a lot of information is lost and the unpaired test may be much weaker. So I agree that a paired test would be better. There is no reason to assume that the paired results are independent of each other. The book answer is a paired test. Each plant result is a comparison of its pair of top versus bottom and turned into a single binomial result (top greater => +, top smaller => -). The total of the binomial results is then approximated by the normal distribution. This is the usual thing to do for a large binomial sample.

OK, thanks for your explanation. Why isn't the t-test used here? Why is the z-test used?

tzx9633 · Dec 23, 2017

FactChecker said:

It would require a separate statistical test to determine if the paired results are dependent or not. If an unpaired test is used where a paired test is possible, a lot of information is lost and the unpaired test may be much weaker. So I agree that a paired test would be better. There is no reason to assume that the paired results are independent of each other. The book answer is a paired test. Each plant result is a comparison of its pair of top versus bottom and turned into a single binomial result (top greater => +, top smaller => -). The total of the binomial results is then approximated by the normal distribution. This is the usual thing to do for a large binomial sample.

It's stated in post #5 that when pairs of observations are dependent and the sample size is large, or it is small and the data are normal, we should use the paired t-test.

FactChecker · Dec 23, 2017

Rigorous application of the t-test would be difficult for this problem. The paired samples t-test needs paired samples from fixed populations. In this example, each number is an average of seeds from pods. The number of pods of each plant and each top/bottom may be different and so each average may be from a different distribution. By turning each plant into one binomial sample point (top larger or top smaller), those issues disappear. Once the problem is turned into a binomial, the approximation for large n (preferably n > 20, but this is just an example problem) is the normal distribution.

tzx9633 · Dec 23, 2017

FactChecker said:

Rigorous application of the t-test would be difficult for this problem. The paired samples t-test needs paired samples from fixed populations. In this example, each number is an average of seeds from pods. The number of pods of each plant and each top/bottom may be different and so each average may be from a different distribution. By turning each plant into one binomial sample point (top larger or top smaller), those issues disappear. Once the problem is turned into a binomial, the approximation for large n (preferably n > 20, but this is just an example problem) is the normal distribution.

So the notes are wrong? When the samples are dependent, we should use the normal distribution (z-test) and not the t-test?

FactChecker · Dec 24, 2017

tzx9633 said:

So the notes are wrong? When the samples are dependent, we should use the normal distribution (z-test) and not the t-test?

In this case, I think so. There is more to using the t-test than just "dependent" (see https://en.wikipedia.org/wiki/Student's_t-test#Assumptions ). For each member of the pair, the observations should also all come from the same normal distribution. In other words, the population of the top pod numbers should be from one normal distribution and the population of the bottom pod numbers should be from another normal distribution. The two distributions (top and bottom) should have the same variances.

That being said, the Student's t-test is reasonably robust regarding violations of the required assumptions. So it may be acceptable to use. I have not studied violations of the assumptions. What the book did by approximating the binomial with a small sample of 10 with a normal distribution is also marginal. The recommendation is to have a sample size of at least 20 (see https://en.wikipedia.org/wiki/Binomial_distribution#Normal_approximation ).
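To see why the normal approximation is marginal at n = 10, the exact binomial p-value can be compared with the approximate one. This is a minimal sketch using only the standard library; the function names are made up for illustration, and the exact two-sided p-value is taken as twice the smaller tail, which is one common convention.

```python
from math import comb, sqrt
from statistics import NormalDist


def exact_two_sided_p(k: int, n: int) -> float:
    """Exact two-sided p-value for k plus signs under Binomial(n, 0.5),
    computed as twice the smaller tail probability (capped at 1)."""
    upper = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    lower = sum(comb(n, i) for i in range(0, k + 1)) / 2 ** n
    return min(1.0, 2 * min(upper, lower))


def normal_approx_p(k: int, n: int) -> float:
    """Two-sided p-value from the normal approximation,
    without continuity correction."""
    z = (k - n / 2) / (sqrt(n) / 2)
    return 2 * (1 - NormalDist().cdf(abs(z)))


n = 10
for k in (7, 8, 9):
    print(k, round(exact_two_sided_p(k, n), 4), round(normal_approx_p(k, n), 4))
```

With 8 plus signs out of 10, the exact p-value is 112/1024 ≈ 0.109, while the uncorrected normal approximation gives a noticeably smaller value, so the two methods can disagree about borderline results at this sample size.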

tzx9633 · Dec 30, 2017

FactChecker said:

I think you are right. The one tail test would be appropriate for these hypotheses:
H1 = Being on top increases the number of seeds
H0 = Being on top does not increase the number of seeds

The two tail test is appropriate for the stated hypotheses of the book, regarding any difference -- increase or decrease. The Z value of the book is for the 90% two-tail confidence level (5% in each tail, which is the same critical value as a 95% one-tail test). (see the second table, "Critical values", in http://pegasus.cc.ucf.edu/~pepe/Tables )

Can you help me confirm again? I read several online notes, and the alpha used is 0.05, not 0.025 (alpha/2), for a 2-tailed test ...

FactChecker · Dec 30, 2017

1) The test should be a 2 tailed test because the null hypothesis you specified is "doesn't affect". The alternative therefore includes effects of higher or lower -- so 2 tails.
2) You specified a Z_c value of 1.645 in the OP. You can see on the "Critical Values" table in the link that the Z_c value of 1.645 is the .90 2 sided value.
3) A 2 tail test at 0.90 confidence has 0.05 on each side (tail). So the number you give is for a 90% confidence level.

If you want a 95% confidence, you need to either change the hypothesis to 1 sided "tops have more seeds than the bottoms", or change Z_c value to 1.96.
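Going the other way, from a given Z_c to the significance level it implies, makes the three points above concrete. A minimal sketch with the standard library follows; the function names are made up for illustration.

```python
from statistics import NormalDist


def two_sided_alpha(z_c: float) -> float:
    """Total probability in both tails beyond +/- z_c."""
    return 2 * (1 - NormalDist().cdf(z_c))


def one_sided_alpha(z_c: float) -> float:
    """Probability in the single upper tail beyond z_c."""
    return 1 - NormalDist().cdf(z_c)


for z_c in (1.645, 1.96):
    print(z_c, round(two_sided_alpha(z_c), 3), round(one_sided_alpha(z_c), 3))
```

Z_c = 1.645 leaves about 0.05 in each tail, so it gives 90% confidence two-sided or 95% one-sided, while Z_c = 1.96 leaves about 0.025 in each tail and gives 95% confidence two-sided.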

tzx9633 · Dec 30, 2017

FactChecker said:

1) The test should be a 2 tailed test because the null hypothesis you specified is "doesn't affect". The alternative therefore includes effects of higher or lower -- so 2 tails.
2) You specified a Z_c value of 1.645 in the OP. You can see on the "Critical Values" table in the link that the Z_c value of 1.645 is the .90 2 sided value.
3) A 2 tail test at 0.90 confidence has 0.05 on each side (tail). So the number you give is for a 90% confidence level.

If you want a 95% confidence, you need to either change the hypothesis to 1 sided "tops have more seeds than the bottoms", or change Z_c value to 1.96.

So your conclusion is that at alpha = 0.05, Z_c should be Z_0.025 = 1.96, am I right?

tzx9633 · Dec 30, 2017

FactChecker said:

1) The test should be a 2 tailed test because the null hypothesis you specified is "doesn't affect". The alternative therefore includes effects of higher or lower -- so 2 tails.
2) You specified a Z_c value of 1.645 in the OP. You can see on the "Critical Values" table in the link that the Z_c value of 1.645 is the .90 2 sided value.
3) A 2 tail test at 0.90 confidence has 0.05 on each side (tail). So the number you give is for a 90% confidence level.

If you want a 95% confidence, you need to either change the hypothesis to 1 sided "tops have more seeds than the bottoms", or change Z_c value to 1.96.

https://www.physicsforums.com/threads/wilcoxon-sign-rank-test-rejection-region.935726/

Can you refer to this thread, please? This is a Wilcoxon signed-rank test (it's a 2-tailed test), but the author uses alpha instead of 0.5 alpha ... I'm wondering whether the sign test is the same as the signed-rank test, which uses alpha for a 2-tailed test instead of 0.5 alpha?
