Test of statistical significance

  • #1
Agent Smith
TL;DR Summary
Statistical significance 101
The following appears as part of an intro to statistical significance. I don't quite get it and hence the appeal for clarification.

Someone claims that a newly invented medical test T is 99% accurate.

To check their claim the test is conducted among 100 subjects and in 95 the result is accurate.

If the test is 99% accurate then a 95% accuracy (the result we see above) has a probability = ## ^{100}C_{95} \times 0.99^{95} \times 0.01^5 \approx 0.003 = 0.3\%##

If the threshold for statistical significance is ##0.05## or ##5\%##, we have to conclude that the test T is NOT 99% accurate.
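
For anyone who wants to check the arithmetic, here is a minimal sketch in Python (standard library only) that reproduces the figure:

```python
from math import comb

# Probability of exactly 95 accurate results out of 100,
# assuming the claimed accuracy p = 0.99
p = 0.99
prob_exactly_95 = comb(100, 95) * p**95 * (1 - p)**5
print(prob_exactly_95)  # about 0.003, i.e. roughly 0.3%
```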
 
  • #2
So what is the confusing part for you?
 
  • #3
How does the conclusion follow? I can't seem to put it together into a coherent whole.

What I wrote above is my notes after making corrections; I hadn't written them down clearly. I usually copy verbatim, and I had written "at the 95% threshold", which I corrected(?) to "the 5% threshold".
 
  • #4
It isn't that coherent of a process. You have a "null" hypothesis, you calculate ##P(\text{data} \mid \text{hypothesis})##. If that probability is small then you consider that as strong evidence to reject the null hypothesis.

The reason that it isn't very coherent is that you are not interested in the null hypothesis, and usually nobody actually believes that the null hypothesis is true. It is kind of a statistical strawman. So from a scientific standpoint you are using a rejection of the null hypothesis not just as evidence against the null hypothesis but as evidence in support of some alternative hypothesis. But the incoherent part is that in this approach you never test anything directly relevant to the alternative hypothesis you are actually interested in, and the statistical method only entitles you to reject the null hypothesis, not to make any statement about the hypothesis of actual scientific interest.
 
  • Like
Likes Agent Smith
  • #5
@Dale

So my null hypothesis is that the test is 99% accurate? The statistical threshold ##\alpha## can be set to ##0.05## or ##5\%##. I find that ##P(\text{result} \mid 99\%\ \text{accuracy}) \approx 0.003## and, because ##0.003 < 0.05##, I reject the null hypothesis that the test is 99% accurate.

Can we find out if the test is more/less accurate from the info given?
 
  • #6
Agent Smith said:
TL;DR Summary: Statistical significance 101

The following appears as part of an intro to statistical significance. I don't quite get it and hence the appeal for clarification.

Someone claims that a newly invented medical test T is 99% accurate.

To check their claim the test is conducted among 100 subjects and in 95 the result is accurate.

If the test is 99% accurate then a 95% accuracy (the result we see above) has a probability = ## ^{100}C_{95} \times 0.99^{95} \times 0.01^5 \approx 0.003 = 0.3\%##

If the threshold for statistical significance is ##0.05## or ##5\%##, we have to conclude that the test T is NOT 99% accurate.

Agent Smith said:
@Dale

So my null hypothesis is that the test is 99% accurate? The statistical threshold ##\alpha## can be set to ##0.05## or ##5\%##. I find that ##P(\text{result} \mid 99\%\ \text{accuracy}) \approx 0.003## and, because ##0.003 < 0.05##, I reject the null hypothesis that the test is 99% accurate.

Can we find out if the test is more/less accurate from the info given?
The info suggests that the test is about 95% accurate. The probability of seeing this result if the test really were 99% accurate is 0.003.
 
  • Like
Likes Agent Smith
  • #7
Agent Smith said:
Can we find out if the test is more/less accurate from the info given?
One subtlety that I glossed over is the tails of the test. Technically, you don’t just compute the probability of getting the data. You compute the probability of getting the data or any data more “extreme” than the data you got.

So, in your example, you got 95 accurate tests. Let’s call that ##X=95##. What you test isn’t ##P(X=95 \mid H_0)##, but rather ##P(X \le 95 \mid H_0)##.

This is what we call a “one-tailed” test. We are only testing whether the data is smaller than what the null hypothesis predicts. In that case we can claim that it is “significantly smaller”.

If we do a two-tailed test (which isn’t feasible for your example) then we should only claim that it is “significantly different”, not that it is smaller or larger. However, in practice you can always tell which tail your data is in, so it is common to say “significantly larger” or “significantly smaller” even when the test actually only justifies “significantly different”.
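
A rough sketch of that distinction in Python (standard library only, just to illustrate the idea):

```python
from math import comb

p0 = 0.99  # claimed accuracy (the null hypothesis)
n = 100    # number of subjects
k = 95     # accurate results observed

def binom_pmf(i):
    """P(X = i) for X ~ Binomial(n, p0)."""
    return comb(n, i) * p0**i * (1 - p0)**(n - i)

point_prob = binom_pmf(k)                             # P(X = 95 | H0)
one_tailed = sum(binom_pmf(i) for i in range(k + 1))  # P(X <= 95 | H0)

print(point_prob, one_tailed)  # both on the order of 0.003, well below alpha = 0.05
```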
 
  • #8
@Hornbein , just realized (?) that

1. I know the population proportion ##p = 0.99##
2. I know the population standard deviation ##\sigma = \sqrt{p(1 - p)} = \sqrt{0.99 \times 0.01} \approx 0.099##

Is this correct?
 
  • #9
Agent Smith said:
@Hornbein , just realized (?) that

1. I know the population proportion ##p = 0.99##
2. I know the population standard deviation ##\sigma = \sqrt{p(1 - p)} = \sqrt{0.99 \times 0.01} \approx 0.099##

Is this correct?
You don't know that. The company is claiming this, and you are testing whether it is true. The result is that you don't believe these numbers. You don't know exactly what the population proportion is, but you are pretty sure it is less than 99%.

So, why is it OK to use these wrong numbers? You are assuming they are correct, doing a test, then seeing whether this assumption pans out. In this case it doesn't.
 
  • #10
Agent Smith said:
TL;DR Summary: Statistical significance 101

The following appears as part of an intro to statistical significance. I don't quite get it and hence the appeal for clarification.

Someone claims that a newly invented medical test T is 99% accurate.

To check their claim the test is conducted among 100 subjects and in 95 the result is accurate.

If the test is 99% accurate then a 95% accuracy (the result we see above) has a probability = ## ^{100}C_{95} \times 0.99^{95} \times 0.01^5 \approx 0.003 = 0.3\%##

If the threshold for statistical significance is ##0.05## or ##5\%##, we have to conclude that the test T is NOT 99% accurate.
Looking at it more carefully, this is wrong. You calculated the probability of exactly 95 hits. That's not what you want. You want the probability of 95 hits or less, assuming ##p = 0.99##.

The result will however be almost the same. (That's my excuse for not noticing this.)
 
  • Like
Likes Agent Smith
  • #11
Agent Smith said:
@Hornbein , just realized (?) that

1. I know the population proportion ##p = 0.99##
2. I know the population standard deviation ##\sigma = \sqrt{p(1 - p)} = \sqrt{0.99 \times 0.01} \approx 0.099##

Is this correct?
So far so good, given these assumptions. If you want to make a confidence interval using the assumption of normality, next calculate the standard deviation of the sample proportion (the standard error). I'm not sure that that assumption is a good one in this case, so it would be a good exercise to do the test both ways and see whether they differ.
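
As a sketch of that exercise (assuming, for illustration, that the null value ##p = 0.99## is plugged into the standard error), comparing the exact binomial tail with the normal approximation:

```python
from math import comb, sqrt, erf

p0, n, k = 0.99, 100, 95

# Exact binomial tail: P(X <= 95) under the claimed accuracy
exact = sum(comb(n, i) * p0**i * (1 - p0)**(n - i) for i in range(k + 1))

# Normal approximation: the sample proportion has standard error sqrt(p(1-p)/n)
se = sqrt(p0 * (1 - p0) / n)                # roughly 0.01
z = (k / n - p0) / se                       # roughly -4
normal_tail = 0.5 * (1 + erf(z / sqrt(2)))  # standard normal CDF at z

print(exact, normal_tail)  # these can differ noticeably, since np(1-p) is only about 1
```

With ##p## this close to 1, the normal approximation is shaky, which is the doubt raised above.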
 
  • #12
@Hornbein I believe the teacher wanted to keep it simple. I'm happy with P(95 accurate results). Also, somewhere in this question I lost my way. The point, it seems, is to demonstrate that P(95 accurate results), given an accuracy of 99%, is very small. Part of these lessons is to use a simulation to produce a distribution (it's assumed that all conditions for inference have been met). If we do one for this data, what we'll find (or what we're supposed to find) is that 95/100 accuracy makes up a tiny fraction (low probability) of the simulated outcomes. Is this a good way to understand this particular problem?
 
  • #13
Agent Smith said:
@Hornbein I believe the teacher wanted to keep it simple. I'm happy with P(95 accurate results). Also, somewhere in this question I lost my way. The point, it seems, is to demonstrate that P(95 accurate results), given an accuracy of 99%, is very small. Part of these lessons is to use a simulation to produce a distribution (it's assumed that all conditions for inference have been met). If we do one for this data, what we'll find (or what we're supposed to find) is that 95/100 accuracy makes up a tiny fraction (low probability) of the simulated outcomes. Is this a good way to understand this particular problem?
Yes, that's it.
 
  • Haha
Likes Agent Smith
  • #14
Agent Smith said:
Is this a good way to understand this particular problem?
Yes! This is a Monte Carlo simulation. It is my personal recommendation for the best way to understand most anything in statistics.
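
For what it's worth, a minimal sketch of that kind of simulation in Python (the number of runs is chosen arbitrarily):

```python
import random

p0, n, runs = 0.99, 100, 100_000

# Simulate many batches of 100 tests, each test coming out accurate with
# probability 0.99, and count how often 95 or fewer are accurate.
extreme = 0
for _ in range(runs):
    hits = sum(random.random() < p0 for _ in range(n))
    if hits <= 95:
        extreme += 1

print(extreme / runs)  # should land near the exact tail probability of roughly 0.3%
```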
 
  • Like
Likes Agent Smith
  • #15
Agent Smith said:
TL;DR Summary: Statistical significance 101

If the threshold for statistical significance is 0.05 or 5%, we have to conclude that the test T is NOT 99% accurate
Can you? Why would your test have more credibility?

It is claimed that the accuracy of the test is 99%. So in 100 tests you expect 99 ± 1 trues. You perform 100 tests and you get 95 ± 2.2 trues. Your result differs from the published data by four standard deviations. Based on the usual assumptions, your data suggest that it is unlikely to come from the same parent population as the published data.
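
For reference, the ±1 and ±2.2 figures are the binomial standard deviations ##\sqrt{np(1-p)}## for the claimed and observed rates:
$$\sqrt{100 \times 0.99 \times 0.01} \approx 1.0, \qquad \sqrt{100 \times 0.95 \times 0.05} \approx 2.2,$$
and the discrepancy is ##(99 - 95)/1 \approx 4## standard deviations of the claimed distribution.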
 
  • #16
I don't know.
 
  • #17
gleem said:
Can you? Why would your test have more credibility?

It is claimed that the accuracy of the test is 99%. So in 100 tests you expect 99 ± 1 trues. You perform 100 tests and you get 95 ± 2.2 trues. Your result differs from the published data by four standard deviations. Based on the usual assumptions, your data suggest that it is unlikely to come from the same parent population as the published data.
Agent Smith said:
I don't know.
It appears he's answering a different question, one in which someone has done one hundred tests, each with one hundred samples. But it is hard to be sure of this.
 
  • #18
When you perform a statistical experiment you get a result that either agrees with previous work to an acceptable degree or it doesn't. In the OP you did not specify an uncertainty for the published work, which is important in comparing results. When your experiment doesn't agree, you have to decide how to resolve the discrepancy. One thing you cannot say is that, based on your experiment, the published work is not correct. So what should you do?
 
  • Like
Likes Agent Smith
  • #19
gleem said:
One thing you cannot say is that based on your experiment the published work is not correct.
That is certainly something that can be said. Published works can be refuted by new studies.
 
  • Like
Likes Agent Smith
  • #20
Dale said:
That is certainly something that can be said. Published works can be refuted by new studies.
Of course, if the experiments and patient populations were equivalent. In cases like this, we are performing an experiment because we find the result of some other experiment suspicious, so there is a potential bias the experimenter must keep in mind. The point was that the results of a study must be evaluated in more depth before you can say someone else is wrong based on new statistical results. The uncertainty of the original study was not given and should have been. What if the original experimental results included the possibility of 98% accuracy? The other problem is that the new study is probably much smaller than the original, so a change of just one true or false value has a significant effect on the conclusion. Medical studies often have significantly more variability than the statistics reflect.
 
  • Like
Likes Dale
  • #21
@gleem , I believe your objections are valid. Can you imagine a scenario where the info I provided in the question is up to the mark in all respects? If yes, we should (for my sake) work from that point on. I'm just learning the basics of statistics and the tutor knows that and perhaps omits some details in order to simplify the concepts he wishes to discuss/convey.
 
  • #22
First let me say that if our assumption of 99 ± 1% were true and your result of 95 ± 2.2% were also true, then while your result would be highly improbable as a member of the original population, the original result might well be a member of your population. There is a small chance that they are consistent, especially if the original study did not exclude the possibility of a 98% accuracy. Statistics provide information on which you must make a decision.

Since your result differed so much from the original, that should be surprising, even though you think the original study was also surprising. So what to do? I would go back and look at the data to make sure that nothing was done that skewed the results one way or the other. One interesting thing is that outcomes of 94, 95, 96, or 97 correct are almost equally probable! One true result either way changes the interpretation dramatically. At this point, we might question whether or not we should have used a larger number of patients.

The next issue, and I think this is a big one in medical research, is the makeup of the population in confirming studies. Two populations can be members of the same parent population yet significantly different from one another if the variance is large.

One final thing: if you initiated your study based on the premise that the original study was in error for some definitive reason, and not just because its result seemed too high, then you have a reason to question its results; otherwise all you can say is "this is what we got".

Sometimes what looks like a slam dunk bounces out.

Now, how to clean up the example to make it more real. The original study would have given the number of tests, some description of the types of patients used, its methodology, the number of true positives and negatives, and false positives and negatives. It would have provided the uncertainty of its results. This scenario is unusual since its uncertainty is bounded at the high end: the accuracy cannot be 100%, since there are some false results. By accuracy, I presume one means the percentage of verifiable true results, true positive and true negative. It is possible that for some reason they might hedge on the low side to include a reasonable chance of a 98% accuracy. They also might have given the accuracy to the nearest 0.1% if they had enough patients, say 1000. I think some caution is needed in making things up.

A nice reference for practical applications of statistics to medicine is "Introductory Medical Statistics", 3rd edition, by R. F. Mould, Institute of Physics Publishing.
 
  • Like
Likes jim mcnamara
  • #23
@gleem I know you're right with regard to the hidden complexity. However, we're considering a(n) (over)simplified scenario meant for high schoolers. I prefixed the thread with a B for basic.
@Dale 's answers are more appropriate for my level, methinks. You may of course share your views; I'll keep them in mind if and when I make it to the next level in statistics.
 
  • Like
Likes Dale
  • #24
Sorry about that, but if this is supposed to be a basic discussion, the example put up for discussion should be basic and not of a complex nature such as drug or treatment trials. I recommend that you make up examples that you understand and could perform, e.g., comparing the average grades of two sections of Physics 1 taught by professor A to see if they are statistically consistent, assuming professor A uses the same teaching material and teaches the classes in the same manner.
 
  • #25
Introductory classes often use truly complicated examples that have been oversimplified. It is fair to state that the reality is more complicated, without necessitating that a student address all that complexity.
 
  • Like
Likes Agent Smith
  • #26
That may be true, but it may give the impression that the application of statistical techniques is simpler than it is. Perhaps that is why many/most published studies have a statistician as a co-author. I don't know how many times, while sitting on a hospital research committee, I saw the investigator asked whether he/she had spoken with a statistician, at least to see if their project was able to meet its objective.
 
  • Haha
Likes Agent Smith
  • #27
Dale said:
Introductory classes often use truly complicated examples that have been oversimplified. It is fair to state that the reality is more complicated, without necessitating that a student address all that complexity.
The point of that, I believe, is to demonstrate how useful the particular math "object" is in real life. However, as I realized from the many caveats that accompany the questions in this stats course, reality is complex and some assumptions/simplifications have to be made so that the student doesn't get bogged down in the (devil is in the) details.

Thank you.
 
  • #28
Agent Smith said:
TL;DR Summary: Statistical significance 101

The following appears as part of an intro to statistical significance. I don't quite get it and hence the appeal for clarification.

Someone claims that a newly invented medical test T is 99% accurate.

To check their claim the test is conducted among 100 subjects and in 95 the result is accurate.

If the test is 99% accurate then a 95% accuracy (the result we see above) has a probability = ## ^{100}C_{95} \times 0.99^{95} \times 0.01^5 \approx 0.003 = 0.3\%##

If the threshold for statistical significance is ##0.05## or ##5\%##, we have to conclude that the test T is NOT 99% accurate.
You have (IMO) the wrong calculation. The claimed accuracy is 99%, but you suspect

a) that it is less than 99%: based on your observed data, the calculation should be to find the probability of getting AT MOST 95 correct out of 100 tries. What you have is the probability that the success count is exactly 95.
b) that it is not equal to 99%: based on your observed data, the calculation should be twice the probability of getting AT MOST 95 correct out of 100 tries (i.e., twice the answer to my point 'a').
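
A short sketch of points (a) and (b) in Python, using the simple "double the one-sided tail" convention described above:

```python
from math import comb

p0, n, k = 0.99, 100, 95  # claimed accuracy, number of tests, observed successes

# (a) one-sided: probability of AT MOST 95 correct, assuming p = 0.99
one_sided = sum(comb(n, i) * p0**i * (1 - p0)**(n - i) for i in range(k + 1))

# (b) two-sided, by the doubling convention in the post above
two_sided = 2 * one_sided

print(one_sided, two_sided)
```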
 