Why not choose a confidence level retrospectively?

  • Thread starter Rasalhague
In summary, Sanders warns against manipulating data by selecting a confidence interval that appears most favorable after calculating multiple intervals with varying confidence levels. This approach introduces bias and should be avoided. Additionally, he distinguishes between the terms "confidence interval" and "interval estimation", with the former being a specific type of the latter. He also cautions against the misinterpretation of confidence intervals and emphasizes the importance of setting the confidence level before any sampling takes place.
  • #1
Rasalhague
One caution should be mentioned here. The confidence interval should be stated before the interval estimation. Sometimes a novice researcher calculates a number of interval estimates on the basis of a single sample while varying the confidence level. After obtaining these estimates, he or she then selects the one that seems most suitable. Such an approach is really manipulating the data so that the results of a sample are the way a researcher would like to see them. This approach introduces a researcher's bias into the study, and it should be avoided.

- Sanders: Statistics: A First Course, 5th ed., § 7.2, p. 236.

Why? The data has not been changed. What difference does it make whether the decision is made before or after? If it's legitimate to calculate the distance from the statistic within which, at a 90% confidence level, the parameter lies, why not at 95% or 99%, or all three?
 
  • #2
Suppose you are measuring the width of some engineered part. You already know that the part's width needs to be within a 1 mm interval around the target width in order to fit well. Suppose your 95% confidence interval for the width is 1.1 mm wide. You might be tempted to drop to a 90% confidence interval just so you can say the part is within tolerances. If you did, you'd be "framing" the data to deliberately make it appear more favorable than your initial analysis showed. It's similar to cherry-picking.
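
To make that concrete, here is a minimal sketch in Python. The numbers are made up, chosen so the 95% interval comes out about 1.1 mm wide as in the scenario above; it just shows that the same sample mean and standard error give a narrower interval at a lower confidence level.

```python
# A minimal sketch with hypothetical numbers: the same data, reported at
# different confidence levels, produces intervals of different widths.
from scipy import stats

sample_mean = 10.00   # measured width in mm (hypothetical)
std_error = 0.28      # standard error of the mean (hypothetical)

for level in (0.90, 0.95, 0.99):
    z = stats.norm.ppf(0.5 + level / 2)   # two-sided critical value
    half_width = z * std_error
    print(f"{level:.0%} CI: ({sample_mean - half_width:.2f}, "
          f"{sample_mean + half_width:.2f}), width = {2 * half_width:.2f} mm")
```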
 
  • #3
Just observing that a result falls within the 90% confidence interval needn't make us forget that it doesn't fall within the 95%. It seems like there'd only be a problem if you ignored the extra piece of information: "that the part's width needs to be within a 1 mm interval around the target width in order to fit well."

Suppose it fell within the 99% confidence interval. What harm would it do to simply notice this? If you were testing a hypothesis, why would you only report that the alternative was confirmed with 95% if you could, legitimately, make the stronger claim that it was confirmed with 99% confidence?
 
  • #4
How does the book distinguish between "the confidence interval" and "the interval estimation"? The current Wikipedia article on "interval estimation" says that a "confidence interval" is a particular type of "interval estimation". Sanders appears to use a different terminology.
 
  • #5
An interval estimate is a spread of values used to estimate a population parameter, and the process of estimating with a spread of values is known as interval estimation.


He distinguishes this from a point estimate, "a single number used to estimate a population parameter."

Confidence intervals are those interval estimates based on specified confidence levels, [...]

These definitions agree with the idea that a confidence interval is a type of interval estimate. But in the passage I quoted in #1, I've been assuming that when he says "a number of interval estimates", he means specifically "a number of confidence intervals" (based on different confidence levels).
 
  • #6
The formal definition of a confidence interval (at a given confidence level and for a given distribution) defines an interval with a specific length, but it does not have numerical endpoints. For example, a confidence interval for the mean might span "the true mean plus or minus 2.0". In that situation, a person might take a sample, and if the sample mean were 5.0, he might declare that (3.0, 7.0) is also a "confidence interval".

Likewise, I suppose some people may apply the term "interval estimate" both to intervals stated about the "true" parameter and also to numerical intervals about a sample value. Where does Sanders stand on this terminology?

I agree that it isn't clear what the passage you quoted is claiming. One interpretation might be this: If a person is willing to call an interval about the sample mean a "confidence interval", then he might take the sample, get a sample mean of 5.0 and ask himself, what size interval will impress the public with the precision of my sampling? What about (0.0 to 10.0)? No, that's too big. They want to see something like (3.0, 7.0). What confidence level do I need to claim that?

This contradicts the idealistic scenario for setting confidence intervals (about the unknown parameter). One is supposed to have a given "confidence level" in mind and to set it before any sampling.

Setting the confidence level based on a sample preys on the public's tendency to misinterpret numerical intervals. If the average person hears that "(3.0, 7.0) is a 90% confidence interval for the mean", he thinks that this implies that there is a 90% chance that the true mean is in that particular interval. But, to me, it's not much more predatory than making a similar claim when the confidence level is set in advance.
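
As a sketch of that back-to-front reasoning (the standard error here is hypothetical): start from the interval you would like to report, (3.0, 7.0) around a sample mean of 5.0, and solve for the confidence level that lets you claim it.

```python
# A sketch, with a made-up standard error: choosing the confidence level so
# that it reproduces the interval (3.0, 7.0) that we already decided to report.
from scipy import stats

sample_mean = 5.0
std_error = 1.2     # hypothetical standard error of the sample mean
half_width = 2.0    # the desired interval is (3.0, 7.0)

z = half_width / std_error
level = 2 * stats.norm.cdf(z) - 1
print(f"Claiming (3.0, 7.0) would require roughly a {level:.1%} confidence level")
```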
 
  • #7
Stephen Tashi said:
Setting the confidence level based on a sample preys on the public's tendency to misinterpret numerical intervals. If the average person hears that "(3.0, 7.0) is a 90% confidence interval for the mean", he thinks that this implies that there is a 90% chance that the true mean is in that particular interval. But, to me, it's not much more predatory than making a similar claim when the confidence level is set in advance.

It seems this "misinterpretation" is how Sanders actually defines it:

The level of confidence [...] refers to the probability of correctly including the population parameter being estimated in the interval that is produced.

How would you define the purpose of a confidence interval?
 
  • #8
In that statement, Sanders did not make an overt misinterpretation of a confidence interval. Assume we have a given probability distribution and are sampling from it. There is a distinction between the following statements:

Statement 1: "If I take 100 samples, there is a 90% probability that the sample mean that I observe will be within plus or minus 2 of the true population mean."

Statement 2: "I took 100 samples and the sample mean was 5.0. So there is a 90% probability that that 5.0 is within plus or minus 2 of the population mean."

Statement 2 does not follow from statement 1. In his definition of confidence interval, Sanders is saying something like statement 1, which does not give the public any specific interval to misinterpret.
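
Here is a rough simulation of statement 1. Everything in it is made up: the true mean, the single-sample standard deviation (chosen so that plus or minus 2 is roughly a 90% interval for the mean of 100 samples), and the number of repetitions.

```python
# A rough simulation of Statement 1, with hypothetical numbers.
import numpy as np

rng = np.random.default_rng(0)
true_mean, sigma, n, trials = 5.0, 12.16, 100, 100_000

# Each row is one repetition of "take 100 samples and compute the sample mean".
sample_means = rng.normal(true_mean, sigma, (trials, n)).mean(axis=1)
coverage = np.mean(np.abs(sample_means - true_mean) <= 2.0)
print(f"Fraction of intervals (sample mean +/- 2) covering the true mean: {coverage:.3f}")
```

Over many repetitions, about 90% of the intervals cover the true mean; that is statement 1. Nothing in the simulation assigns a 90% probability to one particular realized interval, which is why statement 2 does not follow.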

I'm speculating that in the passage where he warns of "bias", he is referring to specific intervals like those in statement 2, since he talks about a sample having been taken.

I don't know whether Sanders calls both the intervals mentioned in the two statements "confidence intervals". My old statistics text (Mood, Graybill and Boes) defines a confidence interval as the type of interval mentioned in statement 1 and says intervals like the one mentioned in statement 2 are called confidence intervals by "abuse of language".

If we were using Bayesian statistics (my preference) then we would assume some prior distribution for the population mean and it would be possible to state a posterior distribution for it after the sample was taken. This would allow statements like statement 2 to be deduced. However, in "frequentist" statistics the assumption is that the population mean has a "definite but unknown value". It does not have a prior probability distribution - except for a trivial prior which has all the probability concentrated at a single, unknown point.
 
  • #9
...and, according to frequentist theory, the purpose of confidence intervals (like those mentioned in statement 1, which do not involve specific numerical endpoints) is to quantify the reliability of the sampling plan (a very noble sounding purpose!).
 
  • #10
Stephen Tashi said:
There is a distinction between the following statements:

Statement 1: "If I take 100 samples, there is a 90% probability that the sample mean that I observe will be within plus or minus 2 of the true population mean."

Statement 2: "I took 100 samples and the sample mean was 5.0. So there is a 90% probability that that 5.0 is within plus or minus 2 of the population mean."

Hmmmmmm, I'm not seeing it yet. Isn't the probability of "the yet-to-be-observed sample mean being a certain distance from a known population mean" the same as the probability of "an unknown population mean being that distance from an observed sample mean"? (The distance from me to you is the same as the distance from you to me.) The impression I got from Koosis: Statistics: A Self-Teaching Guide, which I read most of before coming to Sanders, was that the principle is the same in either case. Of course, there's a 90% probability I misunderstood...

What's the significance of "if I take 100 samples" at the beginning of these statements? Wouldn't the probability of one particular, arbitrarily chosen sample having a certain property be the same regardless of how many samples you happened to take and ignore at the same time?
 
  • #11
Rasalhague said:
Hmmmmmm, I'm not seeing it yet. Isn't the probability of "the yet-to-be-observed sample mean being a certain distance from a known population mean" the same as the probability of "an unknown population mean being that distance from an observed sample mean"? (The distance from me to you is the same as the distance from you to me.)

Yes, it is the same, so (for a given probability distribution) these two statements are equivalent:

Statement 1A: "If I take 100 samples, there is a 90% probability that the sample mean that I observe will be within plus or minus 2 of the true population mean."

Statement 1B: "If I take 100 samples, there is a 90% probability that the population mean will be within plus or minus 2 of the sample mean that I observe."

And these two statements are equivalent:

Statement 2A: "I took 100 samples and the sample mean was 5.0. So there is a 90% probability that that 5.0 is within plus or minus 2 of the population mean."

Statement 2B: "I took 100 samples and the sample mean was 5.0. So there is a 90% probability that the population mean is within plus or minus 2 of 5.0."

But statements 1A and 1B do not imply statements 2A and 2B.

What's the significance of "if I take 100 samples" at the beginning of these statements? Wouldn't the probability of one particular, arbitrarily chosen sample having a certain property be the same regardless of how many samples you happened to take and ignore at the same time?

Confidence intervals are used for making statements about estimators and estimators are not usually the same as single samples. To estimate the population mean, the usual estimator is the sample mean. The distribution of this estimator is a function of the number of samples. The more samples you take, the smaller the variance of the estimator (which is itself a random variable).

I just picked 100 as an example. I'm assuming the standard deviation of the distribution in question and the 100 samples work out so that plus or minus 2 gives you a 90% confidence interval, where the sample mean is treated as normally distributed and its standard deviation is computed from the standard deviation of the distribution for a single sample and the fact that 100 samples will be taken.
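
As a quick check of that calibration (with an assumed single-sample standard deviation), the standard deviation of the sample mean is sigma divided by the square root of n, so the 90% half-width shrinks as n grows and comes out at about 2 for n = 100 with the sigma assumed here.

```python
# Sketch: how the spread of the sample mean shrinks with the number of samples.
# The single-sample standard deviation is an assumption, not from the thread.
import numpy as np

sigma = 12.16   # assumed standard deviation of a single sample
for n in (25, 100, 400):
    se = sigma / np.sqrt(n)
    print(f"n = {n:4d}: std dev of sample mean = {se:.2f}, "
          f"90% half-width = {1.645 * se:.2f}")
```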
 
  • #12
Still confused, I'm afraid. How would you respond to the argument that the ones do imply the twos, by universal instantiation: what's true of every object of a class (values that a sample mean could take) must be true of a particular object of that class (the value 5)?

Stephen Tashi said:
The distribution of this estimator is a function of the number of samples. The more samples you take, the smaller the variance of the estimator (which is itself a random variable).

Could you state explicitly what the domain and codomain of this random variable are? I'm finding it really hard to connect these theoretical concepts about probability with the way the terms are used in practice in elementary statistics books, once the theoretical chapter is done with.

Even better, if you could list the probability spaces you're talking about, stating explicitly what sample spaces, probability measures, etc. are associated with each, and what their conventional names are.

The expression "distribution of the sample mean" makes me think of what Sanders calls the sampling distribution of sample means, but I get the impression you're talking about something else which just happens to have a similar name.
 
  • #13
Rasalhague said:
Still confused, I'm afraid. How would you respond to the argument that the ones do imply the twos, by universal instantiation: what's true of every object of a class (values that a sample mean could take) must be true of a particular object of that class (the value 5)?
Consider this statement:

Statement 3: "for each sample mean S, there is a 90% probability that the population mean is within plus or minus 2 of S".

That is the statement that you would need in order to conclude statement 2 by "universal instantiation".

But statement 3 is obviously false for a probability distribution with sufficient variance: there can be some sample means that are very far away from the population mean.

Statement 1 and statement 3 are not equivalent statements.
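
To make that concrete, here is a sketch with the same made-up numbers as before: among the repetitions whose sample mean happens to land far from the true mean, none of the intervals S plus or minus 2 contain it, so the 90% claim cannot hold for each individual S even though it holds in aggregate.

```python
# Sketch, same hypothetical numbers as before: condition on sample means that
# landed more than 3 away from the true mean and check how often (S +/- 2)
# contains the true mean -- never, by construction, which is the point.
import numpy as np

rng = np.random.default_rng(1)
true_mean, sigma, n, trials = 5.0, 12.16, 100, 100_000

sample_means = rng.normal(true_mean, sigma, (trials, n)).mean(axis=1)
far = sample_means[np.abs(sample_means - true_mean) > 3.0]
coverage_far = np.mean(np.abs(far - true_mean) <= 2.0)
print(f"{far.size} sample means fell more than 3 from the true mean; "
      f"coverage among them: {coverage_far:.0%}")
```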
Could you state explicitly what the domain and codomain of this random variable are?

I'll assume the random variable in question is the sample mean.

Let X be a random variable. Let S be the random variable that is the sample mean of 100 independent realizations of X.

One way to define the domain of S would be to say that it consists of vectors. Each vector has 100 numbers in it. The possible values of X would be used to define what values the numbers can take.

Since the order the samples are taken in is not important, we might think about defining the domain of S in terms of some unordered set of numbers, but defining an element of the domain merely as a set of numbers won't do, since the sample may have repeated values and there is no way to reflect that in a set of numbers (e.g. as sets, {1,2} = {1,2,2}). So I think it's simplest to define the domain of S as a set of vectors. (It wouldn't surprise me if different books have different ways of defining the domain of S.)

The codomain of S is the set of whatever numbers you can get by averaging 100 realizations of X. For example if X is a uniformly distributed random variable on the interval 0.0 to 1.0 then the possible values of S would be the numbers in that interval.
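
A tiny sketch of S as a function, using the uniform example just mentioned: its domain is vectors of 100 realizations of X and its codomain is the set of values such an average can take, here the interval [0, 1].

```python
# Sketch: S as a map from a vector of 100 realizations of X to a single number.
# Here X is assumed uniform on [0, 1], as in the example above.
import numpy as np

rng = np.random.default_rng(4)

def S(x: np.ndarray) -> float:
    """Sample mean: maps a vector of 100 realizations to a single number."""
    assert x.shape == (100,)
    return float(x.mean())

realization = rng.uniform(0.0, 1.0, 100)   # one element of the domain of S
print(S(realization))                      # one element of the codomain, in [0, 1]
```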

The way that the sample mean S fits into the scheme of confidence intervals is that the sample mean is a particular estimator of a particular parameter (the mean mu) of the distribution of X.

There can be other estimators of the same parameter. For example, given 100 samples X1, X2, ..., X100, one could also estimate the mean by W = (min{X1, ..., X100} + max{X1, ..., X100}) / 2. Presumably, since people usually estimate mu by using the sample mean, the sample mean (as a random variable) must have some properties that make it a more desirable estimator than W.
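
A small experiment along those lines (a sketch; it assumes X is normal, which the discussion above does not specify): compare the spread of the two estimators over many repeated samples. For a normal X the sample mean turns out to have a much smaller spread than W.

```python
# Sketch comparing two estimators of mu: the sample mean and the midrange W.
# The distribution (normal) and its parameters are assumptions for illustration.
import numpy as np

rng = np.random.default_rng(2)
mu, sigma, n, trials = 5.0, 12.16, 100, 20_000

data = rng.normal(mu, sigma, (trials, n))
sample_means = data.mean(axis=1)
midranges = (data.min(axis=1) + data.max(axis=1)) / 2

print(f"std dev of the sample mean: {sample_means.std():.2f}")
print(f"std dev of W (the midrange): {midranges.std():.2f}")
```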
 
  • #14
Ah, I see the source of my confusion now. Sanders and Koosis would conceptualise what you call "100 samples" as "one sample, of size 100" or "one sample containing 100 items". That's what was throwing me there.

Thanks for the rest of the explanation. I've had a few goes at figuring this out, and some of them looked a bit like this. I haven't had a chance to properly think about it and try to apply it to other, similar concepts - but I intend to as soon as I get time. Thanks again: from quickly reading it, I'm sure it'll be very useful.
 
  • #15
Can additional information make a probability undefined?

I can't resist offering this little simplification of the issues. (I wonder if it will be controversial.)

Can additional information in a problem make a probability undefined?

Yes!

Suppose we have 10 identical boxes. Two are empty and 8 contain prizes. The numbers 1 through 10 are written on the outside of the boxes. I define two slightly different cases.

Case 1. The numbers are randomly assigned to the boxes.
Case 2. Someone assigns numbers to the boxes, but we don't know that they are randomly assigned.

We are to solve the following problems for each of the two cases.

Problem A: Suppose a person picks a box at random. What is the probability that the box contains a prize?

Problem B: Suppose a person picks a box at random and the number on the box is 3. What is the probability that the box contains a prize?

In problem A, the answer to both cases is 8/10.

In problem B case 1, the probability is 8/10. We can empirically verify this by running a simulation or by the usual type of combinatorial calculation.

For problem B case 2, there is no mathematical answer. One might attempt to justify an answer of 8/10 by the following reasoning:

The probability that a randomly selected box will contain a prize is 8/10. The box marked '3' was randomly selected. Therefore since the probability 8/10 applies to each box, it applies to box '3'.

However, the statement "The probability that a randomly selected box will contain a prize is 8/10" is not the same as the assertion that "For each box number x, when x is the randomly selected box, the probability it contains a prize is 8/10". If the two statements are equivalent, then they should both be true no matter how the box numbers were assigned. So they should both be true in the case when the person who assigned the numbers gave the lower numbers to the boxes with prizes.

We can't demonstrate by a simulation that 8/10 is the answer to problem B case 2, because we don't know how to simulate assigning the box numbers.

A "casual" Bayesian would get an answer to Problem B case 2 by assuming that all assignments of numbers to boxes are equally likely. A more studious Bayesian would do experiments to see how a random selection of people tend to number the boxes. He would use the data to hypothesize a distribution for the number assignments. Both these approaches involve assuming a prior distribution for the numberings that is not given in the problem.
 

FAQ: Why not choose a confidence level retrospectively?

1. Why is choosing a confidence level retrospectively not recommended?

Choosing a confidence level retrospectively can lead to biased results and a higher risk of making a Type I error (rejecting a true null hypothesis). This is because the confidence level is meant to be determined before the study begins, based on the researcher's goals and the potential consequences of making a Type I error.

2. What are the potential consequences of choosing a confidence level retrospectively?

Choosing a confidence level retrospectively can lead to a higher chance of false positives and a lower chance of discovering true effects. This can ultimately undermine the validity and reliability of the study's results.

3. Can a confidence level be changed after data collection?

Technically, a confidence level can be changed after data collection, but this is not recommended. Changing the confidence level after data collection can introduce bias and make it difficult to interpret the results. It is best to determine the confidence level before data collection and stick to it.

4. Is it ever acceptable to choose a confidence level retrospectively?

In some cases, it may be acceptable to choose a confidence level retrospectively, but only if it is clearly stated in the study's design and the rationale for doing so is well-justified. However, this should be avoided whenever possible to maintain the integrity of the study's results.

5. How should a confidence level be chosen?

A confidence level should be chosen based on the researcher's goals and the potential consequences of making a Type I error. Generally, a confidence level of 95% is commonly used in scientific research, but it can vary depending on the specific field or study design. It is important to carefully consider the appropriate confidence level before beginning a study.
