Proper understanding of p-value

In summary: The starting assumption is that the null hypothesis is true. From there, we calculate the probability (the p-value) of obtaining a sample statistic at least as extreme as the one observed, and use it to decide whether or not to reject the null hypothesis.
  • #1
fog37
Hello,

I am still slightly confused about the meaning of the p-value. Here is my current understanding:
  • There is a population. We don't know its parameters, but we want to estimate them.
  • We collect a possibly large sample of size ##n## from it.
  • We formulate the hypotheses ##H0## and ##H1##, set a significance level ##\alpha##, and perform a hypothesis test to either fail to reject ##H0## or reject ##H0## in favor of ##H1##.
  • The p-value is the probability, ASSUMING H0 is correct, of the calculated sample statistic.
  • A low p-value leads to rejecting H0: it means that, under the assumption that H0 is correct, the calculated sample statistic would have been far too rare to actually happen. But it did happen. A result this unlikely cannot easily be ascribed to a random fluke: sampling error alone would not be expected to produce such a low-probability statistic value. Something deeper must be going on, which leads us to believe that H0 is not so reliable.
  • The p-value is also called "the probability of chance" because it is the probability we would expect if only chance were at work, as in random sampling. The fact that the sample statistic occurred despite its low probability must therefore be attributed to something other than chance.
Is this correct?
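
For concreteness, here is a rough sketch in Python of the procedure I have in mind (the data, the hypothesized mean ##\mu_0##, and the choice of a one-sample t-test are just my own illustrative assumptions):

```python
# Rough sketch of the procedure described above (illustrative values only).
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
sample = rng.normal(loc=52.0, scale=10.0, size=100)  # one sample of size n = 100

mu0 = 50.0      # value claimed by H0: the population mean equals 50
alpha = 0.05    # significance level chosen in advance

# One-sample t-test: the p-value is the probability, assuming H0, of a test
# statistic at least as extreme as the one computed from this sample (two-sided).
t_stat, p_value = stats.ttest_1samp(sample, popmean=mu0)

print(f"t = {t_stat:.3f}, p = {p_value:.4f}")
if p_value < alpha:
    print("Reject H0 in favor of H1")
else:
    print("Fail to reject H0")
```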

The procedure above is based on analyzing a single, large sample. What if we repeated the procedure above with another simple random sample and this time the p-value was larger than the set threshold ##\alpha##? That would mean that we would fail to reject H0...So how many samples do we need to analyze to convince ourselves that ##H0## must be rejected or not?
It seems reasonable to explore multiple random samples and determine their p-values before drawing conclusions about what to do with ##H0##.

THANK YOU!
 
  • #2
fog37 said:
I am still slightly confused about the meaning of the p-value.
Yes, it is one of the most frequently misused and misunderstood statistics. That said, your understanding seems correct:

fog37 said:
The p-value is the probability, ASSUMING H0 is correct, of the calculated sample statistic.
The usual mistake, which you are not making, is to consider the p-value as the probability that ##H_0## is correct. Or, even worse, to consider it as being related to the probability of ##H_1## in any way. In simple terms, it is the probability of the data, given the null hypothesis.

fog37 said:
What if we repeated the procedure above with another simple random sample and this time the p-value was larger than the set threshold α?
In this very common case you would need to perform a correction for multiple comparisons. I like the Bonferroni-Holm correction.
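
Roughly, the Holm step-down procedure looks like this (a bare-bones sketch, not production code; the p-values at the end are made up):

```python
# Bare-bones sketch of the Holm (Bonferroni-Holm) step-down procedure.
def holm_reject(p_values, alpha=0.05):
    """Return a list of booleans: True where the corresponding H0 is rejected."""
    m = len(p_values)
    # Sort p-values ascending, remembering their original positions.
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        # Compare the k-th smallest p-value against alpha / (m - k).
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # once one test fails, all larger p-values are not rejected
    return reject

# Made-up p-values from, say, repeated experiments on the same question.
print(holm_reject([0.012, 0.030, 0.001, 0.200]))  # -> [True, False, True, False]
```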

However, even with a multiple-comparisons correction, this is one of the big problems with frequentist statistics in science. When you perform that next experiment, you actually need to alter the p-value of the original experiment. In fact, ideally when reporting the original experiment you should have considered that you would repeat the experiment and you should have adjusted the original p-value in anticipation of the follow-up experiment. By simply intending to do follow-up experiments your p-value becomes weaker, and in the limit of a conscientious experimenter who intends to continue studying a topic indefinitely, any data can be made not statistically significant.

fog37 said:
So how many samples do we need to analyze to convince ourselves that H0 must be rejected or not?
That is actually less critical than being explicit about your stopping criterion and using that criterion when calculating your p-values. Once you have defined your stopping criterion, a power analysis for that experimental design can guide you on the number of samples needed.
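
As an illustration only (the effect size, ##\alpha##, and power below are assumptions you would choose for your own study), a power analysis for a one-sample t-test might look like:

```python
# Sketch: choose the sample size from a power analysis done *before* collecting data.
from statsmodels.stats.power import TTestPower

analysis = TTestPower()
n_required = analysis.solve_power(
    effect_size=0.5,   # assumed standardized effect size (Cohen's d) you care about
    alpha=0.05,        # significance level fixed in advance
    power=0.80,        # desired probability of rejecting H0 when H1 is true
)
print(f"Samples needed for a one-sample t-test: ~{n_required:.0f}")
```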
 
  • #3
Thanks Dale!

Glad I am on the right track. So, in general, unless we want to get into more sophisticated analyses and corrections, such as the Bonferroni-Holm correction that you bring up, junior statisticians stick with analyzing a single, possibly large, sample from the population...

Also, given the starting assumption that ##H0## is correct, we are placing the value claimed by ##H0## at the center of a probability distribution, and the p-value is the probability from that distribution at the corresponding z value.

Regarding that distribution: is it the theoretical, Gaussian sampling distribution of the statistic under study (say, the sample mean)? We are essentially envisioning the sampling distribution of the sample means, centered at the mean value proposed by ##H0##.
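
In formulas, I picture something like ##z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}## with a two-sided p-value ##p = 2\,P(Z \ge |z|)## taken from the standard normal distribution (this assumes ##\sigma## is known; if it is estimated from the sample, a t distribution would be used instead).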

Is that correct?
 
  • #4
fog37 said:
unless we want to get into more sophisticated analyses and corrections, such as the Bonferroni-Holm correction that you bring up, junior statisticians stick with analyzing a single, possibly large, sample from the population.
Yes

fog37 said:
Also, given the starting assumption that H0 is correct, we are placing the value claimed by H0 at the center of a probability distribution, and the p-value is the probability from that distribution at the corresponding z value.

Regarding that distribution: is it the theoretical, Gaussian sampling distribution of the statistic under study (say, the sample mean)? We are essentially envisioning the sampling distribution of the sample means, centered at the mean value proposed by H0.

Is that correct?
Not necessarily. Your ##H_0## does not need to be related to the Gaussian distribution at all. That is a common approach, but it is not mandatory.
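
For example (just an illustrative sketch with made-up counts; it needs a reasonably recent SciPy for `binomtest`), an exact binomial test uses the binomial distribution as its null distribution rather than a Gaussian:

```python
# Illustration: a p-value computed from a non-Gaussian (binomial) null distribution.
from scipy import stats

# H0: the coin is fair (probability of heads = 0.5).
# Observed: 61 heads out of 100 flips (made-up numbers).
result = stats.binomtest(k=61, n=100, p=0.5, alternative='two-sided')
print(f"p = {result.pvalue:.4f}")
```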
 
  • #5
fog37 said:
  • The p-value is the probability, ASSUMING H0 is correct, of the calculated sample statistic.

Assuming H0, the probability that the sample statistic takes on exactly the value we observe is typically zero, at least if we are talking about sample statistics that can take on a continuous range of values.

The p-value is, in general, the probability that the statistic lies in some interval. For example, the interval might be ##[0,\infty)##.

Justifying the use of a particular interval is a sophisticated intellectual exercise. For example, it's easy to explain the customary scenarios for using "one-tailed" vs "two-tailed" tests and the procedures are intuitively pleasing, but how do we prove that the methods are correct in any sense? The key to that is to define "correct" rigorously. This has to do with defining the "power" of statistical tests.

After all, since the p-value is, in general, the probability of an event that includes outcomes where the observed statistic did not have the value we observe, how do we justify including the probability of things that did not happen in making a decision?
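
To make the "interval" idea concrete, here is a small sketch (the observed value and the standard normal null distribution are made-up assumptions for illustration):

```python
# Sketch: the p-value as the probability that the statistic falls in a region.
from scipy import stats

z_obs = 1.8  # made-up observed value of the test statistic

# The probability of the exact observed value is zero for a continuous statistic,
# so the p-value is the probability of an *interval* of outcomes instead:
p_one_tailed = stats.norm.sf(z_obs)           # P(Z >= z_obs), the interval [z_obs, inf)
p_two_tailed = 2 * stats.norm.sf(abs(z_obs))  # P(|Z| >= |z_obs|), two symmetric tails

print(f"one-tailed p = {p_one_tailed:.4f}, two-tailed p = {p_two_tailed:.4f}")
```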
 
  • #6
Since we are discussing statistics, my understanding is that statistical inference can be used for three purposes:

a) find a yes/no answer about a parameter of an unknown population (that is hypothesis testing)
b) estimate the parameter(s) of an unknown population with a certain level of confidence (that is estimation)
c) predict the future (that is forecasting)

I am not sure about c)... How is inferential statistics used to predict the future? Are we assuming that the population's parameters can vary in the future and the idea is to predict them? Are we talking about statistics in the context of time series and regression models, i.e. models to predict data that we currently don't have?

Also, I have been reading about probability vs. statistics, and some simplistically define them as the inverse of each other... Every intro statistics book has a probability section. Is that because statistics employs the tools of probability to do statistical analysis? I guess...

Thanks
 

FAQ: Proper understanding of p-value

What is a p-value?

A p-value is the probability, computed under the assumption that the null hypothesis is true, of obtaining a result at least as extreme as the one observed. It is used to assess the strength of evidence against the null hypothesis, with a lower p-value indicating stronger evidence against it.

How is a p-value calculated?

A p-value is calculated by comparing the observed data to the null hypothesis. This is typically done using a statistical test, such as a t-test or ANOVA, which generates a test statistic. The p-value is then determined by the probability of obtaining a test statistic as extreme or more extreme than the one observed, assuming the null hypothesis is true.
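
For example, a minimal sketch in Python (the simulated data and the use of SciPy's two-sample t-test are illustrative assumptions):

```python
# Minimal sketch: a two-sample t-test producing a test statistic and a p-value.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
group_a = rng.normal(10.0, 2.0, size=30)
group_b = rng.normal(11.0, 2.0, size=30)

t_stat, p_value = stats.ttest_ind(group_a, group_b)
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")
```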

What is a "significant" p-value?

A significant p-value is typically considered to be one less than 0.05. This means that, if the null hypothesis is true, there is less than a 5% chance of obtaining a result at least as extreme as the one observed. However, the significance level can vary depending on the field of study and the specific research question.

Can a p-value determine the truth of a hypothesis?

No, a p-value cannot determine the truth of a hypothesis. It can only provide evidence for or against a hypothesis. Other factors, such as study design, sample size, and effect size, also play a role in determining the validity of a hypothesis.

How should p-values be interpreted?

P-values should be interpreted in the context of the research question and study design. A significant p-value does not necessarily mean that the observed result is important or meaningful. It is important to consider other factors, such as effect size and confidence intervals, when interpreting the results of a study.
