Does the statistical weight of data depend on the generating process?

In summary, if we have two identical data sets that were generated by different processes, the statistical weight of their evidence for or against a hypothesis may be different. This can be a problem in fields like psychology, where the same data may be analyzed in different ways depending on the initial hypothesis. Some suggest that Bayesian methods, where only the data is considered and the intended experiment doesn't matter, may be a better approach. However, this approach may not take into account the differences in experiments and their constraints, leading to potentially flawed conclusions.
  • #36
PeterDonis said:
Yes, but that's not the question I asked. The question I asked was whether ##\lambda = 0.5## is less likely given the second case vs. the first.
How does "the second data set is less likely given the hypothesis that ##\lambda = 0.5##" get transformed to
"the hypothesis that ##\lambda = 0.5## is less likely given the second data set"? That is not a valid deductive syllogism; in fact it's a common error people make (assuming that if A then B is equivalent to if B then A).

I'm working to standard hypothesis testing. In particular, there is a single, unknown value ##\lambda##. It's not a random variable.

We can test ##\lambda = 0.5## (or any other value) against a random data set ##X## and compute ##p(X|\lambda)## for that data set.

The data in case #2 is less likely, given the hypothesis ##\lambda = 0.5##.

Eventually, with enough data, we would have to abandon the hypothesis ##\lambda = 0.5##. That is a thornier issue. In reality, it is more about an accumulation of data than one test.

Here the data in case #2 gives us less confidence in our hypothesis. That is the sense in which ##\lambda = 0.5## is "less likely".
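A minimal sketch of the two p-values under ##\lambda = 0.5##. It assumes, from later posts in the thread, that both couples had six boys and one girl, that couple #1 fixed seven children in advance, and that couple #2 stopped once both sexes had appeared. The test statistics are also an assumption: the size of the majority sex for couple #1, and total family size for couple #2. Post #6 may define "at least as extreme" differently.

```python
from fractions import Fraction
from math import comb

HALF = Fraction(1, 2)


def p_value_fixed_n(n, boys, lam=HALF):
    """Couple #1 design: n children fixed in advance.

    Two-sided p-value: probability under P(boy) = lam that the majority
    sex has at least as many children as observed.
    """
    observed = max(boys, n - boys)
    return sum(
        comb(n, b) * lam**b * (1 - lam) ** (n - b)
        for b in range(n + 1)
        if max(b, n - b) >= observed
    )


def p_value_until_both(n, lam=HALF):
    """Couple #2 design: keep going until both sexes have appeared.

    p-value: probability of needing at least n children, i.e. the first
    n - 1 children are all boys or all girls.
    """
    return lam ** (n - 1) + (1 - lam) ** (n - 1)


print(p_value_fixed_n(7, 6))   # 1/8
print(p_value_until_both(7))   # 1/32
```

Under these choices the same births give a smaller p-value for couple #2, even though the recorded data are identical.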
 
  • #37
PeroK said:
Here the data in case #2 gives us less confidence in our hypothesis.

Why? As I've already said, there is no valid deductive reasoning that gets you from "the second data set is less likely given the hypothesis that ##\lambda = 0.5##" to "the hypothesis that ##\lambda = 0.5## is less likely given the second data set". So since you can't be using valid deductive reasoning, what reasoning are you using?

PeroK said:
I'm working to standard hypothesis testing.

I'm not sure that standard hypothesis testing (aka frequentist statistics) has a good answer to the question I just posed above. But if there is one, I would like to know it.
 
  • #38
PeterDonis said:
See the question I asked @PeroK in the last part of post #30.
If this is the difference between the two, then the Bayesian model doesn't make much sense to me for real-life situations. You cannot set up different experiments such that the outcome only depends on the random variable.
 
  • #39
fresh_42 said:
You cannot set up different experiments such that the outcome only depends on the random variable.

I don't see how this is relevant. The two cases don't differ in their outcomes; the outcomes are the same. They only differ in the process used to generate the outcomes, and that process, itself, does not depend on the variable (p, or ##\lambda## in @Dale's notation) whose value we are trying to estimate.
 
  • #40
PeterDonis said:
I already specified what the two samples are: the (identical) data from couples #1 and #2. So are you saying that, if the only difference between the couples is the process they used, the two data sets have the same statistical weight when estimating ##p##?
I don't see how we can estimate anything from two tests. By sample size I meant enough tests of either setup. If we measure an effect a million times at CERN and a thousand times at Fermi, and get the same results, why should there be a different significance? The million tops the thousand, but given the identical outcome, I don't see a different weight.
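For comparison, a minimal sketch of how sample size enters the usual frequentist calculation: when the same observed proportion comes from samples of different sizes, the standard error of the estimate shrinks like ##1/\sqrt{n}##. The counts below are hypothetical, chosen only so that both samples show the same proportion.

```python
from math import sqrt


def standard_error(successes, n):
    """Standard error of the sample proportion successes / n (normal approximation)."""
    p_hat = successes / n
    return sqrt(p_hat * (1 - p_hat) / n)


# Hypothetical counts with an identical observed proportion of 0.51.
se_small = standard_error(510, 1_000)
se_large = standard_error(510_000, 1_000_000)
print(se_small, se_large, se_small / se_large)  # ratio is sqrt(1000), about 31.6
```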
 
  • #41
fresh_42 said:
given the identical outcome, I don't see a different weight.

Ok.
 
  • #42
PeterDonis said:
I don't see how this is relevant.
I think there is a major difference between theory and real life. Given the same outcome, we cannot decide which experiment is closer to the real distribution. The quality of the processes cannot be distinguished. I'm just saying that there are always unknowns which don't find their way into the calculation, such as the father's age in the first example.
 
  • #43
fresh_42 said:
Given the same outcome, we cannot decide which experiment is closer to the real distribution.

Again, I'm confused by this, because the two different "experiments" (the different processes the couples are using) have nothing to do with the distribution. They have nothing to do with what the value of ##\lambda## is. So asking "which experiment is closer to the real distribution" seems like nonsense to me.
 
  • #44
PeterDonis said:
I'm not sure that standard hypothesis testing (aka frequentist statistics) has a good answer to the question I just posed above. But if there is one, I would like to know it.

I wouldn't discount it quite so readily. Let's follow your line of logic through. Suppose you did a large survey of births in the USA in the last year. You want to measure the probability that a boy is born, as opposed to a girl. Call this ##\lambda##. What you cannot do is give a probability distribution for ##\lambda##. Something like:

##p(\lambda = 0.47) = 0.05##
##p(\lambda = 0.48) = 0.10##
##p(\lambda = 0.49) = 0.20##
##p(\lambda = 0.50) = 0.30##
##p(\lambda = 0.51) = 0.20##
##p(\lambda = 0.52) = 0.10##
##p(\lambda = 0.53) = 0.05##

That is not valid because ##\lambda## was not a random variable in the data you analysed.

Instead, you can say something like:

##\lambda## is in the range ##[0.47, 0.52]## with ##99\%## confidence.
##\lambda## is in the range ##[0.48, 0.51]## with ##90\%## confidence.
##\lambda## is in the range ##[0.49, 0.50]## with ##80\%## confidence.

That's the difference between "confidence" and "probabilities". Parameters associated with a distribution have confidence levels, not probabilities. The random data has probabilities.
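A minimal simulation of what "##90\%## confidence" means in this frequentist reading: ##\lambda## is held fixed, the survey is repeated many times, and we count how often the computed interval covers the true value. The true ##\lambda = 0.51##, the survey size, and the simple normal-approximation (Wald) interval are all assumptions chosen for illustration.

```python
import random
from math import sqrt
from statistics import NormalDist


def wald_interval(boys, n, level):
    """Normal-approximation confidence interval for P(boy)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    p_hat = boys / n
    half_width = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


def coverage(true_lam, n, level, surveys, seed=1):
    """Fraction of repeated surveys whose interval contains true_lam."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(surveys):
        boys = sum(rng.random() < true_lam for _ in range(n))
        low, high = wald_interval(boys, n, level)
        hits += low <= true_lam <= high
    return hits / surveys


# lambda is fixed; the randomness is all in the data, and hence in the interval.
print(coverage(true_lam=0.51, n=1_000, level=0.90, surveys=1_000))
```

The printed fraction should come out near 0.90: the probability statement is about the interval-generating procedure, not about ##\lambda##.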
 
  • #45
With a single sample in either trial, the ex post odds are the same: one success in seven trials. Continuing with the coin-flipping analogy, if you had additional samples, the distributions would differ: one sample set would be of the number of heads in seven coin flips, and the other of the number of flips before the first head appeared.

The boy/girl example is confusing because it’s not clear which of three things the problem intends: to assume an equal p=boy for the two couples, which biologically would not be true; to measure p=boy for each couple separately, which, while biologically realistic, precludes any additional information from further samples; or to use the two couples to estimate p=boy for the overall population, in which case one can simply disregard the two couples as outliers.
 
  • #46
PeroK said:
What you cannot do is give a probability distribution for ##\lambda##. Something like:

##p(\lambda = 0.47) = 0.05##
##p(\lambda = 0.48) = 0.10##
##p(\lambda = 0.49) = 0.20##
##p(\lambda = 0.50) = 0.30##
##p(\lambda = 0.51) = 0.20##
##p(\lambda = 0.52) = 0.10##
##p(\lambda = 0.53) = 0.05##

That is not valid because ##\lambda## was not a random variable in the data you analysed.
That is exactly what Bayesian statistics do. They do treat ##\lambda## as a random variable and determine its probability distribution.
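For contrast, a minimal sketch of the kind of calculation Dale describes, assuming a uniform prior on ##\lambda## and the six-boys-one-girl data discussed later in the thread. With a uniform prior the posterior is ##\text{Beta}(7, 2)##, and for integer parameters its CDF can be written as a binomial tail sum, so probability statements about ##\lambda## come out exactly.

```python
from fractions import Fraction
from math import comb


def beta_cdf(x, a, b):
    """CDF of Beta(a, b) at x for positive integers a, b.

    Uses the identity I_x(a, b) = P(Binomial(a + b - 1, x) >= a).
    """
    m = a + b - 1
    return sum(comb(m, j) * x**j * (1 - x) ** (m - j) for j in range(a, m + 1))


def posterior_interval_prob(low, high, boys, girls):
    """P(low < lambda <= high | data) under a uniform prior on lambda."""
    a, b = boys + 1, girls + 1  # uniform prior = Beta(1, 1)
    return beta_cdf(high, a, b) - beta_cdf(low, a, b)


# Posterior probability that a boy is no more likely than a girl.
print(posterior_interval_prob(Fraction(0), Fraction(1, 2), boys=6, girls=1))  # 9/256
```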
 
  • #47
BWV said:
it’s not clear whether the problem assumes an equal p=boy between the two couples

In my discussion with @fresh_42 I clarified that I intended to include this assumption, yes. I agree, as I said in that discussion, that the assumption is an idealization.

We could go into how one would analyze the data if that assumption were dropped, but that's a further complication that I don't really want to get into in this thread.
 
  • #48
PeterDonis said:
Again, I'm confused by this, because the two different "experiments" (the different processes the couples are using) have nothing to do with the distribution. They have nothing to do with what the value of ##\lambda## is. So asking "which experiment is closer to the real distribution" seems like nonsense to me.
I believe that each real-life test has different random variables and different conditional probabilities, and thus different distributions. The assumption that they are the same is already a hypothesis, one I would work with as long as the outcomes remain stable. This adds to the confidence in the hypothesis. If by statistical weight you mean confidence, then the number of tests and the setup do play a role.
 
  • #49
Dale said:
That is exactly what Bayesian statistics do. They do treat ##\lambda## as a random variable and determine its probability distribution.

This might be a matter of differing terminology. In Jaynes' Probability Theory, for example, he describes processes like estimating a distribution for ##\lambda## as "parameter estimation". (He doesn't appear to like the term "random variable" much at all, and discusses some of the confusions that using it can cause.)
 
  • #50
Dale said:
That is exactly what Bayesian statistics do. They do treat ##\lambda## as a random variable and determine its probability distribution.

What does a Bayesian analysis give numerically for the data in post #1?
 
  • #51
PeterDonis said:
How might the prior for couple #2 be different from the prior for couple #1?
If you had previous studies that showed, for example, that couples who decided on a fixed number of children in advance had different ##\lambda## than other couples.
 
  • #52
PeroK said:
I wouldn't discount it quite so readily. Let's follow your line of logic through. Suppose you did a large survey of births in the USA in the last year. You want to measure the probability that a boy is born, as opposed to a girl. Call this ##\lambda##. What you cannot do is give a probability distribution for ##\lambda##...
this appears to be falling victim to the Inspection Paradox. Whether you sample based on children or parents matters. Original post discussed sampling by Parents (I think) and you are now sampling by children.

- - - -
I wish Peter would restate the question in a clean probabilistic manner. Being a Frequentist or Bayesian has little to do with the essence of the problem. The original post is really about stopping rules, something pioneered by Wald (who, yes, did some Bayesian stats too). And yes, subsequent to Wald, stopping rules were extended in a big way by Doob via martingales.
 
  • #53
I vaguely remember similar discussions at my institute. I like Hendrik's approach in QFT: sit down and calculate. Interpretations are another game.
 
  • #54
StoneTemplePython said:
this appears to be falling victim to the Inspection Paradox. Whether you sample based on children or parents matters. Original post discussed sampling by Parents (I think) and you are now sampling by children.

Are you talking about the case where some parents have a genetic disposition to one sex for their children?

I was assuming the idealised case where we have a single probability in all cases.
 
  • #55
fresh_42 said:
Seems a bit linguistic to me.
In general, the difference between ##p(X|\lambda)## and ##p(\lambda|X)## is not merely linguistic. They are different numbers. In addition, there is the difference in the space over which the probabilities are measured. One is a measure over the space of all possible experimental outcomes ##X##, and the other is a measure over the space of all possible boy-birth probabilities ##\lambda##.
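A small numerical check of that difference, assuming a fixed design of seven children with six boys observed. For fixed ##\lambda##, ##p(X|\lambda)## sums to 1 over all outcomes ##X##. For fixed ##X##, the same expression integrates to ##1/8## over ##\lambda##, not 1, so it is not a probability distribution over ##\lambda##. Turning it into ##p(\lambda|X)## requires a prior and a normalization over ##\lambda##.

```python
from math import comb


def likelihood(boys, n, lam):
    """p(X | lambda) for X = number of boys among n children."""
    return comb(n, boys) * lam**boys * (1 - lam) ** (n - boys)


n = 7

# Sum over the outcome space X with lambda fixed at 0.5.
total_over_x = sum(likelihood(k, n, 0.5) for k in range(n + 1))

# Integrate over the parameter space lambda with X fixed at 6 boys (midpoint rule).
steps = 100_000
total_over_lam = sum(likelihood(6, n, (i + 0.5) / steps) for i in range(steps)) / steps

print(total_over_x)    # 1.0
print(total_over_lam)  # close to 1/8, not 1
```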
 
  • #56
StoneTemplePython said:
this appears to be falling victim to the Inspection Paradox. Whether you sample based on children or parents matters. Original post discussed sampling by Parents (I think) and you are now sampling by children.

PS in any case, I was only describing the difference between probability and confidence, not trying to analyse the initial problem. See post #6.
 
  • #58
PeroK said:
Are you talking about the case where some parents have a genetic disposition to one sex for their children?

I was assuming the idealised case where we have a single probability in all cases.

My read on original post was a question with two 'types' (or iid representatives for classes) of families. One having n kids (stopping rule: n, so random variable = n, with probability one for our purposes) and the other has a geometrically distributed random variable for number of kids (stopping rule: when a girl is born).

The underlying idea of how you sample is closely related to what Dale is saying -- but the way people get tripped up... happens so often it goes under the name of "Inspection Paradox" (originally a renewal theory idea, but pretty general)... we need to be very careful on whether we are doing our estimates by sampling kids or sampling the parents/couples
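A minimal simulation of the sampling-unit issue. It uses StoneTemplePython's reading of the second stopping rule ("stop when a girl is born") and assumes ##\lambda = 0.5##. Pooling all children gives a fraction of boys close to ##\lambda##. Averaging each couple's own fraction of boys gives something quite different (about ##1 - \ln 2 \approx 0.307## for this rule), even though every birth is an independent fair coin flip.

```python
import random
from math import log


def family_stop_at_first_girl(rng, lam):
    """Return (boys, children) for a family that stops at its first girl."""
    boys = 0
    while rng.random() < lam:  # each birth is a boy with probability lam
        boys += 1
    return boys, boys + 1  # the final child is the girl


def compare_sampling_units(families, lam=0.5, seed=2):
    rng = random.Random(seed)
    data = [family_stop_at_first_girl(rng, lam) for _ in range(families)]
    # Sampling children: pool every birth across all families.
    by_children = sum(b for b, _ in data) / sum(n for _, n in data)
    # Sampling couples: average each family's own fraction of boys.
    by_couples = sum(b / n for b, n in data) / families
    return by_children, by_couples


print(compare_sampling_units(100_000))
print(1 - log(2))  # limiting value of the per-couple average when lam = 0.5
```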
 
  • #59
StoneTemplePython said:
My read on original post was a question with two 'types' of families. One having n kids (stopping rule: n, so random variable = n, with probability one for our purposes) and the other has a geometrically distributed random variable for number of kids (stopping rule: when a girl is born).

The underlying idea of how you sample is closely related to what Dale is saying -- but the way people get tripped up... happens so often it goes under the name of "Inspection Paradox" (originally a renewal theory idea, but pretty general)... we need to be very careful on whether we are doing our estimates by sampling kids or sampling the parents/couples

What's your opinion on post #6? I know you're the real expert on this!
 
  • #60
PeroK said:
What's your opinion on post #6? I know you're the real expert on this!
I worry that you think I was criticizing your calculation in #6. I am not. It seems to me like a valid calculation, it is just a calculation of a different probability than what you would calculate with Bayesian methods. Nothing wrong with that, just different.
 
  • #61
PeroK said:
PS in any case, I was only describing the difference between probability and confidence, not trying to analyse the initial problem. See post #6.
ah ok. got it. I missed this.
PeroK said:
What's your opinion on post #6? I know you're the real expert on this!
I'm trying to avoid the statistical estimation stuff right now... too perilous.

What I'd like to do with respect to the original post is flesh out the problem, apply a sufficient condition so we can use the Optional Stopping Theorem, and be done with it. But depending on what exactly is being asked, stopping rules either don't matter, or they matter a lot. (And if you have a defective stopping rule you can get into a lot of trouble without realizing it.)
 
  • #62
StoneTemplePython said:
I wish Peter would restate the question in a clean probabilistic manner. Being a Frequentist or Bayesian has little to do with the essence of the problem. The original post is really about stopping rules,

Yes, it is. One way of rephrasing the question is whether and under what circumstances changing the stopping rule makes a difference. In particular, in the case under discussion we have two identical data sets that were collected under different stopping rules; the question is whether the different stopping rules should affect how we estimate the probability of having a boy given the data.
 
  • #63
Dale said:
It seems to me like a valid calculation, it is just a calculation of a different probability than what you would calculate with Bayesian methods.

Yes, so another way of stating the question I asked in the OP is, which of these different probabilities is the one that is relevant for estimating ##\lambda## given the data? You seem to be saying it's yours, but @PeroK seems to be saying it's his. You can't both be right.
 
  • #64
StoneTemplePython said:
What I'd like to do with respect to the original post is flesh out the problem, apply a sufficient condition so we can use the Optional Stopping Theorem, and be done with it. But depending on what exactly is being asked, stopping rules either don't matter, or they matter a lot.

Can you give examples of each of the two possibilities you describe? I.e., can you give an example of a question, arising from the scenario described in the OP, for which stopping rules don't matter? And can you give an example of a question for which they matter a lot?
 
  • #65
StoneTemplePython said:
stopping rule: when a girl is born

This is not the correct stopping rule for couple #2. The correct stopping rule is "when there is at least one child of each gender". It just so happens that they had a boy first, so they went on until they had a girl. But if they had had a girl first, they would have gone on until they had a boy.
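A minimal simulation of couple #2's stopping rule as stated here ("stop once there is at least one child of each gender"), with an arbitrary assumed ##\lambda = 0.6##. It checks the optional-stopping consequence (Wald's identity) that the Optional Stopping Theorem mentioned earlier delivers. Because ##B_n - \lambda n## is a martingale with bounded increments and this stopping time has finite mean, ##E[B] = \lambda\,E[N]##, where ##B## is the number of boys and ##N## the family size. For this rule ##E[N] = 1 + \lambda/(1-\lambda) + (1-\lambda)/\lambda##.

```python
import random


def family_until_both_sexes(rng, lam):
    """Return (boys, children) for a family that stops once it has a boy and a girl."""
    boys = children = 0
    while boys == 0 or boys == children:  # still missing at least one sex
        children += 1
        if rng.random() < lam:
            boys += 1
    return boys, children


def wald_check(lam, families, seed=3):
    """Return (mean number of boys, lam * mean family size) over simulated families."""
    rng = random.Random(seed)
    data = [family_until_both_sexes(rng, lam) for _ in range(families)]
    mean_boys = sum(b for b, _ in data) / families
    mean_children = sum(n for _, n in data) / families
    return mean_boys, lam * mean_children


lam = 0.6
expected_n = 1 + lam / (1 - lam) + (1 - lam) / lam
print(wald_check(lam, 100_000))  # both entries near lam * expected_n = 1.9
```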
 
  • #66
PeterDonis said:
This might be a matter of differing terminology. In Jaynes' Probability Theory, for example, he describes processes like estimating a distribution for ##\lambda## as "parameter estimation". (He doesn't appear to like the term "random variable" much at all, and discusses some of the confusions that using it can cause.)
Yes, some authors are not clear on this point. But since it has a probability density function it is in fact what is commonly called a “random variable.”
 
  • #67
PeroK said:
I was assuming the idealised case where we have a single probability in all cases.

That's the case I would like to discuss in this thread. Other possibilities introduce further complications that I don't want to get into here.
 
  • #68
PeterDonis said:
Yes, so another way of stating the question I asked in the OP is, which of these different probabilities is the one that is relevant for estimating ##\lambda## given the data? You seem to be saying it's yours, but @PeroK seems to be saying it's his. You can't both be right.
How can you get an estimate of ##\lambda## by calculating ##p(X|\lambda=0.5)## at all? Even frequentist statistics don’t estimate ##\lambda## that way.
 
  • #69
Dale said:
How can you get an estimate of ##\lambda## by calculating ##p(X|\lambda=0.5)## at all? Even frequentist statistics don’t estimate ##\lambda## that way.

We're not estimating ##\lambda##, we're testing a hypothesis. If all the data we've ever seen is, say, ##BBBBBBG##, then there is no way to "estimate" ##B## and ##G## as equally likely.
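For reference, the usual frequentist point estimate from such data is the maximum-likelihood estimate, which here is the observed fraction of boys. A minimal sketch computing it and the likelihood ratio against ##\lambda = 0.5## for the sequence ##BBBBBBG##. Under either stopping rule discussed in the thread, the probability of this exact sequence is ##\lambda^6(1-\lambda)##, so the ratio does not depend on which rule produced it.

```python
from fractions import Fraction


def sequence_likelihood(seq, lam):
    """Probability of an exact birth sequence such as 'BBBBBBG' given P(boy) = lam."""
    result = Fraction(1)
    for child in seq:
        result *= lam if child == "B" else 1 - lam
    return result


seq = "BBBBBBG"
mle = Fraction(seq.count("B"), len(seq))  # maximum-likelihood estimate: 6/7
ratio = sequence_likelihood(seq, mle) / sequence_likelihood(seq, Fraction(1, 2))
print(mle, float(ratio))  # 6/7, then a ratio of about 7.25
```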
 
  • #70
Dale said:
Even frequentist statistics don’t estimate ##\lambda## that way.

@PeroK is saying that the second data set should make us less confident in the hypothesis that ##\lambda = 0.5## than the first data set, based on the p-value being lower. So frequentist statistics certainly seem to believe that ##p(X|\lambda = 0.5)## has some relevance.

"Estimating ##\lambda##" might not be the right way to express what I'm asking. Bayesian arguments such as you have made would seem to say that our confidence in the hypothesis that ##\lambda = 0.5## should be the same for both data sets, since the posterior distribution on ##\lambda## is the same. (More precisely, it's the same as long as the prior in both cases is the same. You gave an example of how the priors could be different; I'll respond to that in a separate post. For now, I'm focusing on the case where the priors are the same, since the p-values are still different for that case.) If that is the case, then the frequentist claim @PeroK is making is wrong.

OTOH, if the frequentist claim @PeroK is making is right, then there ought to be some way of reflecting the difference in the Bayesian calculation as well. But I can't come up with one.
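A minimal sketch of why the two posteriors coincide when the priors are equal. The likelihood for couple #1 (seven children fixed in advance, six boys) and for couple #2 (stop once both sexes appear, sequence ##BBBBBBG##) differ only by the constant factor 7, and a constant factor cancels when the posterior is normalized. The grid and the non-uniform prior below are arbitrary assumptions. The design-dependent p-values escape this cancellation because they also sum over unobserved outcomes, and the sets of possible outcomes differ between the two designs.

```python
def normalize(weights):
    """Scale a list of non-negative weights so they sum to 1."""
    total = sum(weights)
    return [w / total for w in weights]


grid = [i / 100 for i in range(1, 100)]  # candidate values of lambda
# Arbitrary non-uniform prior, the same for both couples (an assumption).
prior = [lam * (1 - lam) for lam in grid]

# Couple #1: 7 children fixed in advance, 6 boys -> binomial likelihood.
like_fixed = [7 * lam**6 * (1 - lam) for lam in grid]
# Couple #2: stop once both sexes appear; P(BBBBBBG) = lam^6 * (1 - lam).
like_until_both = [lam**6 * (1 - lam) for lam in grid]

post_fixed = normalize([p * l for p, l in zip(prior, like_fixed)])
post_until_both = normalize([p * l for p, l in zip(prior, like_until_both)])

print(max(abs(a - b) for a, b in zip(post_fixed, post_until_both)))  # ~0, rounding only
```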
 
