Does the statistical weight of data depend on the generating process?

In summary, if we have two identical data sets that were generated by different processes, the statistical weight of their evidence for or against a hypothesis may be different. This can be a problem in fields like psychology, where the same data may be analyzed in different ways depending on the initial hypothesis. Some suggest that Bayesian methods, where only the data is considered and the intended experiment doesn't matter, may be a better approach. However, this approach may not take into account the differences in experiments and their constraints, leading to potentially flawed conclusions.
  • #71
PeroK said:
We're not estimating ##\lambda##,
Why not? Since that is the specific question of interest that is exactly what we should do.
 
  • #72
Dale said:
If you had previous studies that showed, for example, that couples who decided on a fixed number of children in advance had different ##\lambda## than other couples.

For this case, I'm not sure exactly what frequentists would say. They might say that you would need to test the two cases against different hypotheses, so you can't really compare them at all.

I think this gets into complications that I said I didn't want to get into in this thread. As I noted in post #70, the case where the priors are the same still has different p-values for the two data sets, so it's enough to bring out the difference between the frequentist and Bayesian approaches.
 
  • #73
PeterDonis said:
I think this gets into complications that I said I didn't want to get into in this thread.
I agree. I certainly would assume equal priors, but in principle they could be unequal.
 
  • #74
Dale said:
Why not? Since that is the specific question of interest that is exactly what we should do.
If you gave me some data that read ##XXXXXXY## and you asked me to estimate the probability of getting ##X## or ##Y##, then (if forced to give an answer) I would say ##6/7## for ##X##.

But, that is not the case here. The question is about children being born, where we have a prior hypothesis that they are (approximately) equally likely. We are testing that hypothesis.
 
  • #75
PeroK said:
If you gave me some data that read ##XXXXXXY## and you asked me to estimate the probability of getting ##X## or ##Y##, then (if forced to give an answer) I would say ##6/7## for ##X##.
Yes. This is roughly the way that frequentist statistics would do it. I think the “official” process would be a maximum likelihood estimator, but that is probably close.
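In fact the maximum likelihood estimator gives exactly that answer: the Bernoulli likelihood ##p^k(1-p)^{n-k}## is maximized at the sample proportion ##p = k/n##. A minimal sketch:

```python
# Maximum likelihood estimate for a Bernoulli parameter:
# the likelihood p**k * (1-p)**(n-k) is maximized at p = k/n.
data = "XXXXXXY"
k = data.count("X")   # 6 successes
n = len(data)         # 7 trials
mle = k / n
print(mle)            # 6/7 ≈ 0.857
```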
 
  • #76
PeterDonis said:
OTOH, if the frequentist claim @PeroK is making is right, then there ought to be some way of reflecting the difference in the Bayesian calculation as well. But I can't come up with one.
Well, the calculation that he is making is not an estimate of ##\lambda##. I think that the frequentist estimate of ##\lambda## would be the same for both couples. What would differ is the p value.

Since the p value isn’t part of Bayesian statistics the fact that it distinguishes between the two couples may not have a Bayesian analog. I am pretty sure that both Bayesian and frequentist methods would treat both couples identically for a point estimate of ##\lambda##.
 
  • #77
Dale said:
This is roughly the way that frequentist statistics would do it.

It is also the way that Bayesian statistics would do it, is it not, in the (extreme) case @PeroK describes where there is literally no prior data? In that case, a Bayesian would use a maximum entropy prior, which basically means that your posterior after the first set of data is whatever the distribution of that data set is.
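One caveat worth making precise: with a uniform (maximum-entropy) Beta(1,1) prior the posterior is not literally "the distribution of the data set". After ##k## successes in ##n## trials the posterior is Beta(k+1, n-k+1), whose mean is Laplace's rule of succession, ##(k+1)/(n+2)## - close to, but not equal to, the sample proportion. A quick sketch:

```python
# Posterior for a Bernoulli parameter under a uniform Beta(1,1) prior.
# After k successes in n trials the posterior is Beta(k+1, n-k+1);
# its mean is (k+1)/(n+2), Laplace's "rule of succession".
k, n = 6, 7                        # e.g. 6 boys out of 7 children
posterior_mean = (k + 1) / (n + 2)
mle = k / n
print(posterior_mean)              # 7/9 ≈ 0.778
print(mle)                         # 6/7 ≈ 0.857
```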
 
  • #78
Dale said:
I think that the frequentist estimate of ##\lambda## would be the same for both couples. What would differ is the p value.

But the p-value affects our confidence level in the estimate, correct? So the confidence levels would be different for the two couples.

Dale said:
Since the p value isn’t part of Bayesian statistics the fact that it distinguishes between the two couples may not have a Bayesian analog.

If it is correct that our confidence level in the estimate should be different for the two couples, I would certainly expect there to be some way to reflect that in a Bayesian calculation.
 
  • #79
Dale said:
the calculation that he is making is not an estimate of ##\lambda##.

Again, "estimate ##\lambda##" might not be the right way to express what I was asking in the OP. I did not intend the OP to be interpreted narrowly, but broadly.

Perhaps a better way to broadly express the OP question would be: there is obviously a difference between the two couples, namely, that they used different processes to decide when to stop having children. Given that the two data sets they produced are the same, are there any other differences that arise from the difference in their processes, and if so, what are they? (We are assuming, as I have said, that there are no other differences between the couples themselves--in particular, we are assuming that ##\lambda## is the same for both.)

So far I have only one difference that has been described: the p-values are different. Are there others? And what, if any, other implications does the difference in p-values have? Does it mean we should have different posterior beliefs about ##\lambda##?
 
  • #80
PeterDonis said:
In that case, a Bayesian would use a maximum entropy prior, which basically means that your posterior after the first set of data is whatever the distribution of that data set is.
Most treatments of this type of problem that I have seen would use a Beta distribution since it is a conjugate prior. Taking ##\lambda## as the probability of a boy, you would get ##\lambda \sim Beta(7,2)## for the posterior for both cases separately, or ##\lambda \sim Beta(13,3)## if you were pooling the data under a single Beta(1,1) prior for an overall estimate.

https://www.physicsforums.com/threa...from-bayesian-statistics.973377/#post-6193429
 
  • #81
PeterDonis said:
But the p-value affects our confidence level in the estimate, correct? So the confidence levels would be different for the two couples.
Frequentist confidence intervals will be different between the two couples, and Bayesian credible intervals will be different from either of those. But as far as I know Bayesian credible intervals will be the same for both couples. That is precisely the advantage of Bayesian methods highlighted in the paper I cited earlier. This is, in fact, a fundamental difference between the methods.
PeterDonis said:
Again, "estimate ##\lambda##" might not be the right way to express what I was asking in the OP. I did not intend the OP to be interpreted narrowly, but broadly
Well, the narrow question is clear and can be answered. I am not sure that the broad question is sufficiently well defined to be answerable.
 
  • #82
Dale said:
That is precisely the advantage of Bayesian methods highlighted in the paper I cited earlier.

Why is it an advantage? Why are Bayesian credible intervals right and frequentist confidence intervals wrong?
 
  • #83
PeterDonis said:
So far I have only one difference that has been described: the p-values are different. Are there others? And what, if any, other implications does the difference in p-values have? Does it mean we should have different posterior beliefs about ##\lambda##?
I do not think that the fact that there are different p-values does or should mean that our posteriors should be different.
 
  • #84
Dale said:
I do not think that the fact that there are different p-values does or should mean that our posteriors should be different.

Why not? (This is basically the same question I asked in post #82.)
 
  • #85
PeterDonis said:
Why is it an advantage? Why are Bayesian credible intervals right and frequentist confidence intervals wrong?
(this is not really on topic for the thread, but you asked and it is a topic that I am somewhat passionate about, so ...)

It isn’t about right or wrong. It is about economics and professional ethics.

Because p-values depend on your intentions, if you take previously studied data and run more tests on it, then you alter the previously reported p-values. Such analyses reduce the significance of previous results. This means that, in principle, you can always make any result non-significant simply by intending to study the data more.

The result of this statistical fact is that scientists need to avoid analyzing previously reported data. In some fields using previously reported data is considered grounds for rejecting a paper. This basically makes scientific data "disposable": you use it once and then throw it away.

There is no need to treat data this way any more. This “disposable-ness” is not inherent to data nor to science, it is purely a result of the widely used frequentist statistical tools.

Frankly, for publicly funded research this is a travesty. The taxpayers paid good money for that data, and scientists use it once and then throw it away simply because they have not informed themselves about Bayesian statistics. If they had, future researchers could reuse the data, making the tax money go further.

It seems like the ethically responsible way to handle the public treasury is to study any collected data as thoroughly as possible, but this intention makes any frequentist test non-significant. That is why this specific feature of Bayesian statistics is an advantage.

You will notice that very large collaborations with very expensive data are turning more and more to Bayesian methods. So I think there is a growing awareness of this issue.
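To make this stopping-rule sensitivity concrete for the thread's scenario, here is a sketch (the test statistics below are one reasonable choice for each design; the exact p-values depend on that choice) computing exact one-sided p-values for 6 boys and 1 girl under the null hypothesis ##\lambda = 0.5##:

```python
from math import comb

lam = 0.5  # null hypothesis: P(boy) = 0.5

# Couple #1: fixed n = 7 children; test statistic = number of boys.
# One-sided p-value: P(6 or more boys out of 7).
p1 = sum(comb(7, b) for b in (6, 7)) * lam**7    # (7 + 1)/128

# Couple #2: stop at the first child whose sex differs from the
# first child's; test statistic = total number of children N.
# N >= 7 exactly when the first six children are all the same sex.
p2 = 2 * lam**6                                  # 2/64

print(p1)   # 0.0625
print(p2)   # 0.03125
```

With these choices the identical data are "more surprising" under couple #2's design, which is the sense in which the p-value depends on the experimenter's intentions.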
 
  • #86
PeterDonis said:
Again, "estimate ##\lambda##" might not be the right way to express what I was asking in the OP. I did not intend the OP to be interpreted narrowly, but broadly.

Perhaps a better way to broadly express the OP question would be: there is obviously a difference between the two couples, namely, that they used different processes to decide when to stop having children. Given that the two data sets they produced are the same, are there any other differences that arise from the difference in their processes, and if so, what are they? (We are assuming, as I have said, that there are no other differences between the couples themselves--in particular, we are assuming that ##\lambda## is the same for both.)

So far I have only one difference that has been described: the p-values are different. Are there others? And what, if any, other implications does the difference in p-values have? Does it mean we should have different posterior beliefs about ##\lambda##?

This probably only makes sense if we allow a second parameter - for example that some couples have a predisposition for children of one sex. Otherwise, there is no reason to doubt the general case.

Unless we allow the second parameter, all we are doing is picking up unlikely events. We can calculate the probability of these events, but unless we allow the second parameter, that is all we can say.

My calculations show that the second family is less likely (more of an anomaly) than the first, but this has no effect on the overall average, assuming we have enough prior data (which we do).

What this data does question is the hypothesis that no couples have a predisposition to one sex or the other of their children.

In other words, if a family has ten children, all girls say, then I don't think this influences the overall mean for girls in general. In fact, even if you adjusted the mean to ##0.6## (which still leaves 10 girls in a row very unlikely), you've created the hypothesis that 60% of children should be girls, which is absurd. You can't shift the mean from ##0.5## (or whatever it is - I believe it's not quite that) on the basis of one family.

What it does is raise the question about a predisposition to girls in that family. In the extreme case of, say, 50 girls in a row, then

1) That does not affect the overall mean to any significant extent.

2) It implies that it is almost certain that the data itself could not have come from the assumed distribution. I.e. that family is not producing children on a 50-50 basis.

In summary, to make this a meaningful problem I think you have to add another parameter. Then it reduces to the standard problem where you count the false positives (couples who do produce children 50-50, but who happen to have a lot of one sex) and count the true positives (couples who are genetically more likely to have one sex). Then, you can calculate ##p(A|B)## and ##p(B|A)## etc. (*)

As it stands, to clarify all my posts hitherto, all we can do is calculate how unlikely each of these families is under the hypothesis that in general ##\lambda = 0.5##. Nothing more. Confidence interval calculations cannot be done because of the assumed overwhelming prior data.

(*) PS although we still have to be aware of the sampling pitfalls.

PPS Maybe the Bayesians can do better.
 
  • #87
Dale said:
p-values depend on your intentions

This might be an issue in general, but it is not in the particular scenario we are talking about here. The p value depends on the process used to generate the data, but that process is an objective fact about each couple; it is not a matter of the intentions of third parties studying the data.
 
  • #88
PeterDonis said:
This might be an issue in general, but it is not in the particular scenario we are talking about here.
Yes, in fact it is the key issue. The only difference between the couples was their intentions. Frequentist methods are sensitive to the intentions of the experimenters as well as the analysts. Did you read the paper? It covers both.
 
  • #89
PeroK said:
This probably only makes sense if we allow a second parameter - for example that some couples have a predisposition for children of one sex. Otherwise, there is no reason to doubt the general case.

What is "the general case"? We are assuming for this discussion that there is no second parameter--p is the same for all couples.

If by "the general case" you mean ##p = 0.5## (or ##\lambda = 0.5## in @Dale's notation), then the actual evidence is that this is false; the global data seems to show a value of around ##0.525## to ##0.53##.

https://en.wikipedia.org/wiki/Human_sex_ratio

PeroK said:
What this data does question is the hypothesis that no couples have a predisposition to one sex or the other of their children.

Yes, but does it question it to a different extent for couple #2 vs. couple #1? Does their different choice of process make a difference here?
 
  • #90
PeroK said:
all we can do is calculate how unlikely each of these families is under the hypothesis that in general ##\lambda = 0.5##. Nothing more

This seems way too pessimistic. We can calculate probabilities and p-values and likelihood ratios for any value of ##\lambda## we like. The math might be more difficult, but that's what computers are for. :wink:
 
  • #91
PeterDonis said:
What is "the general case"? We are assuming for this discussion that there is no second parameter--p is the same for all couples.

If by "the general case" you mean ##p = 0.5## (or ##\lambda = 0.5## in @Dale's notation), then the actual evidence is that this is false; the global data seems to show a value of around ##0.525## to ##0.53##.

Yes, but does it question it to a different extent for couple #2 vs. couple #1? Does their different choice of process make a difference here?

Yes, I know it's not really ##0.5##. That just makes the calculations a bit harder and asymmetrical.

The main difference is that the distributions of family sizes are different.

Case #1 has families all with seven children (i.e. families who set out with that policy always end up with seven children).

Case #2 has families with two children upwards.

This creates an asymmetry that gets picked up in the simple calculations I've done above. But also, if we did add another parameter, it may well be reflected there too.

For example, my guess would be that the second family would be more likely to be one of the predisposed couples than the first. I could run an example tomorrow to check this, but I think I can see how the calculations will come out.
 
  • #92
Dale said:
The only difference between the couples was their intentions.

The intentions of the couples, not the researchers (us) who are evaluating the data. The p-value hacking issue is an issue about the intentions of the researchers.

However, I can see an argument here regarding the intentions of the couples: the gametes don't know at each conception what rule the parents were using to decide when to stop having children. So there is a straightforward argument from the biological facts of conception that the process the parents are using to decide when to stop having children should not affect the data.

This is still not quite the same as saying that the p-value we calculate should not matter, but I can see a further argument: saying that the p-value matters is equivalent to saying that the data from couple #2 is being drawn from a different underlying distribution of births than the data from couple #1. But these underlying distributions are theoretical constructs in the minds of the researchers; they don't correspond to anything in the real world that actually affects the data. The only thing in the real world that they correspond to is the couple's intentions, and we just saw above that the couple's intentions don't affect the data.
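This argument can be checked numerically. Under either stopping rule, the probability that a couple produces the exact sequence BBBBBBG is ##\lambda^6(1-\lambda)##: the rule only decides when to stop and contributes no ##\lambda##-dependent factor. A simulation sketch (the value ##\lambda = 0.7## is arbitrary, chosen just for illustration):

```python
import random

random.seed(0)
LAM = 0.7              # hypothetical P(boy), for illustration only
TARGET = "BBBBBBG"     # six boys, then a girl

def child():
    return "B" if random.random() < LAM else "G"

def family_rule1():
    # Rule 1: have exactly seven children.
    return "".join(child() for _ in range(7))

def family_rule2():
    # Rule 2: keep going until there is at least one child of each sex.
    kids = child()
    while len(set(kids)) < 2:
        kids += child()
    return kids

N = 200_000
f1 = sum(family_rule1() == TARGET for _ in range(N)) / N
f2 = sum(family_rule2() == TARGET for _ in range(N)) / N
exact = LAM**6 * (1 - LAM)        # ≈ 0.0353

print(f1, f2, exact)   # all three agree to simulation accuracy
```

Since the two likelihood functions are proportional (here equal), any Bayesian posterior for ##\lambda## comes out the same for both couples.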
 
  • #93
PeterDonis said:
However, I can see an argument here regarding the intentions of the couples: the gametes don't know at each conception what rule the parents were using to decide when to stop having children. So there is a straightforward argument from the biological facts of conception that the process the parents are using to decide when to stop having children should not affect the data.

I think this is the sort of argument to avoid. You need to calculate what is implied by the assumptions in the problem and what is being compared to what.

In this case, certain things had to happen in order for a case #2 family to end up with seven children. That's the sort of detail that can trip you up.
 
  • #94
PeroK said:
my guess would be that the second family would be more likely to be one of the predisposed couples than the first

I have not done a Bayesian calculation with ##\lambda## treated as a function of the individual couple instead of an unknown single parameter, but it seems to me that such a calculation would still say that, since the data sets of both couples are the same, our posterior distribution over whatever parameters we are estimating will be the same. The key here is that the difference we have information about for the two couples--the way they choose to decide when to stop having children--has no relationship that I can see to any difference between them that would be relevant to a difference in ##\lambda## between the two couples.

In fact, even if we discount the subjective judgment I just expressed, and decide to test the hypothesis that "there is some difference between these two couples that affects ##\lambda##", the fact that the two data sets are identical is evidence against any such hypothesis!
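A toy calculation makes this concrete. Suppose (purely hypothetical numbers) that 1% of couples are "predisposed" with ##\lambda = 0.9## and the rest have ##\lambda = 0.5##. Because the observed sequence BBBBBBG has the same probability under either stopping rule, Bayes' theorem gives the same posterior probability of predisposition for both couples:

```python
# Hypothetical mixture: 1% of couples "predisposed" (P(boy) = 0.9),
# 99% ordinary (P(boy) = 0.5). Observed sequence: BBBBBBG.
# The stopping rule adds no lambda-dependent factor, so this single
# posterior applies to couple #1 and couple #2 alike.
prior_pre, lam_pre = 0.01, 0.9
prior_ord, lam_ord = 0.99, 0.5

def seq_prob(lam):
    # Probability of six boys followed by one girl.
    return lam**6 * (1 - lam)

num = prior_pre * seq_prob(lam_pre)
posterior_pre = num / (num + prior_ord * seq_prob(lam_ord))
print(posterior_pre)   # ≈ 0.064: the data raise the 1% prior a little
```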
 
  • #95
PeterDonis said:
I have not done a Bayesian calculation with ##\lambda## treated as a function of the individual couple instead of an unknown single parameter, but it seems to me that such a calculation would still say that, since the data sets of both couples are the same, our posterior distribution over whatever parameters we are estimating will be the same. The key here is that the difference we have information about for the two couples--the way they choose to decide when to stop having children--has no relationship that I can see to any difference between them that would be relevant to a difference in ##\lambda## between the two couples.

In fact, even if we discount the subjective judgment I just expressed, and decide to test the hypothesis that "there is some difference between these two couples that affects ##\lambda##", the fact that the two data sets are identical is evidence against any such hypothesis!

I'll do a calculation tomorrow! It's after midnight here.
 
  • #96
PeroK said:
certain things had to happen in order for a case #2 family to end up with seven children.

And the same is true of couple #1. The fact that they decided in advance to have seven children does not mean they were guaranteed to succeed. The wife could have died in childbirth, or one of them could have become infertile, or...

The point is that none of these things have any connection to the process they decided to use. Or, if you don't like such absolute language, then in Bayesian terms, hypotheses along the lines of "couples who choose the process that couple #2 chose are more likely to have the wife die in childbirth than couples who choose the process that couple #1 chose" have such tiny prior probabilities that it doesn't even make sense to consider them when there are hypotheses in view with prior probabilities many orders of magnitude larger.
 
  • #97
PeterDonis said:
The intentions of the couples, not the researchers (us) who are evaluating the data. The p-value hacking issue is an issue about the intentions of the researchers.
No, it is about the experimenters as well as the analysts. The couples are experimenters since they had an experiment with a stopping criterion and collected data. You really should read the paper.
 
  • #98
PeroK said:
You need to calculate what is implied by the assumptions in the problem and what is being compared to what.

What I said about gametes is just as much implied by the assumptions in the problem as speculating about mishaps that could prevent a couple from getting to seven children. So I don't see that this (valid) point helps us much either way.
 
  • #99
Dale said:
The couples are experimenters since they had an experiment with a stopping criterion and collected data.

Fair enough.
 
  • #100
PeroK said:
Unless we allow the second parameter, all we are doing is picking up unlikely events. We can calculate the probability of these events, but unless we allow the second parameter, that is all we can say.
...
In summary, to make this a meaningful problem I think you have to add another parameter.
Interestingly, there is an approach called hierarchical Bayesian modeling which does exactly that.

Here is a paper where they add this additional parameter (a Bayesian hierarchical model for binomial data) in the context of polling:

http://www.stat.cmu.edu/~brian/463-663/week10/Chapter 09.pdf

In this model each poll is considered to have some underlying probability of a win (analogous to a couple's probability of having a boy) which is considered a "hyperparameter", and the respondents to the poll are binomial draws from the prior (analogous to each child being a draw from the couple's probability). The observed data then inform us both about the probability for each couple and about the distribution of probabilities for the population. The major difference is that there are a small number of polls each with a relatively large number of samples, while there are a large number of couples each with a relatively small number of children.
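A minimal generative sketch of this two-level structure (the Beta(5,5) hyperprior and three children per couple are arbitrary illustrative choices, not from the paper):

```python
import random

random.seed(1)

# Hierarchical ("two-level") generative model:
#   each couple draws its own lam from a population-level Beta(a, b),
#   then each child is an independent Bernoulli(lam) draw.
a, b = 5.0, 5.0                  # hypothetical hyperparameters
n_couples, kids_each = 100_000, 3

boys = children = 0
for _ in range(n_couples):
    lam = random.betavariate(a, b)       # couple-level parameter
    for _ in range(kids_each):
        boys += random.random() < lam
        children += 1

# Pooled over many couples, the boy fraction recovers the hyperprior
# mean a/(a+b) = 0.5 even though individual couples vary widely.
print(boys / children)
```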
 
  • #101
Dale said:
In this model each poll is considered to have some underlying probability of a win (analogous to a couple's probability of having a boy) which is considered a "hyperparameter", then the respondents to the poll are binomial draws from the prior (analogous to each child being a draw from the couple's probability). The observed data then informs us both about the probability for each couple as well as the distribution of probabilities for the population.

Hm, interesting! If I'm understanding this correctly, this methodology could provide a way of investigating questions like "does ##\lambda## depend on the criterion the couple uses to decide when to stop having children" by simply grouping the couples by that criterion--i.e., assuming that the same hyperparameter value applies to all couples in a group, but can vary between groups--and seeing whether the posterior distribution for the hyperparameter does in fact vary from group to group. And as I commented earlier, it would seem like the evidence described in the OP, where two couples are from different groups but produce the same outcome data, would be evidence against any hypothesis that the hyperparameter varied from group to group.
 
  • #102
PeterDonis said:
this methodology could provide a way of investigating questions like ...
Yes, you could do it that way. The details vary a little if you want to consider only these two stopping criteria or if you want to consider them as elements of a whole population of stopping criteria. The hierarchical model is more appropriate for the second case. Essentially this is the difference between a fixed effect and a random effect model.
PeterDonis said:
the evidence described in the OP ... would be evidence against any hypothesis that the hyperparameter varied from group to group
Yes
 
  • #103
PeterDonis said:
One way of rephrasing the question is whether and under what circumstances changing the stopping rule makes a difference. In particular, in the case under discussion we have two identical data sets that were collected under different stopping rules; the question is whether the different stopping rules should affect how we estimate the probability of having a boy given the data.

I won't weigh in on variance issues, but the long-run estimates for the probability of boy vs girl are the same with either strategy. (Mathematically it's via the Strong Law of Large Numbers, but in the real world we do have tons of data on demographics spanning many years, which should give pretty good estimates.)

inspection paradox related items:

if you estimate/sample by children:
we should be able to see that our estimates are the same either way -- i.e. in all cases the modelling is a sequence of one child at a time (we can ignore zero-probability events of exactly the same time of birth, so there is a nice ordering here) and each child's birth is a Bernoulli trial -- a coin toss with probability of heads given by some parameter ##p##. Depending on the "strategy" taken, what may change is who is tossing the coin (the parents), but it doesn't change the fact that in this model we have a Bernoulli process where the tosser/parent is irrelevant for modelling purposes.

if you estimate/sample by parents/couples:
this one is a bit more subtle.
PeterDonis said:
This is not the correct stopping rule for couple #2. The correct stopping rule is "when there is at least one child of each gender". It just so happens that they had a boy first, so they went on until they had a girl. But if they had had a girl first, they would have gone on until they had a boy.
I evidently misread the original post. Given this structure I opted to view it as a baby Markov chain (pun intended?), and use renewal rewards.

For strategy #2 we have a sequence of iid random variables ##X_k##, where ##X_k## denotes the number of kids of parent/couple ##k##.

Part 1) give a reward of 1 for each girl that couple ##k## has, where each child is a girl with probability ##p \in (0,1)##
direct calculation (using total expectation) gives
##E\big[R_k\big] = \frac{1-p + p^2}{1-p}##
Part 2) give a reward of 1 for each boy that couple ##k## has, where each child is a boy with probability ##1-p##
either mimicking the above calculation, or just changing variables we get
##E\big[R_k'\big] = \frac{(1-p)^2 + p}{p}##

and the total time (i.e. number of kids) per couple k is
##E\big[X_k\big] = E\big[R_k + R_k'\big] = E\big[R_k\big] + E\big[R_k'\big]##

with R(t) as the reward function (t = integer time by custom = total number of kids in our model)
##\frac{R(t)}{t} \to_{a.s.} \frac{E\big[R_k\big]}{E\big[X_k\big]}= p##
##\frac{E[R(t)]}{t} \to \frac{E\big[R_k\big]}{E\big[X_k\big]} = p##
where wolfram did the simplifications
https://www.wolframalpha.com/input/?i=((++(1-p)+++p^2)/(1-p))/(+(++(1-p)+++p^2)/(1-p)+++((1-p)^2+++p)/p)

I suppose the result may seem obvious to some, but a lot of things that are 'obviously true', actually aren't true in probability, which is why there are many so called paradoxes in probability. (The 'paradox paradox' of course tells us that they aren't really paradoxes, just a mismatch between math and intuition.) E.g. in the above, taking the expectation of X in the denominator can break things if we don't have justification-- this is why I used Renewal Rewards theorem here.

We can apply the same argument to strategy one to see an expected reward of ##E\big[R_k\big] = 7p## and ##E\big[R_k'\big] = 7(1-p)##, so yes, this too tends to ##p##
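These renewal-reward expressions can be checked by simulation. Taking ##p = 0.3## as an arbitrary illustrative probability of a girl, the mean number of girls per couple should approach ##(1-p+p^2)/(1-p) \approx 1.129## and the long-run girl fraction should approach ##p##:

```python
import random

random.seed(2)
p = 0.3   # illustrative P(girl), matching the post's convention

def family():
    # Strategy #2: stop at the first child whose sex differs
    # from the first child's sex.
    kids = ["G" if random.random() < p else "B"]
    while len(set(kids)) < 2:
        kids.append("G" if random.random() < p else "B")
    return kids

N = 200_000
girls = kids_total = 0
for _ in range(N):
    f = family()
    girls += f.count("G")
    kids_total += len(f)

print(girls / N)           # E[R_k] = (1 - p + p**2)/(1 - p) ≈ 1.129
print(girls / kids_total)  # renewal-reward limit: ≈ p = 0.3
```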

PeterDonis said:
Can you give examples of each of the two possibilities you describe? I.e, can you give an example of a question, arising from the scenario described in the OP, for which stopping rules don't matter? And can you give an example of a question for which they matter a lot?
I can try... it's an enormously complex and broad question in terms of math, and even more so when trying to map these approximations to the real world. A classical formulation for martingales and random walks is in terms of gambling. The idea behind martingales is that in finite dimensions a fair game stays fair, and a skewed game stays skewed, no matter what 'strategy' the bettor uses for bet sizing. With infinite dimensions all kinds of things can happen and a lot of care is needed -- you can even have a formally fair game with finite first moments where, if you don't have variance (convergence in L2 / access to the Central Limit Theorem), extremely strange things can happen -- Feller vol 1 has a nice example of this (chapter 10, problem 15 in the 3rd edition).

With respect to your original post, I've shown that neither 'strategy' changes the long-run estimates of ##p##. The fact that both strategies not only have second moments but valid moment generating functions should allow for concentration inequalities around the mean, which can show that the mean convergence isn't 'too slow', but this is outside the scope I think.
- - - -
For an explicit example / model:
As far as simple models and examples go, I suggest considering the simple random walk, where we move to the left with probability ##q = 1-p## and to the right with probability ##p##. Suppose we start at zero and have a stopping rule of "stop when we're ahead", i.e. once the net score is +1. For ##p \in [0,\frac{1}{2})##, our random variable ##T## for the number of moves until stopping is defective (i.e. not finite with probability 1), which is problematic. For ##p=\frac{1}{2}## the process stops with probability 1, but ##E\big[T\big] = \infty##, which is also problematic (e.g. see the earlier comment on wanting a finite 2nd moment...). Now for ##p \in (\frac{1}{2}, 1]##, from a modelling standpoint things are nice, but is this "ok"? Well, it depends on what we're looking into. This admittedly very simple model could be used as a construct for a (simplified) pharmaceutical trial -- say the experimenters used the stopping rule: stop when the experimental evidence looks good (e.g. when they're ahead). The result would be to publish only favorable results even if the drug's effects were basically a standard coin toss (and possibly had significant negative side effects "when they're behind"). When things went badly, the results wouldn't be reported, since the trial would still be ongoing -- or maybe funding would stop and it would just show up as 'no valid trial, as terminated before the proper finish (stopping rule)'.
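The favorable case really is nice: for ##p > \frac{1}{2}##, Wald's identity gives a finite expected stopping time ##E[T] = 1/(2p-1)##. A quick simulation sketch with a hypothetical ##p = 0.6##:

```python
import random

random.seed(3)
p = 0.6   # step +1 with probability p, -1 otherwise (favorable case)

def steps_until_ahead(cap=100_000):
    # Number of steps for the walk to first hit +1; the cap is a
    # safety net and is essentially never reached when p > 1/2.
    pos = 0
    for t in range(1, cap + 1):
        pos += 1 if random.random() < p else -1
        if pos == 1:
            return t
    return cap

N = 50_000
mean_T = sum(steps_until_ahead() for _ in range(N)) / N
print(mean_T)   # ≈ 1/(2p - 1) = 5
```

For ##p = \frac{1}{2}## the same simulation would stop with probability 1, but its sample mean would never settle down, reflecting ##E[T] = \infty##.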

it reminds me a bit of this
https://www.statisticsdonewrong.com/regression.html
which has some nice discussion under 'truth inflation' that seems germane here
- - - -
 
  • #104
PeroK said:
What does a Bayesian analysis give numerically for the data in post #1?
So, the easiest way to do this analysis is using conjugate priors. As specified by @PeterDonis we assume that both couples have the same ##\lambda##. Now, in Bayesian statistics you always start with a prior. A conjugate prior is a prior for which the posterior belongs to the same family of distributions; for binomial data like this, the conjugate prior is the Beta distribution. If these were the first two couples that we had ever studied then we would start with an ignorant prior, like so:
LambdaIgnorantPrior.png


After observing 12 boys and 2 girls we would update our beliefs about the distribution of ##\lambda## from the Beta(1,1) prior to a Beta(13,3) posterior distribution (the 12 boys add to the first parameter and the 2 girls to the second), like so:
LambdaIgnorantPosterior.png


From that posterior we can calculate any quantity we want regarding ##\lambda##. For example, the mean is 0.81, the median 0.83, and the mode 0.86, with a 95% Bayesian confidence region (credible interval) from 0.60 to 0.96. This region should be close to the frequentist confidence interval.
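For anyone who wants to reproduce these numbers, here is a sketch using only the Python standard library (the mean and mode come out analytically; the 95% region is approximated by Monte Carlo rather than an exact quantile, so the last digit may wobble):

```python
import random

random.seed(0)

# Conjugate update: a Beta(1,1) ignorant prior plus 12 boys and 2 girls
# gives a Beta(1 + 12, 1 + 2) = Beta(13, 3) posterior for lambda.
a, b = 1 + 12, 1 + 2

mean = a / (a + b)            # 13/16 ≈ 0.81
mode = (a - 1) / (a + b - 2)  # 12/14 ≈ 0.86

# Monte Carlo approximation of the central 95% credible region,
# using the stdlib Beta sampler.
draws = sorted(random.betavariate(a, b) for _ in range(100_000))
lo, hi = draws[2_500], draws[97_500]

print(f"mean {mean:.2f}, mode {mode:.2f}, 95% region ({lo:.2f}, {hi:.2f})")
```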

Now, suppose that we did not want to pretend that this is the first couple that we had ever seen. We can incorporate the knowledge we have from other couples in the prior. That is something that cannot be done in frequentist statistics.

Remember, ##\lambda## is not the proportion of boys in the overall population, it is the probability of a given couple producing boys. While the overall proportion of boys in the population is close to 0.5, individual couples can be highly variable. I know several couples with >80% girls and several with >80% boys, but we don't know if they would have started having more of the other gender if they continued. So let's set our prior to be symmetric about 0.5 and have 90% of the couples within the range ##0.25<\lambda<0.75##. This can be achieved with an informed Beta(5,5) prior.
LambdaInformedPrior.png
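The claim that a Beta(5,5) prior puts roughly 90% of couples in ##0.25<\lambda<0.75## is easy to check with a quick stdlib Monte Carlo (a sanity check on the prior, not part of the analysis itself):

```python
import random

random.seed(1)

# Symmetric Beta(5,5) prior: estimate how much mass lies in (0.25, 0.75).
draws = [random.betavariate(5, 5) for _ in range(200_000)]
inside = sum(0.25 < x < 0.75 for x in draws) / len(draws)

print(f"P(0.25 < lambda < 0.75) ≈ {inside:.3f}")
```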


Now, after collecting data of 6 boys and 1 girl for each couple, we find the posterior distribution is Beta(17,7), which leads to a lower estimate of the mean ##\lambda## of 0.71 with a 95% confidence region from 0.52 to 0.87.
LambdaInformedPosterior.png


Notice that the mean is substantially lower because we are informed by the fact that we have seen other couples before. When a couple has an unusual ratio we automatically suspect that random chance may be skewing the results a bit, but we admit that there is some possibility that something is different with this couple, so that the results are not totally random. The informed posterior shows that balanced assessment.
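The same stdlib sketch reproduces the informed-prior numbers (again with a Monte Carlo interval, so expect small wobble in the last digit):

```python
import random

random.seed(2)

# Conjugate update: the Beta(5,5) informed prior plus the pooled data of
# 12 boys and 2 girls gives a Beta(5 + 12, 5 + 2) = Beta(17, 7) posterior.
a, b = 5 + 12, 5 + 2

mean = a / (a + b)  # 17/24 ≈ 0.71, pulled toward 0.5 by the informed prior

# Monte Carlo approximation of the central 95% credible region.
draws = sorted(random.betavariate(a, b) for _ in range(100_000))
lo, hi = draws[2_500], draws[97_500]

print(f"mean {mean:.2f}, 95% region ({lo:.2f}, {hi:.2f})")
```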
 
Last edited:
  • #105
Dale said:
we assume that both couples have the same ##\lambda##.

This doesn't seem to be quite what you're assuming. As you describe your analysis, you're not assuming that ##\lambda## is fixed for all couples; you're allowing for the possibility that different couples might have different unknown factors at work that could affect their respective probabilities of producing boys. But you are assuming that we have no reason to suppose that either couple #1 or couple #2 in our example is more or less likely to have unknown factors skewing them in one direction or the other, so we should use the same prior distribution (the "informed prior" Beta distribution) for both. I think that way of looking at it is fine.

Dale said:
When a couple has an unusual ratio we automatically suspect that random chance may be skewing the results a bit, but we admit that there is some possibility that something is different with this couple, so that the results are not totally random.

But, more importantly, the posterior distribution is the same for both couples, since they both have the same data. The different choice of stopping criterion does not affect the posterior distribution. In terms of the way of looking at it that I described above, we are assuming that a couple's choice of stopping criterion is independent of any unknown factors that might affect their propensity for favoring one gender over the other in births.
 
