Does the statistical weight of data depend on the generating process?

In summary, if we have two identical data sets that were generated by different processes, the statistical weight of their evidence for or against a hypothesis may differ. This can be a problem in fields like psychology, where the same data may be analyzed in different ways depending on the initial hypothesis. Some suggest that Bayesian methods, in which only the observed data matter and the intended experiment does not, may be a better approach. However, this approach may not take into account the differences between experiments and their constraints, leading to potentially flawed conclusions.
  • #106
PeterDonis said:
But, more importantly, the posterior distribution is the same for both couples, since they both have the same data. The different choice of stopping criterion does not affect the posterior distribution
Yes, the stopping criterion does not affect our retrospective belief about that couple's ##\lambda##, provided we use the same prior for both couples. Theoretically there could be reasons to use different priors for the two couples, but for this scenario all such reasons seem pretty far-fetched.
 
  • #107
PeterDonis said:
But, more importantly, the posterior distribution is the same for both couples, since they both have the same data. The different choice of stopping criterion does not affect the posterior distribution. In terms of the way of looking at it that I described above, we are assuming that a couple's choice of stopping criterion is independent of any unknown factors that might affect their propensity for favoring one gender over the other in births.

After some calculations, I agree with this. If we assume that there are some couples who are more likely to have girls than boys, say, then the conditional probability that each couple is in that category, given the data, is the same in both cases.

It appears that in general the stopping criteria are indeed irrelevant.
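To make the point concrete, here is a minimal sketch (Python, assuming a flat prior over the boy-probability ##\lambda##, which is my choice for illustration rather than anything specified above). Under either stopping rule the likelihood of the observed data is proportional to ##\lambda^6(1-\lambda)##, so the normalized posteriors coincide.

```python
import numpy as np

lam = np.linspace(0.001, 0.999, 999)   # grid over the boy-probability
prior = np.ones_like(lam)              # flat prior, purely for illustration

# Couple 1: seven children regardless; observed 6 boys and 1 girl (any order):
# likelihood proportional to C(7,6) * lam^6 * (1 - lam)
like1 = 7 * lam**6 * (1 - lam)

# Couple 2: stop once both genders appear; observed B,B,B,B,B,B,G in order:
# likelihood proportional to lam^6 * (1 - lam)
like2 = lam**6 * (1 - lam)

post1 = prior * like1
post2 = prior * like2
post1 /= post1.sum()   # normalize on the grid
post2 /= post2.sum()

print(np.allclose(post1, post2))   # True: the constant factor drops out
```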
 
  • #108
PeroK said:
It appears that in general the stopping criteria are indeed irrelevant.
They are irrelevant for determining the estimate of ##\lambda##, but not for determining the p-value, as you calculated somewhere back on the first page.
 
  • Like
Likes PeroK
  • #109
Dale said:
They are irrelevant for determining the estimate of ##\lambda##, but not for determining the p-value, as you calculated somewhere back on the first page.

I can patch that up! First, because of the asymmetry in the data, we should take the p-value to be the probability of outcomes strictly more extreme than the data.

In case 1, we need the probability of either 7 boys or 7 girls. That's ##\frac{1}{64}##.

In case 2, I had also misread the problem and assumed they were waiting for a girl rather than for at least one of each. With the correct stopping rule, the probability of a family of more than 7 is ##\frac{1}{64}##.

The p-values match.

The mistake was that the exact data observed was less likely in the second case; because I was comparing the number of boys against the size of the family, this created an asymmetry, and there was no exact correspondence between the observations. What I should really have compared was the probability of getting up to six boys or girls against the probability of a family of size up to 7, i.e. the complement of the strictly-more-extreme outcomes above.

(This must be a general point to be aware of: if you can't match up the data exactly, you need to take the strictly more unlikely outcomes for the p-value.)

But, there has to be a twist! Suppose that the second family were, indeed, waiting for a girl. Now, the likelihood of a family of more than 7 is only ##\frac{1}{128}##. And, again there is a difference in p-values.

This may be a genuine case where the stopping criterion does make a difference (*).

(*) PS As Peter points out below, this is just a case of limiting it to a one-tailed scenario.
 
  • Informative
Likes Dale
  • #110
PeroK said:
we should take the p-value to be the probability of outcomes strictly more extreme than the data.

"Strictly more extreme" is ambiguous, though. Does it mean "one-tailed" or "two-tailed"? In this case, does it mean "at least that many boys" or "at least that many children of the same gender"?

This doesn't affect whether the p-values are the same or not, but it does affect their actual numerical values. I'll assume the "one-tailed" case in what follows.

PeroK said:
The p-values match.

I don't think they do.

For couple #1, the sample space is all possible combinations of 7 children, and the "at least as extreme" ones are those that have at least 6 boys. All combinations are equally probable so we can just take the ratio of the total numbers of each. There are ##2^7## of the former and 8 of the latter (one with 7 boys and 7 with 6 boys), so the p-value is ##8 / 2^7 = 1/16##.

For couple #2, the sample space is all possible combinations of 2 or more children that have at least one of each gender; but the combinations are not all equally probable so we would have to take that into account if we wanted to compute the p-value using a ratio as we did for couple #1 above. However, an easier way is to just compute the probability of getting 6 boys in a row, which is just ##1 / 2^6 = 1/64##. This covers all combinations at least as extreme as the one observed--half of that 1/64 probability is for the combination actually observed (6 boys and 1 girl), and the other half covers all the other possibilities that are at least as extreme, since all of them are just some portion of the combinations that start with 7 boys. So the p-value is ##1/64##.
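A quick numerical check of those two numbers (a sketch in Python, taking the null hypothesis of a fair 50/50 coin for each birth):

```python
from itertools import product

# Couple 1: exactly 7 children. One-tailed p-value = P(at least 6 boys)
# over the 2^7 equally likely gender sequences.
outcomes = list(product("BG", repeat=7))
p1 = sum(seq.count("B") >= 6 for seq in outcomes) / len(outcomes)
print(p1)        # 0.0625 = 8/128 = 1/16

# Couple 2: stop once both genders have appeared. Every outcome at least as
# extreme as "6 boys then a girl" starts with 6 boys, so the one-tailed
# p-value is P(first 6 children are all boys) = (1/2)^6.
p2 = 0.5 ** 6
print(p2)        # 0.015625 = 1/64
```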

PeroK said:
there has to be a twist! Suppose that the second family were, indeed, waiting for a girl.

Since they started out having a boy as their first child, they are waiting for a girl. Or are you considering a case where the stopping criterion is simply "stop when the first girl is born"? For that case, the p-value would be the same as the one I computed above for couple #2; the difference is that the underlying sample space is now "all combinations that end with a girl", which means that if you tried to compute the p-value using ratios, as I did for couple #1 above, you would end up computing a different set of combinations and a different set of associated probabilities.

The other twist in this case is that there is no "two-tailed" case, since the stopping criterion is not symmetric between genders. So you could say that the p-value for this case is different from both of the ones I computed above if you converted the ones I computed above to the two-tailed case (which means multiplying by 2).

PeroK said:
This may be a genuine case where the stopping criterion does make a difference.

It can make a difference in p-value, yes, as shown above.

However, it still doesn't make a difference in the posterior distribution for ##\lambda##, or, in your terms, the conditional probability of each couple being in a particular category as far as a propensity for having boys or girls.
 
  • #111
Just for grins I also did a Monte Carlo simulation of the original problem. I assumed ##\lambda## starting at 0.01 and going to 0.99 in increments of 0.01. For each value of ##\lambda## I simulated 10000 couples using each stopping criterion. I then counted the number of couples that had exactly 6 boys. The plots of the counts are as follows. For the case where they stop after exactly 7 children regardless:
[Attached plot: MonteCarlo7Total.png]


For the case where they stop after they get one of each
[Attached plot: MonteCarlo1ofEach.png]


Notice that the shape is the same for both strategies; this is why getting the same data leads to the same estimate of ##\lambda##. However, the vertical scale is very different; this is why the probabilities differ between the two cases: it is simply much less likely to get 6 boys when trying for one of each than when simply having 7 children. This doesn't change the estimate, but it does make us more surprised to see the data.
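Something along the lines of the following sketch (Python; the 10000-couples-per-##\lambda## figure comes from the post above, while the rest of the details are my guesses) reproduces that behaviour: the two curves have the same shape but very different heights.

```python
import numpy as np

rng = np.random.default_rng(1)   # arbitrary seed

def count_six_boys(lam, n_couples, rule):
    """Count couples whose completed family contains exactly 6 boys."""
    count = 0
    for _ in range(n_couples):
        if rule == "fixed7":                      # stop after 7 children regardless
            boys = (rng.random(7) < lam).sum()
        else:                                     # stop once both genders have appeared
            kids = []
            while len(set(kids)) < 2:
                kids.append("B" if rng.random() < lam else "G")
            boys = kids.count("B")
        count += (boys == 6)
    return count

lams = np.arange(0.01, 1.00, 0.01)
fixed7      = [count_six_boys(l, 10_000, "fixed7") for l in lams]       # unoptimized,
one_of_each = [count_six_boys(l, 10_000, "one_of_each") for l in lams]  # takes a while
# Both curves peak near lambda = 6/7, but the one-of-each counts are far
# smaller: same shape (same estimate), different scale (different surprise).
```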
 
  • Like
Likes Auto-Didact and PeroK
  • #112
PeterDonis said:
"Strictly more extreme" is ambiguous, though. Does it mean "one-tailed" or "two-tailed"? In this case, does it mean "at least that many boys" or "at least that many children of the same gender"?

This doesn't affect whether the p-values are the same or not, but it does affect their actual numerical values.

I assumed two-tailed.

You can see that

##p(7-0, 0-7) = p(n \ge 8)##

where the left-hand side is the total probability of a single-sex family of seven (case 1) and the right-hand side is the probability that it takes a family of eight or more children to get at least one of each sex (case 2). But:

##p(6-1, 1-6) \ne p(n =7)##

This creates another interesting ambiguity: is that genuinely a difference in p-values, or just an asymmetry in the possible outcomes?
 
  • #113
PS If the p-values for two sets of data cannot be the same, because of the discrete structure of the data, then having different p-values loses some of its significance!
 
  • #114
I did a few calculations for different family sizes, and there is a clear pattern. The "strict" p-values agree in all cases, but the "inclusive" p-values diverge as the family size increases. All of the following is two-tailed:

For a family of size ##N##, the strict p-value (the probability of the data being more extreme) is ##\frac{1}{2^{N-1}}## for both case 1 and case 2.

For the "inclusive" p-values (the data being as observed or more extreme), the p-values are:

##N = 4, \ p_1 = 5/8, \ p_2 = 1/4##
##N = 5, \ p_1 = 3/8, \ p_2 = 1/8##
##N = 6, \ p_1 = 7/32, \ p_2 = 1/16##
##N = 7, \ p_1 = 1/8, \ p_2 = 1/32##
##N = 8, \ p_1 = 9/128, \ p_2 = 1/64##

There's a clear pattern: ##p_2 = \frac{1}{2^{N-2}}## and ##p_1 = \frac{N+1}{2} p_2##.
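A short enumeration sketch (Python, fair coin under the null; the observation is ##N-1## children of one gender and one of the other) that reproduces this table and the pattern:

```python
from fractions import Fraction
from itertools import product

def case1(N):
    """Fixed family of N children. Two-tailed p-values for the observation
    'N-1 of one gender, 1 of the other', over the 2^N equally likely sequences."""
    seqs = list(product((0, 1), repeat=N))
    total = len(seqs)
    strict = sum(max(s.count(0), s.count(1)) == N for s in seqs)         # more extreme
    inclusive = sum(max(s.count(0), s.count(1)) >= N - 1 for s in seqs)  # as or more extreme
    return Fraction(strict, total), Fraction(inclusive, total)

def case2(N):
    """Stop once both genders have appeared. 'More extreme' means needing a
    family larger than N, i.e. the first N children were all one gender."""
    strict = 2 * Fraction(1, 2**N)           # family size > N
    inclusive = 2 * Fraction(1, 2**(N - 1))  # family size >= N
    return strict, inclusive

for N in range(4, 9):
    print(N, case1(N), case2(N))
# Strict p-values agree at 1/2^(N-1); inclusive ones are (N+1)/2^(N-1) vs 1/2^(N-2).
```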

This raises an interesting question about whether the p-value should be "strict" or "inclusive". In this problem, there is a case for choosing the strict version, which reflects the fact that, after all, the data are the same.

Alternatively, the fact that the (inclusive) p-value in case 2 is lower for larger ##N## might be telling us something statistically significant.
 
  • Like
Likes Auto-Didact and Dale
  • #115
PeroK said:
This raises an interesting question about whether the p-value should be "strict" or "inclusive".

The "inclusive" p-value is different for case #1 vs. case #2 because the number of combinations that are equally extreme as the one actually observed is different for the two cases; whereas, in this particular case, the number of combinations which are more extreme happens to be the same for both case #1 and case #2. I don't think either of those generalizes well.

PeroK said:
the fact that the (inclusive) p-value in case 2 is lower for larger ##N## might be telling us something statistically significant

It's telling you that, as ##N## goes up, the number of combinations that are equally extreme as the one actually observed increases for case #1, whereas for case #2 it remains constant (it's always just 2 combinations, the one actually observed and its counterpart with boys and girls interchanged).

However, the more fundamental point is that, no matter how you slice and dice p-values, they are answers to a different question than the question I posed in this thread. They are answers to questions about how likely the observed data are given various hypotheses. But the question I posed is a question about how likely various hypotheses are given the observed data. In most real-world cases, the questions we are actually interested in are questions of the latter type, not the former. For those kinds of questions, the Bayesian viewpoint seems more appropriate.
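For the question actually posed in this thread, the Bayesian answer is essentially a one-liner. A sketch assuming a uniform Beta(1,1) prior (my choice, not something the thread has agreed on): with the data "6 boys, 1 girl" the posterior for ##\lambda## is Beta(7, 2) under either stopping rule, so

```python
from scipy.stats import beta

posterior = beta(7, 2)      # Beta(1,1) prior updated with 6 boys and 1 girl
print(posterior.sf(0.5))    # P(lambda > 1/2 | data) ~ 0.965, identical for both couples
```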
 
  • Like
Likes Auto-Didact and Dale
  • #116
PeterDonis said:
Summary:: If we have two identical data sets that were generated by different processes, will their statistical weight as evidence for or against a hypothesis be different?

The specific example I'm going to give is from a discussion I am having elsewhere, but the question itself, as given in the thread title and summary, is a general one.

We have two couples, each of which has seven children that, in order, are six boys and one girl (i.e., the girl is the youngest of the seven). We ask the two couples how they came to have this set of children, and they give the following responses:

Couple #1 says that they decided in advance to have seven children, regardless of their genders (they think seven is a lucky number).

Couple #2 says that they decided in advance to have children until they had at least one of each gender (they didn't want a family with all boys or all girls).

Suppose we are trying to determine whether there is a bias towards boys, i.e., whether the probability p of having a boy is greater than 1/2. Given the information above, is the data from couple #2 stronger evidence in favor of such a bias than the (identical) data from couple #1?
Sorry if this was brought up already, but isn't something similar done in medicine with likelihood ratios, using a database of priors and adjusting? Then you can decide, assuming equal priors I guess, whether the likelihood ratio is the same in both cases?

EDIT: e.g., given symptoms A, B, C, etc. and a given age, there is a certain prior attached, and then tests are given whose results have likelihood ratios attached to them. I wonder if something similar can be done with your question, seeing whether one case has a higher likelihood ratio than the other?
 
  • Like
Likes Dale
  • #117
PeterDonis said:
the question I posed is a question about how likely various hypotheses are given the observed data. In most real-world cases, the questions we are actually interested in are questions of the latter type, not the former
That was actually the first thing that drew my attention and interest in Bayesian statistics. The outcome of Bayesian tests are more aligned with how I personally think of science and scientific questions. Plus, it naturally and quantitatively incorporates some philosophy of science in a non-philosophical way, specifically Popper’s falsifiability and Ockham’s razor.
 
  • Like
Likes jim mcnamara, Auto-Didact and WWGD
  • #118
Dale said:
That was actually the first thing that drew my attention and interest in Bayesian statistics. The outcome of Bayesian tests are more aligned with how I personally think of science and scientific questions. Plus, it naturally and quantitatively incorporates some philosophy of science in a non-philosophical way, specifically Popper’s falsifiability and Ockham’s razor.
Other than Bayes' theorem, do modern probability and mathematical statistics deal with Bayesian stats or just frequentist? EDIT: The type you would study in most grad courses that are not explicitly called frequentist, which includes the CLT, LLN, etc.
 
  • #119
WWGD said:
isn't something similar done in medicine with likelihood ratios, using a database of priors and adjusting?

The results of medical tests for rare conditions are usually much better analyzed using Bayesian methods, yes, because those methods correctly take into account the rarity of the underlying condition, in relation to the accuracy of the test. Roughly speaking, if the condition you are testing for is rarer than a false positive on the test, any given positive result on the test is more likely to be a false positive than a true one. Frequentist methods don't give you the right tools for evaluating this.
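The arithmetic behind that rough statement is just Bayes' rule. A sketch with purely hypothetical numbers (the prevalence and error rates below are made up for illustration):

```python
def p_condition_given_positive(prevalence, sensitivity, false_positive_rate):
    """Bayes' rule: P(condition | positive test result)."""
    p_positive = sensitivity * prevalence + false_positive_rate * (1 - prevalence)
    return sensitivity * prevalence / p_positive

# Hypothetical numbers: the condition affects 1 in 1000 people; the test
# catches 99% of true cases but also flags 5% of healthy people.
print(p_condition_given_positive(0.001, 0.99, 0.05))   # ~0.019: most positives are false
```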
 
  • Like
Likes Auto-Didact, Dale and WWGD
  • #120
WWGD said:
The type you would study in most grad courses that are not explicitly called frequentist
My classes were all purely frequentist, but I am an engineer who likes statistics rather than a statistician, and school was more than a decade ago. (Significantly more, even with a small sample.)
 
  • #121
PeterDonis said:
The results of medical tests for rare conditions are usually much better analyzed using Bayesian methods, yes, because those methods correctly take into account the rarity of the underlying condition, in relation to the accuracy of the test. Roughly speaking, if the condition you are testing for is rarer than a false positive on the test, any given positive result on the test is more likely to be a false positive than a true one. Frequentist methods don't give you the right tools for evaluating this.

Peter, you are fairly harsh in the physics forums when nonsense is posted, so there is no reason not to point out that this is nonsense. The vast majority of medical research has used standard statistical analysis, which is based on frequentist methods.

If what you say were true there would have been a mass conversion to Bayesian methods.

I'd like to see a statistical journal where your claims about standard statistical methods being inadequate simply because a test can yield more false positives than true positives are substantiated.
 
  • #122
WWGD said:
Sorry if this was brought up already, but isn't something similar done in medicine with likelihood ratios, using a database of priors and adjusting? Then you can decide, assuming equal priors I guess, whether the likelihood ratio is the same in both cases?
Yes, this is becoming more and more standard practice in medicine. There are not only journals but even undergraduate medical textbooks which directly address such issues as part of the core clinical theory of medicine. It has been this way for at least 20 years and is steadily developing.

However, from my experience of polling undergraduates and graduates, the emphasis on the utility of Bayesian methods is so marginal, both educationally and clinically, that it is practically forgotten by the time rounds begin; older physicians who are not in academia and/or not educators tend to be wholly unfamiliar with these relatively novel methods, so they simply ignore them.
PeroK said:
Peter, you are fairly harsh in the physics forums when nonsense is posted, so there is no reason not to point out that this is nonsense. The vast majority of medical research has used standard statistical analysis, which is based on frequentist methods.

If what you say were true there would have been a mass conversion to Bayesian methods.

I'd like to see a statistical journal where your claims about standard statistical methods being inadequate simply because a test can yield more false positives than true positives are substantiated.
In medicine, frequentist statistics is only utilized for academic research, i.e. generalizing from single instances to entire populations, while Bayesian statistics is used in clinical practice, i.e. specifying from generalities to particular cases. Medicine as clinical practice is purely concerned with the latter, which is why quantitative operationalizations of certain aspects of the medical process, such as likelihood ratio analyses, have been invented; such purely clinical quantitative methods tend to be Bayesian, i.e. the clinical application of knowledge gained using frequentist statistical methods is Bayesian.

While I get your sentiment you are simply wrong here and your misunderstanding is a widespread one in medicine as well. Moreover, you have misconstrued the actual issue by not qualifying your statement: the vast majority of medical research focused on comparing treatments and demonstrating treatment effectiveness has used standard statistical analysis. To use the actual terminology, most medical research is quantitative research.

This terminology is extremely misleading because it pretends that standard statistical analysis is the only kind of quantitative research (something which some medical researchers will actually tell you!), which is obviously wrong. See e.g. the difference in mathematical sophistication and background required between 'quantitative finance' and 'finance'; in fact, recognizing this early on is what made me realize I had to take a degree in either applied mathematics or physics in order to learn alternative quantitative and mathematical methods for medical research which are completely unknown in medicine.

In any case, the fact that most research in medicine has focused only on questions of the type 'does A work / is A better than B' is because these are practically the easiest types of questions to research and answer with little to no uncertainty: in fact, the path is so straightforward that, with statistical packages already available, all that is practically left to do is collect data and correctly feed it into the computer. This has transformed both the standard MD/PhD programme and the typical PhD programme in medicine into a very straightforward path which can be reduced to mastering standard statistical analysis, but I digress.

Apart from the obviously different kinds of research which require different methods, e.g. laboratory work and sociological analysis, there are of course also other types of quantitative questions that are of direct interest in medicine, both in the scientific and the clinical context. The problem for medicine with such quantitative questions is that they do not fit the existing mold, i.e. they require alternative quantitative methods that simply aren't taught in the standard medical curriculum; Bayesian likelihood ratio analysis is an exception that is taught.

Clinicians generally recognize, however, that alternative quantitative methods are to some extent taught in other sciences. Because of this, many of these alternative quantitative questions are simply deferred to other sciences (biomedical sciences, pharmacology, physiology and so on). The problem then remains that the purely clinical questions cannot be deferred to other sciences, because they are purely practical medical issues and belong to the domain of the clinical physician. How do clinicians deal with this? They simply ignore it and/or leave it as an issue for the next generation to solve.
 
  • #123
Auto-Didact said:
While I get your sentiment you are simply wrong here and your misunderstanding is a widespread one in medicine as well.

Okay, I'm willing to believe this. But, I would like to see some evidence.

I can see the potential for the Bayesian approach. What I don't see is how the standard approach can ultimately fail in general.

Why has everyone (who uses standard statistical analysis) been wrong all along, and how many people know this?
 
  • #124
PeroK said:
Okay, I'm willing to believe this. But, I would like to see some evidence.

I can see the potential for the Bayesian approach. What I don't see is how the standard approach can ultimately fail in general.

Why has everyone (who uses standard statistical analysis) been wrong all along, and how many people know this?
I've been trying to answer this for over a decade now. If you could answer that convincingly, you'd probably get the Nobel Prize in Medicine.
 
  • #125
Auto-Didact said:
I've been trying to answer this for over a decade now. If you could answer that convincingly, you'd probably get the Nobel Prize in Medicine.

Well, I'm not after a Nobel Prize. As far as I can see, it's the traditional camp that is concerned about the reliability of Bayesian methods. Not the other way round.
 
  • #126
Deciding when to stop data collection is an important part of an experimental design to prevent the introduction of bias. My preference is to design experiments from the outset that stop either with a fixed, pre-determined number of data points, or run for a fixed, pre-determined duration of time. It is hard to introduce a human decision to stop data collection once it has begun that is free of bias, especially if the human decision maker(s) are aware of the results so far.
 
  • Like
Likes jim mcnamara, Jimster41, Auto-Didact and 1 other person
  • #127
PeroK said:
Well, I'm not after a Nobel Prize. As far as I can see, it's the traditional camp that is concerned about the reliability of Bayesian methods. Not the other way round.
You're of course correct. Apart from the Nobel Prize, it is likely that a solution would go a long way towards solving the replication crisis and the problem of p-value hacking, as these all seem to be symptoms of the same disease, which is precisely why solving it is Prize-worthy in the first place.

I actually have an explanation, but the question is whether or not that explanation is going to be convincing to the traditional camp. In summary, medicine is an extremely traditional discipline: an unspoken principle is 'don't fix what ain't broken'. If one doesn't conform to the traditions of medicine, one is quickly ostracized and cast out; this almost instantly applies once one suggests going beyond the traditional boundaries. If one has to go against the foundational traditions of the medical establishment to prove their point - even if one can demonstrate that what they are doing is in fact correct - this is simply not a path that many people are willing to take.

Notice the striking resemblance between this issue and the arguments regarding the problems in the foundations of QM, which is also split into two camps: those who take the issues seriously as unjustifiable loose ends in physics - i.e. foundationalists - and those arguing that those problems aren't actually real problems and can just be straightforwardly ignored for whatever instrumental or practical reasons, such as personal convenience - i.e. pragmatists.
 
  • #128
Dr. Courtney said:
Deciding when to stop data collection is an important part of an experimental design to prevent the introduction of bias. My preference is to design experiments from the outset that stop either with a fixed, pre-determined number of data points, or run for a fixed, pre-determined duration of time. It is hard to introduce a human decision to stop data collection once it has begun that is free of bias, especially if the human decision maker(s) are aware of the results so far.
This sounds like the conventional methodology to decide necessary sample sizes a priori based on power analysis used in standard statistical clinical research.

On the other hand, in the practice of clinical medicine among experienced practitioners we have a non-explanatory term for limiting data collection to the bare minimum necessary to make a clinical decision: correct practice. In contrast, collecting data which cannot directly be considered relevant to the problem at hand is seen as 'incorrect practice'.

Engaging in incorrect practice too frequently, either deliberately or by mistake, is a punishable offense; I reckon implementing something like this would be effective as well to deter such behavior in scientific practice.
 
  • #129
PeterDonis said:
Suppose we are trying to determine whether there is a bias towards boys, i.e., whether the probability p of having a boy is greater than 1/2. Given the information above, is the data from couple #2 stronger evidence in favor of such a bias than the (identical) data from couple #1?

To get a mathematical answer, we would have to define what "evidence" for p > 1/2 means and what procedure will be used to determine that evidence_A is stronger than evidence_B.

In frequentist statistics, the common language notion of "strength of evidence" suggests comparing "power curves" for statistical tests. To do that, you must pick a particular statistic and define the rejection region for each test. (The number of boys in the data is but one example of a statistic that can be defined as a function of the data.)

In Bayesian statistics, one can compute the probability that p > 1/2 given a prior distribution for p and the data. Suppose the two experiments A and B produce respective data sets ##D_A## and ##D_B##. For particular data sets, it might turn out that ##Pr(p>1/2 | D_A) > Pr(p> 1/2| D_B)##. However, for different particular data sets, the inequality might be reversed. So how shall we phrase your question in order to consider in general whether experiment A or experiment B provides more evidence?

I suppose one way is to consider the expected value of ##Pr(p > 1/2 | D)##, where the expectation is taken over the joint distribution of possible data sets and values of ##p##; do this for each experiment and compare the answers. This is a suspicious procedure from the viewpoint of experimental design. It seems to be asking "Which experiment should I pick to give the strongest evidence that p > 1/2?". However, that seems to be the content of your question.

From the point of view of experimental design, a nobler question is "Which experiment gives a better estimate of p?". To translate that into mathematics requires defining what estimators will be used.
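If one did want to compare the two experiments in that (admittedly suspicious) way, a rough simulation sketch might look like the following (Python; the uniform prior, the Beta posterior and the sample size are all my assumptions for illustration):

```python
import numpy as np
from scipy.stats import beta

rng = np.random.default_rng(3)   # arbitrary seed

def experiment_A(p):
    """Fixed family of 7 children with boy-probability p."""
    boys = int((rng.random(7) < p).sum())
    return boys, 7 - boys

def experiment_B(p):
    """Have children until both genders have appeared."""
    kids = []
    while len(set(kids)) < 2:
        kids.append("B" if rng.random() < p else "G")
    return kids.count("B"), kids.count("G")

def expected_posterior_prob(experiment, n=20_000):
    """Average Pr(p > 1/2 | D) over data sets D generated with p ~ Uniform(0, 1)."""
    total = 0.0
    for _ in range(n):
        p = rng.random()
        b, g = experiment(p)
        total += beta(1 + b, 1 + g).sf(0.5)   # uniform prior => Beta(1+b, 1+g) posterior
    return total / n

print(expected_posterior_prob(experiment_A))
print(expected_posterior_prob(experiment_B))
```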
 
  • Like
Likes Auto-Didact
  • #130
PeroK said:
Okay, I'm willing to believe this. But, I would like to see some evidence.

I can see the potential for the Bayesian approach. What I don't see is how the standard approach can ultimately fail in general.

Why has everyone (who uses standard statistical analysis) been wrong all along, and how many people know this?
Coincidentally, Sabine Hossenfelder just uploaded a video which gives a (simplified) explanation of an aspect of this same topic, applying to all the sciences more broadly rather than just to how statistical methodology is used by scientists in medicine.

An important general lesson to take away from the video is that biases which have not been quantified, perhaps simply because the type of bias was discovered after the statistical methodology was established, are often ignored by scientists; this also weakens the efficacy of statistical analysis, regardless of how careful the scientists were.
 
  • #131
PeroK said:
The vast majority of medical research has used standard statistical analysis, which is based on frequentist methods.

Yes, and much of that medical research fails to be replicated. The "replication crisis" that was making headlines some time back was not limited to medical research, but it included medical research. One of the key criticisms of research that failed to be replicated, on investigation, was inappropriate use of p-values. That criticism was basically saying the same thing that @Dale and I are saying in this thread: the p-value is the answer to a different question than the question you actually want the answer to.

PeroK said:
standard statistical methods being inadequate simply because a test can yield more false positives than true positives

My point was that the p-value, which is the standard statistical method for hypothesis testing, can't answer this question for you. The p-value tells you the probability that the positive test result would have happened by chance, if you don't have the disease. But the probability you are interested in is the probability that you have the disease, given the positive test result. It's easy to find actual tests and actual rare conditions where the p-value after a positive test result can be well below the 5% "significance" threshold, which under standard statistical methods means you reject the null hypothesis (i.e., you tell the patient they most likely have the disease), but the actual chance that the patient has the disease given a positive test result is small.
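As a purely illustrative example (the numbers here are hypothetical): suppose the condition affects 1 in 1000 people and the test has a 1% false positive rate and, for simplicity, perfect sensitivity. A positive result then has a p-value of 0.01, well below the usual 5% threshold, yet Bayes' rule gives

$$P(\text{disease} \mid +) = \frac{1 \times 0.001}{1 \times 0.001 + 0.01 \times 0.999} \approx 0.09,$$

so the patient still probably does not have the disease.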
 
  • Like
Likes Auto-Didact
  • #133
PeroK said:
Peter, you are fairly harsh in the physics forums when nonsense is posted, so there is no reason not to point out that this is nonsense.
Actually, what he described is pretty standard introductory material for Bayesian probability.

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4585185/
 
  • Like
Likes Auto-Didact
  • #134
PeterDonis said:
That criticism was basically saying the same thing that @Dale and I are saying in this thread: the p-value is the answer to a different question than the question you actually want the answer to.
This, as well as basically the entire thread, reminds me of a quote by Cantor:
To ask the right question is harder than to answer it.

This essentially is why science in general (and physics in particular) is difficult; i.e. not because solving technical (mathematical) questions can be somewhat difficult, but instead because the right question has to be identified and then asked first. This means that in any open-ended scientific inquiry one should postpone naively mathematicizing what can easily be mathematicized if it isn't clear what is essential, i.e. prematurely mathematicizing a conceptual issue into a technical issue is a waste of time which should be avoided!

It took me quite a long while to learn this lesson because it goes against both my instincts and my training. Moreover, the realization that this lesson is actually useful is a recurring theme when doing applied mathematics in the service of some science; it only comes when one e.g. repeatedly tries to generalize from some particular idealization towards a more realistic description, which then generally turns out to be literally unreachable in any obvious way.
 
  • #135
Stephen Tashi said:
To get a mathematical answer, we would have to define what "evidence" for p > 1/2 means and what procedure will be used to determine that evidence_A is stronger than evidence_B.
In Bayesian statistics this is well defined and straightforward.

https://en.m.wikipedia.org/wiki/Bayes_factor

Of course, there are limitations to any technique.
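For the couples in this thread the Bayes factor comes out the same under either stopping rule, since the likelihoods differ only by a constant. A sketch (Python, assuming a uniform prior for ##\lambda## within each hypothesis, which is my choice for illustration):

```python
from scipy.integrate import quad

def like(lam):
    """Likelihood of the observed data (6 boys then a girl) as a function of lambda;
    the same up to a constant factor under either stopping rule."""
    return lam**6 * (1 - lam)

# Bayes factor for H1: lambda > 1/2 against H0: lambda <= 1/2.
m1 = quad(like, 0.5, 1.0)[0] / 0.5   # marginal likelihood under H1 (uniform on (1/2, 1])
m0 = quad(like, 0.0, 0.5)[0] / 0.5   # marginal likelihood under H0 (uniform on [0, 1/2])
print(m1 / m0)                       # ~27: the data favour lambda > 1/2, equally for either couple
```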
 
  • #136
Auto-Didact said:
medicine is an extremely traditional discipline: an unspoken principle is 'don't fix what ain't broken'
I think there is a growing recognition of the parts of medical science that are broken. I am optimistic in the long term and even in the short term the changes are at least interesting.
 
  • #137
Dale said:
Actually, what he described is pretty standard introductory material for Bayesian probability.

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4585185/

@PeterDonis I apologise as I spoke too harshly. I really don't want to get involved in a debate on medical statistics and how they are used. I didn't realize that was what was at the root of all this.

That article seems to me more about the politics of communicating with patients than about the actual statistical methods themselves.

If you are all telling me that traditional statistical methods are widely misunderstood and misused in medical science, then I have no grounds to challenge that.
 
  • #138
PeroK said:
That article seems to me more about the politics of communicating with patients than about the actual statistical methods themselves.
Yes, the communication with patients is particularly important since they cannot be expected to understand the statistical issues themselves. The article did talk about the fact that for rare diseases the likelihood of having the disease after receiving a positive test result is low. I.e. for rare diseases most positives are false positives.
 
  • #139
Dale said:
Yes, the communication with patients is particularly important since they cannot be expected to understand the statistical issues themselves. The article did talk about the fact that for rare diseases the likelihood of having the disease after receiving a positive test result is low. I.e. for rare diseases most positives are false positives.
Yes, but it doesn't take Bayesian methods to come to that conclusion.
 
  • Like
Likes Dale
  • #140
PeterDonis said:
The results of medical tests for rare conditions are usually much better analyzed using Bayesian methods, yes, because those methods correctly take into account the rarity of the underlying condition, in relation to the accuracy of the test. Roughly speaking, if the condition you are testing for is rarer than a false positive on the test, any given positive result on the test is more likely to be a false positive than a true one. Frequentist methods don't give you the right tools for evaluating this.

As @PeroK has pointed out, this is wrong. You are getting Bayes's rule confused with Bayesian methods. Bayes's rule is part of both Frequentist and Bayesian methods. Frequentist methods and Bayes's rule are perfectly fine for analyzing rare conditions.
 
  • Like
Likes PeroK
