What motivates Bayes' Theorem?

Agent Smith · Oct 6, 2024

As far as I know, Bayes' theorem is ##P(A|B) = \frac{P(A) \times P(B|A)}{P(A) \times P(B|A) + P(\neg A) \times P(B|\neg A)}##.

I recall someone saying Bayes' theorem revolutionized probability. Bayes himself and Laplace are supposedly key figures in this revolution. I know how to apply the theorem, but in what sense is the theorem a "revolution"?

jedishrfu · Oct 6, 2024

Bayes' theorem allows probabilities to be updated based on new evidence, making it a better way to estimate probabilities.

https://en.wikipedia.org/wiki/Bayes'_theorem

PeroK · Oct 6, 2024

jedishrfu said:

Bayes' theorem allows probabilities to be updated based on new evidence, making it a better way to estimate probabilities.

https://en.wikipedia.org/wiki/Bayes'_theorem

You may be confusing Bayesian Statistics, which is what you describe, and Bayes' Theorem, which is the essential and elementary component of probability theory quoted in the OP. The former is the Bayesian Interpretation, as given in that Wikipedia page.

Perhaps Bayes' Theorem was revolutionary, in that it allowed probabilities to be calculated in reverse. Bayesian statistics allows probabilities to be calculated in a wider range of circumstances. Most of the time, Bayesian and frequentist calculations agree.

That said, a frequentist obviously updates probabilities based on new evidence. What the Bayesian can do is start with very limited evidence or no evidence at all! The frequentist can't do that.

PeroK · Oct 6, 2024

Agent Smith said:

As far as I know, Bayes' theorem is ##P(A|B) = \frac{P(A) \times P(B|A)}{P(A) \times P(B|A) + P(\neg A) \times P(B|\neg A)}##.

I've never seen it like that before. The denominator on the right is just an expansion of ##P(B)##. The simplest form of Bayes' Theorem, IMO, is:
$$P(B)P(A|B) = P(A \cap B) = P(A)P(B|A)$$
You can draw a Venn diagram to see this.
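
A quick numerical check of the identity above, as a minimal sketch. The four joint probabilities in `joint` are made-up numbers chosen for illustration; nothing here comes from real data.

```python
# Hypothetical joint distribution over (A or not A) x (B or not B).
# The four numbers are invented for illustration and sum to 1.
joint = {
    ("A", "B"): 0.10,
    ("A", "not B"): 0.20,
    ("not A", "B"): 0.30,
    ("not A", "not B"): 0.40,
}

p_a = joint[("A", "B")] + joint[("A", "not B")]   # P(A)
p_b = joint[("A", "B")] + joint[("not A", "B")]   # P(B)
p_a_and_b = joint[("A", "B")]                     # P(A and B), the overlap in the Venn diagram

p_a_given_b = p_a_and_b / p_b                     # P(A|B)
p_b_given_a = p_a_and_b / p_a                     # P(B|A)

# Both products should recover the same overlap region.
left = p_b * p_a_given_b
right = p_a * p_b_given_a
print(left, p_a_and_b, right)
```

The two conditional probabilities differ (here ##P(A|B)## is 1/4 and ##P(B|A)## is 1/3), but each one times its own conditioning probability gives the same overlap.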

vela · Oct 6, 2024

PeroK said:

What the Bayesian can do is start with very limited evidence or no evidence at all! The frequentist can't do that.

Which one are you, a Bayesian or frequentist?

PeroK · Oct 6, 2024

vela said:

Which one are you, a Bayesian or frequentist?

I'm a frequentist myself. I'm not convinced that the probabilities that can be calculated from the Bayesian Interpretation are, in all circumstances, meaningful.

PeroK · Oct 6, 2024

For example, using Bayesian methods to calculate the probability that the universe is flat (or whatever). Yes, you get a number and can call it a probability, but what does it mean? As a frequentist, you have to be honest and say that there is no answer (from frequentist probability theory). And sometimes "no answer" is better and more honest than an answer that may or may not mean anything.

pines-demon · Oct 6, 2024

Bayes' theorem has been extended into the Bayesian interpretation of probability, which has in turn been extended into a whole Bayesian epistemology. This has had some influence in physics, with ideas such as Bayesian thermodynamics and Bayesian interpretations of quantum mechanics, but none of these is very popular.

haushofer · Oct 6, 2024

Bayes' formula allows you to convert P(A|B) into P(B|A). E.g., in tossing a coin you can calculate the probability of an outcome given a (null) hypothesis. But often you want to know the reverse: e.g., what is the probability that the coin is not fair, given the outcome?

Or consider a suspect. You can calculate with some model the probability he was at the crime scene given he's innocent. But a prosecutor wants to know the reverse probability. Confusing these is known as the prosecutor's fallacy.
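
The coin example can be sketched numerically. Everything specific here is an assumption made for illustration: only two hypotheses are considered (a fair coin and a coin with heads probability 0.8), each gets a prior of 0.5, and the observed outcome is 8 heads in 10 tosses.

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k heads in n independent tosses with heads probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, k = 10, 8                 # assumed outcome: 8 heads in 10 tosses
prior_fair = 0.5             # assumed prior for the fair-coin hypothesis
prior_biased = 1 - prior_fair

like_fair = binom_pmf(k, n, 0.5)     # P(outcome | fair)
like_biased = binom_pmf(k, n, 0.8)   # P(outcome | biased), with an assumed bias of 0.8

# P(outcome), expanded over the two hypotheses.
evidence = prior_fair * like_fair + prior_biased * like_biased

# The reversed conditionals.
post_fair = prior_fair * like_fair / evidence       # P(fair | outcome)
post_biased = prior_biased * like_biased / evidence  # P(biased | outcome)

print(like_fair, post_fair, post_biased)
```

Under these assumptions ##P(\text{outcome}|\text{fair})## is about 0.044, while ##P(\text{fair}|\text{outcome})## is about 0.13. Treating the first number as if it were the second is the same confusion as the prosecutor's fallacy.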

haushofer · Oct 6, 2024

PeroK said:

For example, using Bayesian methods to calculate the probability that the universe is flat (or whatever). Yes, you get a number and can call it a probability, but what does it mean? As a frequentist, you have to be honest and say that there is no answer (from frequentist probability theory). And sometimes "no answer" is better and more honest than an answer that may or may not mean anything.

It means how much confidence you have given certain background information. Subjectivity doesn't deprive things of meaning.

PeroK · Oct 6, 2024

haushofer said:

It means how much confidence you have given certain background information. Subjectivity doesn't deprive things of meaning.

One could argue that science must be objective.

pines-demon · Oct 6, 2024

haushofer said:

Bayes' formula allows you to convert P(A|B) into P(B|A). E.g., in tossing a coin you can calculate the probability of an outcome given a (null) hypothesis. But often you want to know the reverse: e.g., what is the probability that the coin is not fair, given the outcome?

Or consider a suspect. You can calculate with some model the probability he was at the crime scene given he's innocent. But a prosecutor wants to know the reverse probability. Confusing these is known as the prosecutor's fallacy.

In mathematics one has to distinguish between probabilities and likelihoods.

Agent Smith · Oct 6, 2024

PeroK said:

I've never seen it like that before. The denominator on the right is just an expansion of ##P(B)##. The simplest form of Bayes' Theorem, IMO, is:
$$P(B)P(A|B) = P(A \cap B) = P(A)P(B|A)$$
You can draw a Venn diagram to see this.

Yes, I don't know why, if at all, one form is preferred over the other. The derivation is also difficult to follow.

Agent Smith · Oct 6, 2024

haushofer said:

Bayes' formula allows you to convert P(A|B) into P(B|A). E.g., in tossing a coin you can calculate the probability of an outcome given a (null) hypothesis. But often you want to know the reverse: e.g., what is the probability that the coin is not fair, given the outcome?

Or consider a suspect. You can calculate with some model the probability he was at the crime scene given he's innocent. But a prosecutor wants to know the reverse probability. Confusing these is known as the prosecutor's fallacy.

Is that it? We can compute "reverse probabilities"?

PeroK · Oct 6, 2024

Agent Smith said:

Yes, I don't know why, if at all, one form is preferred over the other.

I prefer simplicity and memorability.

Agent Smith said:

The derivation is also difficult to follow.

What derivation?

Agent Smith · Oct 6, 2024

jedishrfu said:

Bayes' theorem allows probabilities to be updated based on new evidence, making it a better way to estimate probabilities.

https://en.wikipedia.org/wiki/Bayes'_theorem

This "new evidence" is a sample, but we can use this even in standard (frequentist) statistics.

Agent Smith · Oct 6, 2024

PeroK said:

I prefer simplicity and memorability.

What derivation?

The derivation from your simpler formula to the more complex one I posted. I drew a Venn diagram and it meshes with what I know. Thanks.

PeroK · Oct 6, 2024

Agent Smith said:

The derivation from your simpler formula to the more complex one I posted. I drew a Venn diagram and it meshes with what I know. Thanks.

If ##A_1, \dots, A_n## are mutually exclusive events that exhaust the sample space, then for any event ##B##:
$$P(B) = P(A_1)P(B|A_1) + \dots + P(A_n)P(B|A_n)$$
A special case of this is where the events are ##A## and ##\neg A##, in which case:
$$P(B) = P(A)P(B|A) + P(\neg A)P(B|\neg A)$$
That's a general expansion of ##P(B)##, so there is no need to include that in Bayes' theorem. The simple ##P(B)## should do just as well.
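
The expansion of ##P(B)## over a partition can be sketched as a small function. The function name `posterior` and all the numbers below are hypothetical, chosen only to illustrate the two-event and ##n##-event cases.

```python
def posterior(priors, likelihoods):
    """Return P(A_i | B) for each i, given the lists P(A_i) and P(B | A_i).

    Assumes the A_i are mutually exclusive and exhaust the sample space.
    """
    if abs(sum(priors) - 1.0) > 1e-9:
        raise ValueError("priors must sum to 1 (the A_i must exhaust the sample space)")
    # Law of total probability: P(B) = sum over i of P(A_i) * P(B | A_i).
    p_b = sum(p * l for p, l in zip(priors, likelihoods))
    if p_b == 0:
        # Bayes' theorem divides by P(B), so it says nothing when P(B) = 0.
        raise ValueError("P(B) = 0, so P(A_i | B) is undefined")
    return [p * l / p_b for p, l in zip(priors, likelihoods)]

# Two-event case: A and not-A (illustrative numbers).
two = posterior([0.01, 0.99], [0.95, 0.05])

# Three-event case (illustrative numbers).
three = posterior([0.5, 0.3, 0.2], [0.1, 0.4, 0.7])

print(two, three)
```

The two-event call is exactly the longer formula from the opening post; the denominator is computed once as `p_b` and then reused, which is why writing plain ##P(B)## loses nothing.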

Dale · Oct 6, 2024

Agent Smith said:

in what sense is the theorem a "revolution"?

The main thing is that it allows you to reverse conditional probabilities. So if you know the unconditional probabilities and also the conditional probability ##P(A|B)## then you can calculate the reversed conditional probability ##P(B|A)##.

This is important in science. Usually my theory tells me ##P(\text{observation}|\text{theory})##. But then I can use Bayes' theorem to determine ##P(\text{theory}|\text{observation})##. So I can use experiment to test my theory.
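
As a rough sketch of how this plays out over repeated experiments, the snippet below updates the probability of a theory after each of three confirming observations. The setup is an assumption for illustration: there is a single rival theory, the two theories predict the observation with probabilities 0.9 and 0.3, the observations are independent given the theory, and the starting probability is 0.5.

```python
def update(prior_t1, p_obs_t1, p_obs_t2):
    """One Bayes update for theory T1 against a single rival theory T2.

    prior_t1 : P(T1) before the observation
    p_obs_t1 : P(observation | T1)
    p_obs_t2 : P(observation | T2)
    """
    numerator = prior_t1 * p_obs_t1
    denominator = numerator + (1 - prior_t1) * p_obs_t2   # P(observation)
    return numerator / denominator

belief = 0.5            # assumed starting probability for T1
history = [belief]
for _ in range(3):      # three observations, each of the kind T1 predicts well
    belief = update(belief, p_obs_t1=0.9, p_obs_t2=0.3)
    history.append(belief)

print(history)
```

Each observation multiplies the odds in favour of T1 by 0.9/0.3 = 3, so the probability moves from 1/2 to 3/4, 9/10 and 27/28. The posterior after one experiment serves as the prior for the next.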

Agent Smith · Oct 6, 2024

Is there anything I should note other than that probabilities can be reversed with Bayes' theorem? The Wikipedia page says that Bayes' "real intention" was to "prove the existence of God."

For ##P(A|B) = \frac{P(A) \times P(B|A)}{P(B)}##, what happens when ##P(B) = 0##?
Thanks. The Wikipedia article on Bayes' theorem mentions the caveat that ##P(B) \ne 0##.

Vanadium 50 · Oct 6, 2024

Agent Smith said:

I recall someone saying

Reference please? Otherwise we are chasing ghosts.

Agent Smith · Oct 6, 2024

@Vanadium 50 , Wikipedia?

Vanadium 50 · Oct 6, 2024

So we're going to play that game, eh? "It's somewhere in those zillion pages."

Count me out.

Agent Smith · Oct 6, 2024

Vanadium 50 said:

So we're going to play that game, eh? "It's somewhere in those zillion pages."

Count me out.

Wikipedia has a good article on Bayes' theorem. Most of what I wrote here cometh from there.

Agent Smith · Oct 7, 2024

PeroK said:

What the Bayesian can do is start with very limited evidence or no evidence at all! The frequentist can't do that.

A frequentist takes a sample (the least). A Bayesian ___?

Agent Smith · Oct 7, 2024

PeroK said:

For example, using Bayesian methods to calculate the probability that the universe is flat (or whatever). Yes, you get a number and can call it a probability, but what does it mean? As a frequentist, you have to be honest and say that there is no answer (from frequentist probability theory). And sometimes "no answer" is better and more honest than an answer that may or may not mean anything.

Throwing darts here, but does the process work like this:
Fix a significance level ##\alpha = 0.05##
##H_0## = The universe is curved i.e. ##\mu_0 = v## and ##\sigma_0 = u## where ##v## is a specific value. Here ##\mu_0## is the mean of some physical measurement and ##\sigma_0## its standard deviation
##H_a## = The universe is flat i.e. ##\mu_0 > v##

We then proceed to make some measurements (sample, of size ##n##). Compute the mean from the sample ##\mu_s = x##. We find the ##\text{z-score} = \frac{\mu_s - \mu_0}{\frac{\sigma_0}{\sqrt n}}##. We can read off our ##\text{P-value}## from a z-table. If ##\text{P-value} < \alpha## we can reject ##H_0## in favor of ##H_a## and conclude that the universe is flat.

?
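
The mechanics of the one-sided z-test described above can be sketched as follows. This only illustrates the arithmetic; whether such a test says anything about the flatness of the universe is a separate question. The function name and the sample numbers (mean 10.5, ##\mu_0 = 10##, ##\sigma_0 = 2##, ##n = 64##) are hypothetical.

```python
from math import erfc, sqrt

def one_sided_z_test(sample_mean, mu0, sigma0, n, alpha=0.05):
    """Upper-tailed z-test of H0: mean = mu0 against Ha: mean > mu0.

    Assumes the population standard deviation sigma0 is known.
    """
    z = (sample_mean - mu0) / (sigma0 / sqrt(n))
    # Upper-tail p-value P(Z >= z) for a standard normal Z,
    # computed from the complementary error function instead of a z-table.
    p_value = 0.5 * erfc(z / sqrt(2))
    return z, p_value, p_value < alpha

# Illustrative numbers only.
z, p_value, reject = one_sided_z_test(sample_mean=10.5, mu0=10.0, sigma0=2.0, n=64)
print(z, p_value, reject)
```

With these numbers the z-score is 2, the p-value is roughly 0.023, and ##H_0## is rejected at ##\alpha = 0.05##.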

PeroK · Oct 7, 2024

Agent Smith said:

A frequentist takes a sample (the least). A Bayesian ___?

You can read about it. The difference is more about what fundamentally is a probability. At this stage you should focus on probability theory and your course material.

It's unfortunate that many people associate Bayes' Theorem with Bayesian statistics. This leads to these esoteric debates, which are of limited value at this stage. Bayes' Theorem is elementary and fundamental and shouldn't lead to a debate about Bayesian statistics.

PeroK · Oct 7, 2024

Agent Smith said:

Throwing darts here, but does the process work like this:
Fix a significance level ##\alpha = 0.05##
##H_0## = The universe is curved i.e. ##\mu_0 = v## and ##\sigma_0 = u## where ##v## is a specific value. Here ##\mu_0## is the mean of some physical measurement and ##\sigma_0## its standard deviation
##H_a## = The universe is flat i.e. ##\mu_0 > v##

We then proceed to make some measurements (sample, of size ##n##). Compute the mean from the sample ##\mu_s = x##. We find the ##\text{z-score} = \frac{\mu_s - \mu_0}{\frac{\sigma_0}{\sqrt n}}##. We can read off our ##\text{P-value}## from a z-table. If ##\text{P-value} < \alpha## we can reject ##H_0## in favor of ##H_a## and conclude that the universe is flat.

?

There is only one universe, so the sample size is 1.

Agent Smith · Oct 7, 2024

PeroK said:

There is only one universe, so the sample size is 1.

Yes, that's correct, I forgot. What about the measurements I "used" to test the hypothesis that the universe is flat?

PeroK · Oct 7, 2024

Agent Smith said:

Yes, that's correct, I forgot. What about the measurements I "used" to test the hypothesis that the universe is flat?

The universe is either flat or it isn't. Probability doesn't apply. One could say that the distribution is everything and in this case there is either no distribution or the distribution is entirely unknown.

Dale · Oct 7, 2024

Agent Smith said:

A frequentist takes a sample (the least). A Bayesian ___?

Sometimes there is no sample. So a Bayesian can still choose a prior that represents their uncertainty.
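
One common way to make this concrete is a Beta prior on an unknown proportion. The choice of a uniform Beta(1, 1) prior and the counts below are assumptions for illustration; the post does not name a particular prior.

```python
def beta_update(a, b, successes, failures):
    """Conjugate update: a Beta(a, b) prior plus binomial data gives a Beta posterior."""
    return a + successes, b + failures

def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)

prior = (1.0, 1.0)   # assumed uniform prior: every proportion equally plausible

# With no sample at all, the posterior is just the prior.
no_data = beta_update(*prior, successes=0, failures=0)

# Once data arrive (illustrative counts), the same rule updates the prior.
some_data = beta_update(*prior, successes=7, failures=3)

print(beta_mean(*no_data), beta_mean(*some_data))
```

With no data the estimate is the prior mean, 0.5; after 7 successes and 3 failures it moves to 8/12, about 0.67. The calculation is well defined in both cases, which is the sense in which a Bayesian can start without a sample.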

Dale · Oct 7, 2024

PeroK said:

There is only one universe, so the sample size is 1.

PeroK said:

The universe is either flat or it isn't. Probability doesn't apply.

This isn’t correct. Long term frequency doesn’t apply, but frequency isn’t probability.

What defines probability is the Kolmogorov axioms. Anything that fulfills those axioms is probability. Frequency is one example, and uncertainty is another.

There is no population of universes to draw a large sample from and form frequencies, but there is uncertainty about the universe’s flatness. So probability does apply.

PeroK · Oct 7, 2024

Dale said:

This isn’t correct. Long term frequency doesn’t apply, but frequency isn’t probability.

What defines probability is the Kolmogorov axioms. Anything that fulfills those axioms is probability. Frequency is one example, and uncertainty is another.

There is no population of universes to draw a large sample from and form frequencies, but there is uncertainty about the universe’s flatness. So probability does apply.

The Kolmogorov axioms are a pure mathematical construction. Whether you can apply them to a given physical scenario is the question. The universe cannot, by itself, satisfy mathematical axioms. The universe is not a measure space.

If you conclude, for example, that the probability that the universe is flat is 90%, then (as a frequentist) I don't know what to make of that. It's a number and it may have some meaning, but I cannot relate it to what I understand as a probability.

Uncertainty, by itself, does not imply probabilities. One example is the Two-Envelope Problem:

https://en.wikipedia.org/wiki/Two_envelopes_problem

If you assume that there must be probabilities, then you end up with contradictory calculations and inconsistencies. Until you realise that there must be a distribution. Without specifying the distribution, the numbers you calculate are not probabilities. To have probabilities relating to the universe, you must specify the distribution which applies to the universe(s). Without it, the numbers you calculate are not probabilities.
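
A small simulation illustrates the point about needing a distribution. The distribution used here (the smaller amount drawn uniformly from 1, 2, 4, 8, 16) is an assumption made purely for illustration; the two-envelope problem itself specifies none.

```python
import random

def simulate(trials=100_000, seed=0):
    """Average winnings from always keeping versus always switching."""
    rng = random.Random(seed)
    keep_total = 0.0
    switch_total = 0.0
    for _ in range(trials):
        # Assumed distribution for the smaller amount.
        small = rng.choice([1, 2, 4, 8, 16])
        envelopes = [small, 2 * small]
        rng.shuffle(envelopes)          # you are handed one envelope at random
        keep_total += envelopes[0]      # value if you keep the envelope in hand
        switch_total += envelopes[1]    # value if you switch to the other one
    return keep_total / trials, switch_total / trials

keep_avg, switch_avg = simulate()
print(keep_avg, switch_avg)
```

Once a distribution is specified, both strategies average about 1.5 times the mean smaller amount (9.3 here), and the apparent 25% gain from always switching does not appear.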

Dale · Oct 7, 2024

PeroK said:

The Kolmogorov axioms are a pure mathematical construction.

Yes. But they are the mathematical construction that defines probability.

PeroK said:

The universe cannot, by itself, satisfy mathematical axioms

Nobody claimed or even implied otherwise.

PeroK said:

If you conclude, for example, that the probability that the universe is flat is 90%, then (as a frequentist) I don't know what to make of that. It's a number and it may have some meaning, but I cannot relate it to what I understand as a probability.

Which is precisely why I mention the axioms. A lot of frequentists make the same mistake, but it is a mistake.

A lot of frequentists mistakenly believe that probability is defined by long-run frequencies, but that is not correct. Probability is defined by the axioms. Long run frequencies are an example of probability because long run frequencies satisfy the axioms.

This is similar to vectors. Many students are introduced to vectors as little arrows with a magnitude and direction. Later, they are surprised to learn that polynomials are also vectors. We don’t think of polynomials as arrows, but they satisfy the axioms of vectors. Little arrows are the easiest example of vectors, but not the only example. Similarly frequencies are the easiest example of probability, but not the only example.

So a statement that the probability that the universe is flat is 90% cannot reasonably be understood as a frequency, but it certainly can be understood as an uncertainty. And since uncertainties are also probabilities, it is a valid statement.
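
On a finite sample space the formal side of this claim is easy to check, as in the sketch below. It only tests whether an assignment of numbers satisfies the axioms; it does not settle what those numbers mean. The 90% is the figure used in this thread, while the split of the remaining 10% and the helper names are invented for illustration.

```python
def prob(event, weights):
    """P(event): the sum of the weights of the outcomes in the event."""
    return sum(weights[outcome] for outcome in event)

def satisfies_axioms(weights, tol=1e-9):
    """Check non-negativity and normalisation on a finite sample space.

    Additivity for disjoint events holds automatically here, because
    prob() is defined as a sum over outcomes.
    """
    non_negative = all(w >= 0 for w in weights.values())
    normalised = abs(prob(weights.keys(), weights) - 1.0) < tol
    return non_negative and normalised

# Long-run frequencies for a fair coin.
frequency = {"heads": 0.5, "tails": 0.5}

# Degrees of belief about curvature (the 0.06 / 0.04 split is made up).
credence = {"flat": 0.90, "open": 0.06, "closed": 0.04}

# An assignment that is not a probability: the numbers sum to 1.2.
incoherent = {"flat": 0.90, "not flat": 0.30}

print(satisfies_axioms(frequency), satisfies_axioms(credence), satisfies_axioms(incoherent))
```

The frequency table and the credence table both pass, and the third assignment fails. Formally, the first two are the same kind of object, just as arrows and polynomials are both vectors.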

PeroK · Oct 7, 2024

Dale said:

Yes. But they are the mathematical construction that defines probability.

Nobody claimed or even implied otherwise.

Which is precisely why I mention the axioms. A lot of frequentists make the same mistake, but it is a mistake.

A lot of frequentists mistakenly believe that probability is defined by long-run frequencies, but that is not correct. Probability is defined by the axioms. Long run frequencies are an example of probability because long run frequencies satisfy the axioms.

This is similar to vectors. Many students are introduced to vectors as little arrows with a magnitude and direction. Later, they are surprised to learn that polynomials are also vectors. We don’t think of polynomials as arrows, but they satisfy the axioms of vectors. Little arrows are the easiest example of vectors, but not the only example. Similarly frequencies are the easiest example of probability, but not the only example.

So a statement that the probability that the universe is flat is 90% cannot reasonably be understood as a frequency, but it certainly can be understood as an uncertainty. And since uncertainties are also probabilities, it is a valid statement.

That, as I understand it, is the Bayesian interpretation. That the frequentist interpretation is a mistake is pushing your luck. At the very least, there is a Bayesian prior that the frequentist interpretation is not a mistake with nonzero probability!
