Bayesian statistics in science

A. Neumaier · Nov 5, 2021

PeterDonis said:

@A. Neumaier, we clearly have very, very different readings of Jaynes

Yes, indeed. But I have given detailed arguments for my reading.

PeterDonis said:

and this subthread is going well off topic.

We could continue the discussion in a subthread if you split the present one. Putting posts 56 and later into the new subthread would be a good way to split.

Fra · Nov 5, 2021

A. Neumaier said:

No. Democracy leads to permanent error if the majority is in error.

Rather, the scientific community is a meritocracy of scientists. The best scientists have in the long run the most influence.

Observer democracy is the metaphor I use to label the mechanisms behind emergent objectivity. It does not literally mean "voting" etc., nor an arithmetic averaging process. It is more to be understood as survival of the fittest, just the way you mention. The process of negotiating a common consensus is the democratic process.

But the main point of the "subjective inference" side is one that I think sets us apart:

How does an agent construct the intrinsic measure implicit in your notion of "error" without relying on feedback from the environment (i.e. other agents/scientists)? The core motivation for subjective inference is that there exists no absolute truth. THIS is what is "desiderata", an idealisation of the scientific process that IMO does not quite correspond to what actually happens. Presuming the existence of an external measure or judge, IMO, misguides us in trying to solve open problems. (That's not to say that the emergent objectivity will not satisfy your wish for FAPP.) It's a matter of constructing principle. This is a key difference between the objective and subjective inference views. In this sense, the objective approach gives priority to the RESULT as constraining the process, while the subjective approach gives priority to the PROCESS over prejudices about which RESULTS we will find.

I have no illusion that we will change each other's minds here, but I think this illustrates the difference in views.

/Fredrik

Sunil · Nov 5, 2021

A. Neumaier said:

But reality is so complex that many things require qualitative judgment - something that cannot be formalized since (unlike probability, where there is a fair consensus about the basic rules to apply) there is no agreement among humans about how to judge.

Complexity is not the point here. The rules of probability theory can also be applied in arbitrarily complex situations.

A. Neumaier said:

They differ in the results whenever they differ in the prior. The prior is by definition a probability distribution hence subjective = robot-specific (in Jaynes' scenarios), a state of the mind of the robot. No two robots will have the same state of the mind unless they are clones of each other, in every detail that might affect the course of their computations.

The prior is the state without information. It is the state with maximal entropy. This is unproblematic and well-defined. Entropy is a well-defined function on the space of probability distributions, isn't it? Computing that probability distribution for sufficiently complex configuration spaces may be problematic, though.

There are some situations where one can reasonably use unbounded priors, which are not probabilities but contain less information than any probability distribution, like ##dx## on the straight line. But in these cases one can also use, instead, probabilities smeared out over extremely large regions, and the size of this region will not make a difference.
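A minimal numerical sketch of that last claim, under assumptions not made in the thread: a Gaussian likelihood with known ##\sigma##, a uniform prior on ##[-W, W]##, and made-up data. The name `posterior_summary` and the grid size are illustrative only.

```python
import numpy as np

def posterior_summary(data, sigma, half_width, n_grid=400_001):
    """Posterior mean and std of a location parameter mu, given a uniform
    prior on [-half_width, half_width] and i.i.d. Gaussian data (known sigma)."""
    mu = np.linspace(-half_width, half_width, n_grid)
    xbar, n = np.mean(data), len(data)
    # Log-likelihood as a function of mu, up to an additive constant.
    log_post = -0.5 * n * (mu - xbar) ** 2 / sigma ** 2
    # The uniform prior only restricts the grid; normalise on the grid.
    w = np.exp(log_post - log_post.max())
    w /= w.sum()
    mean = np.sum(w * mu)
    std = np.sqrt(np.sum(w * (mu - mean) ** 2))
    return mean, std

# Hypothetical data, chosen only for illustration.
data = np.array([1.2, 0.7, 1.9, 1.1, 0.4, 1.6])
for W in (10.0, 100.0, 1000.0):
    print(W, posterior_summary(data, sigma=1.0, half_width=W))
```

In this toy setting the posterior mean and standard deviation stay essentially fixed as ##W## grows, as long as the prior region comfortably covers the region where the likelihood is concentrated. The sketch says nothing about cases where the improper limit itself misbehaves.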

A. Neumaier said:

Jaynes does not succeed in this. There is a notion of noninformative prior for certain classes of estimation problems, but this gives a good prior only if (from the point of view of a frequentist) it resembles the true probability distribution. The reason is that no probability distribution is truly noninformative, so if you don't have past data (or extrapolate from similar experiences) whatever prior you pick is pure prejudice or hope.

Why should Bayesians care about what frequentists think? You maximize non-informativeness by maximizing entropy.

gentzen · Nov 5, 2021

PeterDonis said:

Remember that my original post in this subthread, post #33, was to object to a claim (made by @gentzen, not you) that your prescription is "opposite" to what Jaynes would say. My point was simply that, in this particular case, Jaynes would say exactly what you are saying.

I feared that it was me who drove you into that discussion, and that I had better also write something, despite the excellent explanations of A. Neumaier. By "opposite" I meant that Jaynes pushes a Bayesian interpretation of probability, but that A. Neumaier's thermal interpretation contains ideas that I see as a rehabilitation of the frequentist interpretation:

gentzen said:

The work of A. Neumaier contains important ideas and elaborations how to overcome critical issues of the frequentist interpretation. One of those critical issues is that it doesn't apply to single systems, but only to ensembles. And part of the solution is to be realistic about the precision of magnitudes (including probabilities) appropriate for the concrete situation you want to talk about.

gentzen said:

..., and tries to avoid one circularity that could arise in a (virtual) frequentist interpretation of probability (with respect to uncertainties). Let me quote from section "4. A view from the past" from an article about earthquake prediction which I have reread (yesterday and) today:

We will suppose (as we may by lumping several primitive propositions together) that there is just one primitive proposition, the ‘probability axiom,’ and we will call it A for short. ...
Now an A cannot assert a certainty about a particular number n of throws, such as ‘the proportion of 6’s will certainly be within p ± ϵ for large enough n (the largeness depending on ϵ)’. It can only say ‘the proportion will lie between p ± ϵ with at least such and such probability (depending on ϵ and n₀ ) whenever n > n₀ ’. The vicious circle is apparent.

PeterDonis said:

Even the "subjective" element in probabilities--that different "robots" might have different information--is removed in your example. So what you are describing is in fact perfectly consistent with the general method Jaynes describes. It's just a sort of degenerate case of it, ...

A weakness of the Bayesian interpretation is that it has no criteria for the situations where probabilities fail to be "objective". The earthquake prediction example above is such a case. To regard situations where the frequentist interpretation applies and gives objective meaning to probabilities as merely a special case of Bayesian reasoning misses the point of the frequentist interpretation.

PeterDonis said:

An unfortunate thing about the book is that it was not finished when Jaynes died. I suspect that if he had lived long enough to finish it, it would be tighter and more like his papers than it is.

PeterDonis said:

I note, btw, that the paper you reference (the one titled "Prior Probabilities") has as its explicit purpose to remove "arbitrariness" in assigning prior probabilities.

Another explanation is that the paper "Prior Probabilities" argues for an objective Bayesian interpretation. The first two chapters of the book, on the other hand, argue for a Bayesian interpretation without yet distinguishing between subjective and objective Bayesianism.

Sunil said:

Jaynes is not about "subjective mind of a knower". This sounds like you don't understand the difference between de Finetti's subjective Bayesian interpretation and Jaynes' objective Bayesian interpretation. It is an essential one. Jaynes is about what is rational to conclude given some incomplete information. ... In subjective probability you are free to start with whatever you think is fine. ... What is fixed is only the updating. So, computing priors - the probability if there is no information - is meaningless in subjective probability, but is a key in objective probability.

This comment made me realize that there might be a deeper reason why I liked Jaynes' papers but had trouble with his book (at least with the first few chapters, where the robot appears), see above.

The trouble with the objective Bayesian interpretation is that it only works in special cases. There exists some knowledge when it works and when it breaks down. I even assumed that A. Neumaier knew more details about this than me, but I am no longer sure after reading his reply: "There is a notion of noninformative prior for certain classes of estimation problems, ..." On the other hand, maybe his reply was basically identical to my "And it is a difficult to understand technical issue." except that he tried to explain it nevertheless:

gentzen said:

In the end, this is the place where I see the non-intuitiveness emerge again in his interpretation. In a certain sense, I believe that he knows this, ... But in a certain sense, it would really be unfair if this would be held against his interpretation, because the issue was there all the time for the Bayesian interpretation, and they coped with it by simply staying silent. And it is a difficult to understand technical issue. To get some feeling for how technical, see for example:
Consistency and strong inconsistency of group-invariant predictive inferences (1999) by Morris L. Eaton and William D. Sudderth
Dutch book against some `objective' priors (2004) by Morris L. Eaton and David A. Freedman

Fra · Nov 5, 2021

gentzen said:

A weakness of the Bayesian interpretation is that it has no criteria for the situations where probabilities fail to be "objective".

Maybe I totally missed your point(?), but how about this: the objective Bayesian view makes sense if and when there exists a non-lossy transformation between the state spaces of different agents, so that one can claim that any agent is equivalent to another one.

The divider here is that the objective approach presumes this, and tries to use it as a constraint.

The subjective approach does not presume this; one instead sees these transformations, which make symmetries manifest, as occasionally emergent. And one can expect this emergence to be spontaneous as agents are allowed to interact, as a kind of steady state of self-organisation.

There is a simple reason, I think, why subjective views are annoying: they are harder to capture with fixed math, as the state space is not fixed. If one insists on this, one is forced to make the state space unreasonably big from the start, which gives a fine-tuning problem, where the agent may diverge instead of learn.

This is how I see it.

/Fredrik

PeterDonis · Nov 5, 2021

A. Neumaier said:

We could continue the discussion in a subthread if you split the present one.

I could do that but I'm not sure which forum to put it in. The subthread in question isn't really about quantum interpretations, or indeed about quantum mechanics specifically at all. Perhaps the probability and statistics forum over in the math section would be best.

A. Neumaier · Nov 5, 2021

PeterDonis said:

I could do that but I'm not sure which forum to put it in. The subthread in question isn't really about quantum interpretations, or indeed about quantum mechanics specifically at all. Perhaps the probability and statistics forum over in the math section would be best.

In my opinion no mathematics is involved. It is more about the philosophy of science, which would fit here.

PeterDonis · Nov 5, 2021

A. Neumaier said:

In my opinion no mathematics is involved. It is more about the philosophy of science, which would fit here.

Fair enough. I'll look at splitting it off into a separate thread in this subforum.

PeterDonis · Nov 5, 2021

PeterDonis said:

Fair enough. I'll look at splitting it off into a separate thread in this subforum.

Split has been done, this post is now in the new thread.

PeterDonis · Nov 5, 2021

A. Neumaier said:

I have given detailed arguments for my reading.

You have given your interpretations of what you have quoted from Jaynes. But your interpretation of the things you quote is very different from mine. That is why I said we have very different readings. As one example, I have already explained how we appear to be taking the word "subjective" as Jaynes uses it to mean very different things.

A. Neumaier said:

reality is so complex that many things require qualitative judgment

The particular situation your paper in the original thread describes is not such a situation. No qualitative judgments of the sort that are required in domains that are not hard sciences, are required to estimate density matrix parameters from quantum tomography data. The only judgment that seems to me to be required is about what approximation to use given that exact calculations are computationally intractable. See further comments below.

A. Neumaier said:

the cogent reason is that he uses different software and/or weighted the data differently because of this or that judgment, but gets a result consistent with the accuracies to be expected.

In other words, they are using different approximations to a computationally intractable exact solution.

A. Neumaier said:

there are reasons why someone may choose a different Hilbert space than I did in your hypothetical setting: For efficient quantum tomography you need to truncate an infinite-dimensional Hilbert space by a subspace of very low dimensions, and picking this subspace is a matter of judgment and can be done in multiple defensible ways.

I'm not sure I understand. For any system on which we could do quantum tomography, won't there be one unique finite-dimensional Hilbert space? For example, if I have two qubits, possibly entangled (and I want to use quantum tomography to determine whether they are entangled), isn't the Hilbert space just ##\mathbb{C}^2 \otimes \mathbb{C}^2##?

gentzen · Nov 8, 2021

Fra said:

Maybe I totally missed your point(?), but how about this: the objective Bayesian view makes sense if and when there exists a non-lossy transformation between the state spaces of different agents, so that one can claim that any agent is equivalent to another one.

The divider here is that the objective approach presumes this, and tries to use it as a constraint.

I didn't try to elaborate my point. Maybe a good way to elaborate the point "in my style" would be to point out that the "interpretation of probability in terms of observed frequencies" gets defended despite the "hints" that "mathematically it can only remain an intuitive notion":

Stephen Tashi said:

As to an interpretation of probability in terms of observed frequencies, mathematically it can only remain an intuitive notion. The attempt to use probability to say something definite about an observed frequency is self-contradictory except in the trivial case where you assign a particular frequency a probability of 1, or of zero. For example, it would be satisfying to say "In 100 tosses of a fair coin, at least 3 tosses will be heads". That type of statement is an absolute guaranteed connection between a probability and an observed frequency. However, the theorems of probability theory provide no such guaranteed connections. The theorems of probability tell us about the probability of frequencies. The best we can get in absolute guarantees are theorems with conclusions like ##\lim_{n\to\infty}\Pr(E(n))=1##. Then we must interpret what such a limit means. Poetically, we can say "At infinity the event is guaranteed to happen". But such a verbal interpretation is mathematically imprecise and, in applications, the concept of an event "at infinity" may or may not make sense.

As a question in physics, we can ask whether there exists a property of situations called probability that is independent of different observers - to the extent that if different people perform the same experiment to test a situation, they (probably) will get (approximately) the same estimate for the probability in question if they collect enough data. If we take the view that we live in a universe where scientists have at least average luck, we can replace the qualifying adjective "probably" with "certainly" and if we idealize "enough data" to be "an infinite amount of data", we can change "approximately" to "exactly". Such thinking is permitted in physics. I think the concept is called "physical probability".
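The distinction drawn in the quote, between a guarantee about a frequency and a probability of a frequency, can be made concrete with a small sketch. It assumes a fair coin and a tolerance ##\epsilon = 0.05##, both chosen only for illustration; `prob_freq_within` is a hypothetical helper name.

```python
from math import exp, lgamma, log

def prob_freq_within(n, p=0.5, eps=0.05):
    """Exact Pr(|k/n - p| <= eps) for k ~ Binomial(n, p), with 0 < p < 1."""
    total = 0.0
    for k in range(n + 1):
        # Small tolerance so that boundary cases such as k/n = p + eps
        # are not lost to floating-point rounding.
        if abs(k / n - p) <= eps + 1e-12:
            # Binomial pmf computed in log space to avoid overflow.
            log_pmf = (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
                       + k * log(p) + (n - k) * log(1 - p))
            total += exp(log_pmf)
    return total

for n in (10, 100, 1000):
    print(n, prob_freq_within(n))
```

For each finite ##n## tried here the result is a probability strictly below 1 that increases with ##n##. What the calculation delivers is a statement about the probability of a frequency, never a guaranteed frequency.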

This is the background against which I see the rehabilitation of the frequentist interpretation.

So I don't want to dismiss attempts at an objective Bayesian interpretation, but I would say that they don't even have the goal to rehabilitate the frequentist interpretation.

Fra · Nov 8, 2021

gentzen said:

I didn't try to elaborate my point. Maybe a good way to elaborate the point "in my style" would be to point out that the "interpretation of probability in terms of observed frequencies" gets defended despite the "hints" that "mathematically it can only remain an intuitive notion":

This is the background against which I see the rehabilitation of the frequentist interpretation.

So I don't want to dismiss attempts at an objective Bayesian interpretation, but I would say that they don't even have the goal to rehabilitate the frequentist interpretation.

I likely misread your point. Still not sure what you mean. The post you refer to also seems to contain the tension between descriptive vs guiding probability, but there may be more issues.

Anyway, for the record, I do not subscribe to the objective bayesian view. If your point is "critique" against the objective bayesians, then I agree.

From my perspective, I think the notion of a frequentist interpretation makes sense only in the sense of descriptive probability; it's roughly the same thing. As I see it, from the agent view, there is no conflict here. The descriptive frequentist view refers to the "input" (evidence), and the guiding Bayesian probability refers to a "tendency" or "odds" that determines the agent's dice, the result of an inference. The output of that inference obviously does not have a proper frequentist interpretation. But I also do not see the "problem" with that detail.

The descriptive probability is naturally frequentist if you think in terms of counted events. But the guiding probability represents the "dice" that the agent uses for his random walk; it does not relate to "integer counts" of anything.

/Fredrik

WernerQH · Nov 8, 2021

I find the discussion whether probabilities are objective or subjective rather pointless. By necessity they are both, because probabilities are the glue that connects our theories to the real world.

Saying that the velocities of the molecules in a gas are subject to a Maxwellian distribution is surely a probabilistic statement. If physics is not to turn into a meaningless game, the temperature of a gas and probabilities must have objective meaning. On the other hand I concur with de Finetti that probabilities cannot be physical. They are part of our description, not of reality itself. And a physical description is always (to varying degrees) tentative (to use a better term than "subjective"). A velocity distribution can turn out to be non-Maxwellian. It is the departures from thermal equilibrium (from the blackbody spectrum in the case of Fraunhofer lines) that permits inferences about the chemical composition of the solar photosphere, for example.

Much like geometry, probability theory is an essential ingredient of many physical theories. It allows non-exhaustive, "sparse" descriptions of reality, and it provides a check if these descriptions are consistent with what we know.

Fra · Nov 9, 2021

WernerQH said:

I concur with de Finetti that probabilities cannot be physical.

WernerQH said:

I find the discussion whether probabilities are objective or subjective rather pointless.

WernerQH said:

They are part of our description, not of reality itself. And a physical description is always (to varying degrees) tentative (to use a better term than "subjective").

I think we are now discussing philosophy of science, and we may have different views on the extent to which the foundations of science relate to, and are relevant for, the foundations of physics. I am of the opinion that the foundations of physics have a MUCH deeper connection to the foundations of science than, say, the foundations of chemistry.

These are difficult topics to discuss, and they invariably raise disagreement.

/Fredrik

A. Neumaier · Nov 9, 2021

WernerQH said:

They are part of our description, not of reality itself.

Everything we do in physics is part of our description, not of reality itself. This doesn't make physics subjective. Descriptive probability is objective whenever the way one arrives at the probabilities given the data is made explicit.

WernerQH said:

A velocity distribution can turn out to be non-Maxwellian.

But not in the many cases where the data are compatible with a Maxwell distribution.

WernerQH said:

It is the departures from thermal equilibrium (from the blackbody spectrum in the case of Fraunhofer lines) that permits inferences about the chemical composition

The amount of departure that permits the inference is equally objective, since it does not depend on who observes it or makes the analysis.

WernerQH · Nov 9, 2021

A. Neumaier said:

Descriptive probability is objective whenever the way one arrives at the probabilities given the data is made explicit.

Of course. I'm not interested in mincing words. I think Jaynes would also agree with this. He put a lot of effort in clarifying how we arrive at probabilities. What makes you so fussy calling his probabilities "subjective"? I find the frequentist view of probabilities far too narrow, but I suspect nothing will convince you.

gentzen · Nov 9, 2021

WernerQH said:

What makes you so fussy calling his probabilities "subjective"? I find the frequentist view of probabilities far too narrow, but I suspect nothing will convince you.

Indeed, the frequentist view of probabilities risks being too narrow. Part of the rehabilitation is to make it a bit less narrow, but not too much. In fact, this is not too different from what you wrote yourself above:

WernerQH said:

... whether probabilities are objective or subjective ... By necessity they are both, because probabilities are the glue that connects our theories to the real world.

Saying that the velocities of the molecules in a gas are subject to a Maxwellian distribution is surely a probabilistic statement. If physics is not to turn into a meaningless game, the temperature of a gas and probabilities must have objective meaning.

On the other hand, Bayesian views of probability risk being too broad. And Jaynes risks devaluing even his own achievements, because he does not want to face the fact that his objective Bayesian perspective cannot solve all problems and paradoxes that arise in connection with probabilities. His "solution" to denounce infinity in all its forms without taking proper care of appropriate limit concepts simply doesn't work.

A. Neumaier · Nov 9, 2021

WernerQH said:

What makes you so fussy calling his probabilities "subjective"?

I label it such because it is his terminology:

Jaynes (p.44) said:

In the theory we are developing, any probability assignment is necessarily ‘subjective’ in the sense that it describes only a state of knowledge, and not anything that could be measured in a physical experiment. Inevitably, someone will demand to know: ‘Whose state of knowledge?’ The answer is always: ‘That of the robot – or of anyone else who is given the same information and reasons according to the desiderata used in our derivations in this chapter.’

Whatever is solely in the mind of a robot or a human (i.e., a state of knowledge) is subjective. Whereas if everything follows from the data by precisely spelled out rules it is objective.

Only if you completely specify the prior and how the posterior translates into numbers to be reported does the Bayesian probability become objective, though the choice of the prior and the recipe for extracting reportable numbers from the posterior are subjective acts. By specifying them others can check how you arrived at your results, and can criticize the whole procedure. In particular, the prior and the output recipe can in principle be falsified by sufficiently extensive subsequent experiments.

A. Neumaier · Nov 9, 2021

WernerQH said:

He put a lot of effort in clarifying how we arrive at probabilities.

Unfortunately, his analysis is essentially never followed in practice, where frequentist methods (and hierarchical methods such as REML, which have both a frequentist and a Bayesian derivation) prevail. General Bayesian techniques are feasible only for low-dimensional data analysis, and hence are far removed from today's needs.

In other words, Jaynes put a lot of effort in clarifying how we should arrive at probabilities, but practitioners rarely heed his moral commands.

PeterDonis · Nov 9, 2021

A. Neumaier said:

Whatever is solely in the mind of a robot or a human (i.e., a state of knowledge) is subjective.

As I have already pointed out, Jaynes's usage of "subjective" here is very different from yours. His answer to "whose state of knowledge" is not just "That of the robot": it is "That of the robot – or of anyone else who is given the same information and reasons according to the desiderata used in our derivations in this chapter." Which meets your definition of "objective":

A. Neumaier said:

Whereas if everything follows from the data by precisely spelled out rules it is objective.

PeterDonis · Nov 9, 2021

A. Neumaier said:

the choice of the prior and the recipe for extracting reportable numbers from the posterior are subjective acts.

As I have said before, Jaynes's aim in his book is to give objective (by your definition) procedures for assigning prior probabilities. (For the "extracting reportable numbers" part, I'm a bit confused, because as far as Jaynes is concerned, the posterior probabilities are the reportable numbers.)

A. Neumaier said:

By specifying them others can check how you arrived at your results, and can criticize the whole procedure. In particular, the prior and the output recipe can in principle be falsified by sufficiently extensive subsequent experiments.

Not only is this true, it is often the whole point of going through the process Jaynes describes. The results of the process Jaynes describes tell you what is implied by your current knowledge. Then you run actual experiments to see what happens. If actual experiments match what you got from Jaynes's process, it means your current knowledge is correct (as far as the experiments can tell), which is good to know but doesn't advance your knowledge any. If you see something different in actual experiments, that means your current knowledge is incomplete and you have an opportunity to come up with a better theoretical model. That's progress.

A. Neumaier · Nov 9, 2021

PeterDonis said:

As I have already pointed out, Jaynes's usage of "subjective" here is very different from yours. His answer to "whose state of knowledge" is not just "That of the robot": it is "That of the robot – or of anyone else who is given the same information and reasons according to the desiderata used in our derivations in this chapter." Which meets your definition of "objective":

No. The prior and the rules for extracting numbers from the posterior are generally private to the robot; they are objective only if they are shared with anyone else. That's why he uses the term subjective - and in the same sense as I.

PeterDonis said:

as far as Jaynes is concerned, the posterior probabilities are the reportable numbers.

In publications one always reports a few numbers, not a posterior distribution - which is generally far too complex to report. Thus one needs a step to go from the distribution to these numbers.
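A minimal sketch of such a step, assuming a Beta posterior for a coin bias (for instance a uniform prior updated with 2 successes in 10 trials, giving Beta(3, 9)). The model, the function name `report_from_beta_posterior`, and the choice of summaries are assumptions made for illustration, not anything prescribed by Jaynes.

```python
import numpy as np

def report_from_beta_posterior(a, b, level=0.95, n_samples=200_000, seed=0):
    """Reduce a Beta(a, b) posterior (a, b > 1) to a few reportable numbers."""
    rng = np.random.default_rng(seed)
    samples = rng.beta(a, b, size=n_samples)
    tail = (1.0 - level) / 2.0
    # Equal-tailed credible interval estimated from posterior samples.
    lo, hi = np.quantile(samples, [tail, 1.0 - tail])
    return {
        "mean": a / (a + b),            # one possible point estimate
        "mode": (a - 1) / (a + b - 2),  # another one, from the same posterior
        "interval": (float(lo), float(hi)),
    }

summary = report_from_beta_posterior(3, 9)
print(summary)
```

The same posterior yields different point estimates depending on the recipe (mean versus mode here), which is why the recipe has to be stated alongside the prior if others are to reproduce the reported numbers.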

WernerQH · Nov 9, 2021

gentzen said:

... he does not want to face the fact that his objective Bayesian perspective cannot solve all problems and paradoxes that arise in connection with probabilities. His "solution" to denounce infinity in all its forms without taking proper care of appropriate limit concepts simply doesn't work.

Doesn't he address exactly this problem in appendix B.2 of his book?
As I understand him, he urges "taking proper care" of the limiting process.

PeterDonis · Nov 9, 2021

A. Neumaier said:

No. The prior and the rules for extracting numbers from the posterior are generally private to the robot; they are objective only if they are shared with anyone else. That's why he uses the term subjective - and in the same sense as I.

Once again you and I are reading Jaynes very differently. I don't see the point of belaboring it any further.

A. Neumaier said:

In publications one always reports a few numbers, not a posterior distribution - which is generally far too complex to report. Thus one needs a step to go from the distribution to these numbers.

The gist of your comment here and in other places is that actual practice in the scientific community does not match Jaynes's prescriptions. I am not disputing that. I am simply trying to correctly describe what Jaynes's prescriptions are, whether or not anyone else follows them.

gentzen · Nov 9, 2021

WernerQH said:

Doesn't he address exactly this problem in appendix B.2 of his book?
As I understand him, he urges "taking proper care" of the limiting process.

No, those words only expose his thoughts about how those paradoxes could be resolved. They fail to resolve them adequately. And he simply doesn't like the message of those paradoxes:

It remains to discuss the implications of this analysis for objective Bayesianism. There are strong indications that the requirement that an improper inference be a probability limit is very restrictive. In group models, Stone has shown that the formal posterior can only be a probability limit if the prior is right Haar measure and the group satisfies a technical condition, known as amenability [13]. Eaton and Sudderth have shown that many of the formal posteriors of multivariate analysis are “incoherent” or strongly inconsistent, and thus cannot be probability limits [16].

(quoted from the section "Discussion" of the link behind my "simply doesn't work.") The section "Stone's Example" shows what goes wrong for the improper prior ##\exp(a\theta)## as an example. It feels "too massive" as a prior to me, and I guess the Haar measure in a non-amenable group will also be "too massive" in a certain sense.

Jaynes may have enjoyed opposing the whole establishment, but that doesn't resolve the paradoxes:

“On many technical issues we disagree strongly with de Finetti. It appears to us that his way of treating infinite sets has opened up a Pandora’s box of useless and unnecessary paradoxes.” E.T. Jaynes, PT, p.xxi

... its extensive and exciting coverage of the marginalisation paradoxes which saw Jaynes opposing David, Stone, and Zidek (and even the whole Establishment, page 470), ...

“There are no really trustworthy standards of rigor in a mathematics that embraced the theory of infinite sets.” E.T. Jaynes, PT, p.xxvii

Except for the incomprehensible shots at formalised mathematics (Bourbakism), measure theory (as in Bertrand’s paradox) and Feller (...), I found the book quite pleasant and mostly in tune with my perception of Bayesian statistics (if strong on the militant side!). Jaynes did not think much of Bayes himself (an amateur!, on page 112), considering that Laplace had done much more to establish Bayesian-ism, and he clearly is a staunch supporter of Jeffreys, if not of de Finetti.

PeterDonis · Nov 9, 2021

gentzen said:

the message of those paradoxes

From the paper you reference, it seems to me that the key issue is that the group measure for a non-compact group is not normalizable. A simple example given in the paper is that, if the group in question is the reals--for example, if we think the problem is invariant under translation along one direction, regardless of the size of the translation--then the appropriate measure is Lebesgue measure, which is not normalizable; the total measure over the reals is infinite.

However, I'm not sure any real problem actually requires the full range of a non-compact group. In the simple example just described, any real problem will not be invariant under translation by any distance whatsoever. It will only be invariant under translation over some bounded region. So I ought to be able to find some compact group with a normalizable measure that represents the actual invariance and use that instead.
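As a worked illustration of the normalizability point (a sketch added for clarity, not taken from the referenced paper), restrict Lebesgue measure to ##[-L, L]##. This gives a proper uniform density, and for any fixed interval ##[a,b] \subset [-L,L]##:

$$p_L(\theta) = \frac{1}{2L}\,\mathbf{1}_{[-L,L]}(\theta), \qquad \int_{-\infty}^{\infty} p_L(\theta)\,d\theta = 1, \qquad P_L([a,b]) = \frac{b-a}{2L} \xrightarrow{\;L\to\infty\;} 0 .$$

Each truncated prior is normalizable, but the probability of every fixed bounded interval tends to zero as ##L## grows, so the limit is not a probability density on the reals. The bound ##L## is then extra information that has to be chosen.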

A. Neumaier · Nov 9, 2021

PeterDonis said:

As I have said before, Jaynes's aim in his book is to give objective (by your definition) procedures for assigning prior probabilities.

On p.373 Jaynes makes the same claim, with the same definition of objective:

Jaynes said:

In our view, problems of inference are ill-posed until we recognize three essential things.
(A) The prior probabilities represent our prior information, and are to be determined, not by introspection, but by logical analysis of that information.
(B) Since the final conclusions depend necessarily on both the prior information and the data, it follows that, in formulating a problem, one must specify the prior information to be used just as fully as one specifies the data.
(C) Our goal is that inferences are to be completely ‘objective’ in the sense that two persons with the same prior information must assign the same prior probabilities.

But he does not redeem his promise. The point is that if the prior information is objective, it is not given by a prior probability distribution, since prior information X are concepts and numbers, not distributions. Thus (A) is not a fact but wishful thinking. There is a subjective step involved in converting the prior information into a prior distribution, which makes (C) also wishful thinking.

Once the prior distribution is specified, the posterior is objectively determined by it and the rules. But whereas in the passage I had cited earlier, Jaynes distinguished between the prior information X and the prior distribution P(A|X), he now identifies them, contradicting himself. Indeed, X and P(A|X) are mathematically two very distinct items. To know X says nothing at all about P(A|X).

In our example of quantum tomography, X is 'the Hilbert space of two qubits is ##C^2\otimes C^2##', while P(A|X) is a distribution of 4x4 density matrices. Jaynes says nothing at all about how one objectively deduces this probability distribution from X. He only gives plausibility arguments for a few elementary sample cases, primarily group invariance considerations. Invariance suggests a complex Wishart distribution as a sensible prior, but there is a 17-dimensional family of these, and none of them has any merit of being distinguished. Even if one opts for simplicity and sets the scale matrix to the identity (which already adds information not in the prior information), another parameter ##n>3## remains to be chosen that has no natural default value. Thus different subjects would most likely pick different priors to represent the same prior information X. This makes the choice of the prior subjective given only the prior information X.
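To make the dependence on ##n## concrete, here is a minimal sketch that is not from the thread. It assumes an identity scale matrix and assumes that a density matrix is obtained from a complex Wishart matrix ##W## by dividing by its trace; the function names are hypothetical. It estimates the average purity ##\mathrm{tr}\,\rho^2## under the resulting prior for several values of ##n##.

```python
import random

def sample_density_matrix(dim, n, rng):
    """Draw rho = W / tr(W), where W = G G^dagger and G is a dim x n
    matrix of independent complex Gaussian entries.

    W is complex Wishart with identity scale matrix and n degrees of
    freedom; the trace normalization is an assumption of this sketch.
    """
    g = [[complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)]
         for _ in range(dim)]
    w = [[sum(g[i][k] * g[j][k].conjugate() for k in range(n))
          for j in range(dim)] for i in range(dim)]
    trace = sum(w[i][i].real for i in range(dim))
    return [[w[i][j] / trace for j in range(dim)] for i in range(dim)]

def purity(rho):
    """tr(rho^2); for a Hermitian matrix this is the sum of |rho_ij|^2."""
    return sum(abs(x) ** 2 for row in rho for x in row)

def mean_purity(dim, n, samples=2000, seed=0):
    """Monte Carlo estimate of the prior mean of tr(rho^2)."""
    rng = random.Random(seed)
    total = sum(purity(sample_density_matrix(dim, n, rng))
                for _ in range(samples))
    return total / samples

if __name__ == "__main__":
    for n in (4, 8, 16):
        print(n, round(mean_purity(4, n), 3))
```

In this sketch, larger ##n## pulls the prior toward the maximally mixed state (purity 1/4 for a 4x4 density matrix), so two analysts with the same X but different ##n## start from visibly different priors.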

A. Neumaier · Nov 9, 2021

PeterDonis said:

For any system on which we could do quantum tomography, won't there be one unique finite-dimensional Hilbert space? For example, if I have two qubits, possibly entangled (and I want to use quantum tomography to determine whether they are entangled), isn't the Hilbert space just ##\mathbb{C}^2 \times \mathbb{C}^2##?

If you regard your system as two qubits, this determines the Hilbert space, because a qubit is a mathematical abstraction. But real experiments are made with beams of light, and there are choices of how you model the system. Even if you ignore polarization and the fact that a beam is never infinitely thin (which strictly speaking makes a photon state a function of momentum), the photon Hilbert space is still an infinite-dimensional space for harmonic oscillators of different frequency (so that you can consider squeezed states and parametric down-conversion). This must be truncated by idealization to a finite-dimensional space. In quantum state tomography one would typically assume the frequency to be fixed and the intensity of the beam to be low enough so that only a few basis states need to be considered. But if you want to measure the Wigner function, you need many more excited states.

gentzen · Nov 9, 2021

PeterDonis said:

From the paper you reference, it seems to me that the key issue is that the group measure for a non-compact group is not normalizable.

Well, being non-amenable is worse than just being non-compact. The group measure is more than just not normalizable, it is also not well approximable by normalizable measures in the appropriate sense.

PeterDonis said:

However, I'm not sure any real problem actually requires the full range of a non-compact group. ... So I ought to be able to find some compact group with a normalizable measure that represents the actual invariance and use that instead.

If I interpret your idea here as approximating a non-compact group by a compact group, then being non-amenable will have the effect that you cannot approximate the group by compact groups in the appropriate sense.

A. Neumaier · Nov 9, 2021

PeterDonis said:

any real problem will not be invariant under translation by any distance whatsoever. It will only be invariant under translation over some bounded region. So I ought to be able to find some compact group with a normalizable measure that represents the actual invariance and use that instead.

This group consists of a single element, the identity. (Translations over a bounded region are only partially defined and do not form a group.) But to lead to a noninformative prior the group must at least be transitive on the set on which the probability distribution is sought.

Nontrivial group invariance is a rare property in real applications.
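The remark about bounded regions can be checked on a toy case. The following is a minimal sketch, not from the thread, with hypothetical helper names: it uses the integer interval 0..10 as the bounded region and treats a translation as undefined wherever it would leave the region.

```python
def restricted_translation(t, lo=0, hi=10):
    """Translation x -> x + t, defined only if x and x + t both lie in [lo, hi].

    Returns None where the map is undefined.
    """
    def f(x):
        y = x + t
        if lo <= x <= hi and lo <= y <= hi:
            return y
        return None
    return f

def compose(f, g):
    """Apply g first, then f; undefined wherever either step is undefined."""
    def h(x):
        y = g(x)
        return None if y is None else f(y)
    return h

def domain(f, points):
    """Points at which the partial map f is defined."""
    return [x for x in points if f(x) is not None]

points = list(range(11))
identity = restricted_translation(0)
shift6 = restricted_translation(6)

print(domain(identity, points))                 # defined on the whole region
print(domain(shift6, points))                   # defined on part of it only
print(domain(compose(shift6, shift6), points))  # defined nowhere
```

Only the shift by zero is defined on the whole region, and composing two nonzero shifts can give a map that is defined nowhere, so these partial maps are not closed under composition.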

PeterDonis · Nov 9, 2021

gentzen said:

If I interpret your idea here as approximating a non-compact group by a compact group

Not approximating, no, just replacing one with the other based on a better specification of the actual invariance of the problem. But in view of what @A. Neumaier says in post #65, the resulting structure might not be a group and might not have all of the required properties.

Fra · Nov 9, 2021

I enjoy the fourth philosophical viewpoint on objective Bayesian analysis, as Berger puts it:

"Objective Bayesian analysis is simply a collection of ad hoc but useful methodologies for learning from data"
-- https://www2.stat.duke.edu/~berger/papers/obayes-debate.pdf

For me, this paints a picture of the level of ambition regarding explanatory power, and thus of the problem with the objective approach. Those who prefer subjective coherence over objective "ad hoc" may prefer the more powerful dark side, even if dangerous :nb)

/Fredrik

WernerQH · Nov 10, 2021

gentzen said:

Jaynes may have enjoyed opposing the whole establishment, but that doesn't resolve the paradoxes

Thanks for the references. I can't say I find them more convincing than Jaynes's exposition of the marginalization paradox. In view of several decades of debates it seems unlikely that I'll be able to understand your reservations about objective Bayesianism. Do you know of a real-world problem where difficulties of this kind have turned up? (Jaynes's discussion of Bertrand's problem satisfied me. But I'm just a physicist. ;-))

A. Neumaier · Nov 10, 2021

WernerQH said:

Do you know of a real-world problem where difficulties of this kind have turned up?

Most likely, there cannot be any. The reason is that real world applications usually do not use Bayesian methods, as the latter are restricted to low dimensional problems.

The only exceptions are models based on exponential families, where conjugate priors can be easily specified and updated since all estimation boils down to updating the ordinary sample mean of a sufficient statistic. The Bayesian estimate is derived from the latter. This is equivalent to regularized frequentist statistics based on exponential families. Thus nothing is gained through Bayesian methods compared to frequentist ones.
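The simplest instance of this equivalence can be sketched as follows. This is an illustration added here under assumptions, not code from the thread: it uses the Bernoulli model with a Beta(a, b) conjugate prior, and the function names are hypothetical. The posterior mean coincides with a sample mean regularized by a + b pseudo-observations, of which a are successes.

```python
def beta_bernoulli_update(a, b, data):
    """Posterior Beta parameters after observing 0/1 data under a Beta(a, b) prior."""
    successes = sum(data)
    return a + successes, b + len(data) - successes

def posterior_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)

def regularized_mean(data, pseudo_successes, pseudo_count):
    """Sample mean after adding pseudo_count fictitious observations,
    pseudo_successes of which are counted as successes."""
    return (sum(data) + pseudo_successes) / (len(data) + pseudo_count)

data = [1, 0, 1, 1, 0, 1, 1, 1]
a_post, b_post = beta_bernoulli_update(2.0, 2.0, data)

bayes_estimate = posterior_mean(a_post, b_post)
shrunk_estimate = regularized_mean(data, pseudo_successes=2.0, pseudo_count=4.0)
print(bayes_estimate, shrunk_estimate)
```

The update only touches the sufficient statistic (the number of successes and the sample size), which is the sense in which the conjugate Bayesian estimate here matches a regularized frequentist one. The sketch covers this one family only and says nothing about non-conjugate priors.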

gentzen · Nov 10, 2021

WernerQH said:

In view of several decades of debates it seems unlikely that I'll be able to understand your reservations about objective Bayesianism. Do you know of a real-world problem where difficulties of this kind have turned up? (Jaynes's discussion of Bertrand's problem satisfied me. But I'm just a physicist. ;-))

My reservations and the "difficulties of this kind" are two separate topics. My reservations are about Jaynes' book, and about the unrealistic expectations it creates. Just like you, I am not an expert on Bayesianism and the several decades of debates. I gave an explicit example of how those unrealistic expectations play out in the real world before:

gentzen said:

..., you can somehow magically encode your background knowledge into a prior (which is a sort of not necessarily normalisable probability distribution), add some observed facts, and then get the probability for a given proposition (given your prior and your observations) as a result.

Of course, this is a caricature version of the Bayesian interpretation, but people do use it that way. And they use it with the intention to convince other people. So what strikes me as misguided is not when people like Scott Aaronson use Bayesian arguments in addition to more conventional arguments to convince other people, but when they replace perfectly fine arguments by a supposedly superior Bayesian argument and exclaim: "This post supersedes my 2006 post on the same topic, which I hereby retire." For me, this is related to the philosophy of Cox's theorem that a single number is preferable over multiple independent numbers (https://philosophy.stackexchange.co...an-reasoning-related-to-the-scientific-method). ...

Other people seem to share my reservations:

As the saying goes, the problem with Bayes is the Bayesians. It’s the whole religion thing, the people who say that Bayesian reasoning is just rational thinking, or that rational thinking is necessarily Bayesian, the people who refuse to check their models because subjectivity, the people who try to talk you into using a “reference prior” because objectivity. Bayesian inference is a tool. It solves some problems but not all, and I’m exhausted by the ideology of the Bayes-evangelists.

The "difficulties of this kind", on the other hand, are more a gut feeling (or an educated guess) on my part, as opposed to solid knowledge. I also have to "fight" to understand that stuff. You are no exception here. Other people mention "high dimensions" when talking about the Hidden dangers of noninformative priors:

And, when you increase the dimensionality of a problem, both these things happen: data per parameter become more sparse, and prior distributions that are innocuous in low dimensions become strong and highly informative (sometimes in a bad way) in high dimensions.

But my gut feeling expects something worse than just insufficient data per parameter. Something like

Here we show that, in general, the prior remains important even in the limit of an infinite number of measurements. We illustrate this point with several examples where two priors lead to very different conclusions given the same measurement data.

from an abstract of Christopher Fuchs and Rüdiger Schack. (I haven't read their paper yet, even though it is short. But I saw their abstract a long time ago, and it did influence my gut feeling.) Basically, I expect a fundamental limitation of achievable accuracy. And I expect that this enables you to include some preferred properties in your model, for example that there exists only a single world.
