Understanding the Uniform Probability Distribution in Statistical Ensembles

In summary: My understanding of probability is that it is a mathematical concept used to quantify the likelihood of an event occurring. It is based on Kolmogorov's axioms and can be interpreted in different ways, such as the frequentist, Bayesian, or decision-theoretic interpretations. In the context of statistical physics, the concept of probability is essential for understanding the behavior of systems at equilibrium. However, the use of ensembles to explain probability often creates more confusion than clarity. Therefore, it is important to have a solid understanding of probability itself before delving into the concept of ensembles in statistical physics.
  • #36
A. Neumaier said:
Subjective judgments (and in particular subjective probabilities) have no place at all in physics

That's completely false. We can't make any predictions at all without making assumptions that are subjective. You have to assume that your theory is correct, in the first place. You have to assume that your measurement devices worked correctly. You have to assume that you've accounted for all the relevant causal effects. You have to assume that records of past measurements are accurate. There are countless assumptions that everyone must make in order to do the simplest sort of reasoning in physics. Most of those assumptions are completely subjective. You can certainly try to check your assumptions by repeating your measurements, and double-checking everything, but it's subjective whether you've repeated things enough times and whether you've double-checked enough times.

It is impossible to get along in the world without subjective judgments.
 
  • #37
stevendaryl said:
to make objective probabilistic predictions in physics, you have to know the initial states.
The S-matrix gives objective probabilities for the outcomes given the input. The input is known very accurately in collision experiments - so accurately that one can check whether the scattering predictions come true or whether deviations would represent violations of the standard model.
 
  • #38
A. Neumaier said:
Probabilities that lead to accurate predictions are objective, not subjective.

Whether a prediction is "accurate" or not is subjective. You predict that a coin toss has a 50% chance of resulting in heads. You toss 100 coins, and get 53 heads. Was that an accurate prediction, or not? It's not 50%. At some point, you're going to make a subjective decision that your statistics agree closely enough with your predictions, and then you'll declare the predictions accurate.
 
  • #39
A. Neumaier said:
The S-matrix gives objective probabilities for the outcomes given the input.

The S-matrix makes asymptotic predictions: Some number of particles come in from infinity, where it's assumed that there are no interactions, collide and then the product particles go out to infinity. In the real world, we don't have particles coming in from infinity, and particles are always interacting. So to compare the S-matrix to actual experiments requires judgment. I claim that there is a subjective element to that judgment, inevitably.
 
  • #40
A. Neumaier said:
With your use of the notion ''subjective'' everything physicists do, and all science is subjective, and the term (and its opposite ''objective'') lose their traditional meaning.

It's a subjective judgment to call something objective. I know that's unsatisfying, but that's the way it is.
 
  • #41
stevendaryl said:
It's a subjective judgment to call something objective. I know that's unsatisfying, but that's the way it is.

I can see that this has gotten into a philosophical discussion about the meaning of probability and objectivity, and that's probably off-topic. So I will refrain from further replies on this topic.
 
  • #42
bhobba said:
Bayesian inference - how does that fit? It can be done in a frequentist way but it's not natural.
Bayesian inference, if done in an objective manner, means accounting for prior information in the likelihood function in a roundabout way. One adds extra prior terms that reflect (in a frequentist interpretation) what would have been obtained from data equivalent to the assumed knowledge. If the assumed knowledge (i.e., the prior) is true knowledge, the resulting Bayesian prediction is more accurate than without the prior; if the prior represents prejudice only, the resulting Bayesian prediction is heavily biased towards the prejudice unless a huge amount of data are present to cancel it.

For example, the Kalman filter for updating a Gaussian probability model is Bayesian in form as the current model is updated each time an additional data set comes in. However, if one considers the whole data stream as the data, it can be seen (when started with an improper prior at time zero) to be an optimal model according to the purely frequentist Gauss-Markov theorem for the estimation of linear models. The same holds for REML (restricted maximum likelihood), which is in spirit Bayesian but can be fully treated in a purely frequentist framework.
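To make the recursion concrete, here is a minimal sketch in Python (illustrative names, not code from any particular library) of a one-dimensional Kalman-style update for a constant unknown mean, started from an improper flat prior:

```python
import numpy as np

def kalman_mean(measurements, r=1.0):
    """Recursively estimate a constant scalar from data with noise variance r."""
    mean, var = 0.0, np.inf            # improper flat prior at time zero
    estimates = []
    for z in measurements:
        if np.isinf(var):              # first datum: posterior equals the likelihood
            mean, var = z, r
        else:
            k = var / (var + r)        # Kalman gain; here k = 1/n at step n
            mean = mean + k * (z - mean)
            var = (1.0 - k) * var
        estimates.append(mean)
    return estimates

data = [2.1, 1.9, 2.3, 2.0]
print(kalman_mean(data)[-1], np.mean(data))  # both give the sample mean 2.075
```

Each step has the Bayesian form (posterior from prior and likelihood), yet the final estimate is exactly the frequentist sample mean - the Gauss-Markov optimum for this trivial linear model.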

Thus it is only a matter of presentation and subjective preference whether to take a Bayesian or a frequentist view. Bayesian statistics is not intrinsically related to a subjective view of probability. It is a mathematical technique that is used in statistical practice in a shut-up-and-calculate way like quantum mechanics in physical practice.

In case you think I might not understand what I am talking about: As part of my work at the University of Vienna, I regularly give courses on statistical data analysis. I have written a big survey article about regularization (the abstract version of Bayesian inference in linear models) in SIAM Review 40 (1998), 636-666. I have worked on the Bayesian (REML) estimation of large animal breeding models; algorithms based upon my work are used all over the world to decide on animal breeding.
 
  • #43
A. Neumaier said:
In case you think I might not understand what I am talking about:

You obviously do. The initial probability, how is that arrived at in a frequentist view?

Take, for example, a coin. You start with it at 50-50, then flip the coin to update. In a frequentist view, why would you start at 50-50?

Thanks
Bill
 
  • #44
stevendaryl said:
It's a subjective judgment to call something objective.
As everything is subjective according to your usage of the word, it is meaningless to apply the adjective to anything, as it has no discriminative value. Your usage is far from how everyone else uses the word.

Is there anything that, according to you, fully deserves being called objective?
If not, why do you think the language contains such a term?
Why is science generally considered to collect objective knowledge?
stevendaryl said:
the meaning of probability and objectivity, and that's probably off-topic. So I will refrain from further replies on this topic.
The topic is ''what is an ensemble?'' and this is essentially synonymous with ''what is probability?'' It has a large physical (objective) aspect and a small philosophical (subjective) aspect. You are pulling the weight fully to the subjective side, but this is your subjective bias.
 
  • #45
stevendaryl said:
But you're making the assumption that equal volumes in phase space are equally likely. I guess you could say that that's the way you're defining "likelihood", but why phase space? For a single particle, you could characterize the particle's state (in one-dimension, for simplicity) by the pair [itex]p, x[/itex], where [itex]p[/itex] is the momentum. Or you could characterize it by the pair [itex]v, x[/itex], where [itex]v[/itex] is the velocity. If you include relativistic effects, [itex]v[/itex] is not linearly proportional to [itex]p[/itex], so equal volumes in [itex]p,x[/itex] space don't correspond to equal volumes in [itex]v,x[/itex]. So why should one be the definition of "equally likely" rather than the other?
Because physics is about phase and configuration space. Most of what you've been saying is off topic. You're moving the goalposts around wildly so I don't know what you are trying to say.

Have a look at this
https://en.wikipedia.org/wiki/Phase_space_formulation
and this
https://web.stanford.edu/~peastman/statmech/phasespace.html
and
http://arxiv.org/abs/1003.0772
and
http://www.springer.com/us/book/9780792337942
 
  • #46
bhobba said:
The initial probability, how is that arrived at in a frequentist view?

Take, for example, a coin. You start with it at 50-50, then flip the coin to update. In a frequentist view, why would you start at 50-50?
You wouldn't unless you have good reasons to assume that the coin is almost fair.

In both the frequentist and the Bayesian case one starts with a prior count ##H_0## of heads and ##T_0## of tails. Then you flip a number of times and find ##H## heads and ##T## tails. You update the frequencies and get ##H'=H_0+H## and ##T'=T_0+T##. Then you estimate the probability for head as ##P_H:=H'/(H'+T')##.

If one initially knows nothing at all - in fact, unknown to everybody, someone prepared the coin so that both sides show heads, and the experimenters see only the result, not the act of falling! - the Bayesian starts with the unwarranted assumption [using an allegedly ''uninformative prior'', but still a prejudice] that ##H_0=T_0>0## (with a value that depends on how strongly the prior is believed to be true), while the frequentist correctly puts ##H_0=T_0=0##. It takes the Bayesian estimate a long time to realize that the coin was forged, while the frequentist gets the answer correct from the start. This shows the bad influence of a prejudice. (A real person would soon become suspicious about the coin, but a true Bayesian - following objective shut-up-and-calculate techniques rather than being subjective - will be unable to do that.)

On the other hand, if the coin is known to be almost fair (because it looks like many other coins that have been tried before), both Bayesian and frequentist will assign ##H_0=T_0>0## - the frequentist by making a (somewhat subjective) estimate of how many equivalent coin flips the prior knowledge is worth, checking during the computation whether the assumed estimate has a large effect on the result. (In technical terms, this is a regularization parameter. There are a number of ways this parameter can be objectively chosen under appropriate assumptions.) I have no idea how a true Bayesian would assign the actual value of ##H_0=T_0>0##, since probability theory gives no hints. In practice, there is no difference between the two; it is shut-up-and-calculate according to recipes taken from the literature.

If there are enough data and the prior is not weighted too much, the result is indifferent to the value of the prior.
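For concreteness, here is the bookkeeping above as a tiny Python sketch (names are illustrative), comparing the frequentist ##H_0=T_0=0## with a prior-laden estimate on the forged two-headed coin:

```python
def estimate_heads(h0, t0, flips):
    """Prior-count estimate P_H = (H0 + H) / (H0 + H + T0 + T)."""
    h = h0 + flips.count("H")
    t = t0 + flips.count("T")
    return h / (h + t)

flips = ["H"] * 20                    # the forged coin shows heads every time
print(estimate_heads(0, 0, flips))    # frequentist H0 = T0 = 0: exactly 1.0 from the start
print(estimate_heads(10, 10, flips))  # prior counts H0 = T0 = 10: 0.75, biased toward fairness
```

Only with far more data than the prior counts does the second estimate approach the true value 1.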
 
  • #47
bhobba said:
Jaynes was a physicist.
But he was mistaken about his subjective interpretation of physics. His interpretation only works because he knew already (from half a century of prior objective physics) which subjective assumptions he has to make to get it objectively correct. Had he assumed, in place of the subjective knowledge of ##\langle H\rangle## (which Nature happens to make use of), the subjective knowledge of ##\langle H^2\rangle## (which Nature abhors), he would have obtained in place of the canonical ensemble a ridiculously wrong ensemble. And even with the canonical ensemble, if he subjectively knew a wrong value of ##\langle H\rangle## (which is very well possible, since in a subjective, stevendaryl-type physics no one specifies objectively what it means to have knowledge), then Jaynes would assign an equally wrong value for the temperature.

This proves that even in the context of the maximum entropy principle, only knowledge of the objectively correct information produces a reliable physical model and enables reliable physical predictions. Again, there is nothing subjective in the physics. Subjective deviations from the objective reality lead here (as always) to inaccurate or even grossly wrong predictions.
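For reference, the standard maximum entropy calculation behind this remark: maximizing the entropy ##S=-k_B\,\mathrm{Tr}\,\rho\ln\rho## subject to ##\mathrm{Tr}\,\rho=1## and a prescribed value of ##\langle H\rangle=\mathrm{Tr}\,\rho H## gives the canonical ensemble
$$\rho=Z^{-1}e^{-\beta H},\qquad Z=\mathrm{Tr}\,e^{-\beta H},$$
where the Lagrange multiplier ##\beta## is fixed by the constraint. Had one instead prescribed ##\langle H^2\rangle##, the same principle would give ##\rho\propto e^{-\lambda H^2}## - formally impeccable, but not what Nature realizes.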
 
  • #48
bhobba said:
The difference between probability and likelihood is exactly what? Please be precise. I think you will find it's very slippery, just as pinning down exactly what a point is turns out to be rather slippery. That's why the axiomatic method was developed - it wasn't just so pure mathematicians could while away their time.

Thanks
Bill
bhobba,

I mean as in the likelihood function defined here.

https://en.wikipedia.org/wiki/Maximum_likelihood
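(Concretely: for observed data ##x## and a parametric density ##p(x\mid\theta)##, the likelihood is ##L(\theta)=p(x\mid\theta)##, regarded as a function of the parameter ##\theta## with the data held fixed; maximum likelihood picks ##\hat\theta=\arg\max_\theta L(\theta)##.)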
 
  • #49
Mentz114 said:
I mean as in the likelihood function defined here.

https://en.wikipedia.org/wiki/Maximum_likelihood
Then it is the probability density with respect to a prior measure (or its logarithm, in the log-likelihood form). This is surely less fundamental than the notion of probability, which is independent of a prior measure.
 
  • #50
A. Neumaier said:
As everything is subjective according to your usage of the word, it is meaningless to apply the adjective to anything, as it has no discriminative value. Your usage is far from how everyone else uses the word.

Is there anything that, according to you, fully deserves being called objective?

No, I don't think so. Calling something objective is a short-cut in reasoning. To take into account all the ways that our judgments are influenced by unproved assumptions is intractable and inconvenient, so it's useful to have cut-offs, where you treat sufficiently unlikely possibilities as if they were impossibilities. The kind of reasoning that people typically do is a rule of thumb. It's subjective, but it's not consciously subjective.
 
  • #51
A. Neumaier said:
Then it is the probability density with respect to a prior measure (or its logarithm, in the log-likelihood form). This is surely less fundamental than the notion of probability, which is independent of a prior measure.
I cannot (literally) argue against that. I was struck by a similarity to the path integral but that's probably spurious.

Ordinary folk, interestingly, have no idea of probability. One person I knew, after hearing there was a 40% chance of rain, asked '40% of what?'
What people experience is 'confidence', and they can express it as likelihood ratios or 'odds'.
 
  • #52
Mentz114 said:
Because physics is about phase and configuration space. Most of what you've been saying is off topic. You're moving the goalposts around wildly so I don't know what you are trying to say.

I'm sorry you feel that way. I'm just saying that volume in phase space is not the definition of likelihood. In certain circumstances, it's reasonable to assume that equal volumes in phase space imply equal likelihood, but that's an assumption--it's not the definition of likelihood.


I know what phase space is.
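To make the change-of-variables point concrete: relativistically ##p=mv/\sqrt{1-v^2/c^2}##, so ##dp=m\gamma^3\,dv## with ##\gamma=(1-v^2/c^2)^{-1/2}##. A distribution uniform in ##(p,x)## is therefore non-uniform in ##(v,x)##, which is exactly why singling out one of them as 'equally likely' is a substantive assumption.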
 
  • #53
The reason for using the phase space probability density is ergodicity. Ergodicity is supposed to single out the microcanonical ensemble and the other ensembles can be derived from it. Unfortunately, it's too hard to prove ergodicity for even the simplest physical systems. Nevertheless, it's a reasonable assumption in most situations. So at least for ergodic systems, the microcanonical ensemble is a hard, objective prediction of the theory.
 
  • #54
rubi said:
The reason for using the phase space probability density is ergodicity. Ergodicity is supposed to single out the microcanonical ensemble and the other ensembles can be derived from it. Unfortunately, it's too hard to prove ergodicity for even the simplest physical systems. Nevertheless, it's a reasonable assumption in most situations. So at least for ergodic systems, the microcanonical ensemble is a hard, objective prediction of the theory.

Related to the ergodicity assumption is the assumption that the ensemble average of a quantity is equal to its time average.
 
  • #55
stevendaryl said:
Related to the ergodicity assumption is the assumption that the ensemble average of a quantity is equal to its time average.
Right, this is the more physical way of stating the ergodic hypothesis. In modern mathematical language, one usually defines ergodicity as a requirement on the probability measure. The equality of time averages and ensemble averages then follows from the so-called ergodic theorems, for instance the Birkhoff ergodic theorem.
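For reference, Birkhoff's theorem in the standard measure-theoretic formulation: if ##T## is a measure-preserving map on a probability space ##(X,\Sigma,\mu)## and ##f\in L^1(\mu)##, the time averages converge almost everywhere, and if ##\mu## is ergodic the limit is the ensemble average:
$$\lim_{n\to\infty}\frac{1}{n}\sum_{k=0}^{n-1}f(T^k x)=\int_X f\,d\mu\qquad\text{for }\mu\text{-almost every }x\in X.$$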
 
  • #56
rubi said:
Ergodicity is [...] a reasonable assumption in most situations.
... though it is in fact known to be wrong in many physically relevant cases. It thus only has heuristic value.
 
  • #57
A. Neumaier said:
... though it is in fact known to be wrong in many physically relevant cases. It thus only has heuristic value.
Well, I agree that this issue hasn't been addressed in a fully satisfactory way yet. But at least ergodic theory gives some confidence in the validity of the microcanonical ensemble.

(Here's a side question that interests me: Do you know whether such systems that are known not to be ergodic are usually well described by the microcanonical ensemble in experiments nevertheless?)
 
  • #58
rubi said:
Do you know whether such systems that are known not to be ergodic are usually well described by the microcanonical ensemble in experiments nevertheless?
Probably yes (if they are large and simple enough), since in the thermodynamic limit the microcanonical ensemble is equivalent to the grand canonical ensemble. Working with the latter is much simpler, closer to the formulas used in the applications, needs much weaker assumptions, and works identically in the classical and in the quantum case.
 
  • #59
Demystifier said:
Then let me use an example. Suppose that you flip a coin, but only ONCE. How would you justify that the probability of getting heads is ##p=1/2##? Would you use an ensemble for that?
Edited 'ones' to 'ONCE'.

Interesting question! In the context of your opening reply to the OP, see the interesting answer here: http://arnold-neumaier.at/physfaq/topics/singleEvents
 
  • #61
A. Neumaier said:
Bayesian statistics is not intrinsically related to a subjective view of probability.

Because "Bayesian statistics" as used by many is not Bayesian. It just means one uses Bayes's rule, which is common to both Bayesian and Frequentist views.

But couldn't one say that the subjective view is more general, since from the subjective view, the frequentist view can be derived with an additional assumption (via https://en.wikipedia.org/wiki/Exchangeable_random_variables), but the subjective view cannot (I think) be derived from the frequentist view?
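(For binary outcomes the precise statement is de Finetti's representation theorem: an infinite exchangeable sequence ##X_1,X_2,\dots## of 0-1 variables has joint law
$$P(X_1=x_1,\dots,X_n=x_n)=\int_0^1 q^{\,k}(1-q)^{\,n-k}\,d\mu(q),\qquad k=\sum_i x_i,$$
for some measure ##\mu## on ##[0,1]## - i.e., the sequence behaves as if there were an unknown 'frequentist' parameter ##q## with prior ##\mu##.)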
 
  • #62
atyy said:
couldn't one say that the subjective view is more general, since from the subjective view, the frequentist view can be derived
Only if, like stevendaryl, one calls everything subjective, including the choice of an operational criterion to give a concept an objective meaning. I find such a usage of the terms unacceptable.
 
  • #63
A. Neumaier said:
Only if, like stevendaryl, one calls everything subjective, including the choice of an operational criterion to give a concept an objective meaning. I find such a usage of the terms unacceptable.

Hmmm, I'm not sure I would go that far (in fact, I'm personally a frequentist). But would you call de Finetti's view subjective or objective?
 
  • #64
atyy said:
Hmmm, I'm not sure I would go that far (in fact, I'm personally a frequentist). But would you call de Finetti's view subjective or objective?
It has been a long time since I looked at de Finetti. I read his work during the stage when I formed my own view, but since then I have lost interest in keeping in mind all possible views. Could you please summarize the essence of his view, insofar as it differs from the frequentist view?

The point is that once objective is taken to mean something definite relevant for science (in the philosophical sense, irrespective of the fact that one can question everything), then probability, a key scientific concept, needs an operational definition, and this defines an objective meaning, hence objective probability. Objective not in the sense that it can always be specified to arbitrarily many digits, but in the sense that one can communicate its meaning without ambiguity to others, within the uncertainty that is inherent in any concept. (Unlike stevendaryl, I strictly distinguish between uncertainty, probability, and subjectivity. Uncertainty is often non-probabilistic and yet objective.)
 
  • #65
A. Neumaier said:
It has been a long time since I looked at de Finetti. I read his work during the stage when I formed my own view, but since then I have lost interest in keeping in mind all possible views. Could you please summarize the essence of his view, insofar as it differs from the frequentist view?

The point is that once objective is taken to mean something definite relevant for science (in the philosophical sense, irrespective of the fact that one can question everything), then probability, a key scientific concept, needs an operational definition, and this defines an objective meaning, hence objective probability. Objective not in the sense that it can always be specified to arbitrarily many digits, but in the sense that one can communicate its meaning without ambiguity to others, within the uncertainty that is inherent in any concept. (Unlike stevendaryl, I strictly distinguish between uncertainty, probability, and subjectivity. Uncertainty is often non-probabilistic and yet objective.)

https://faculty.fuqua.duke.edu/~rnau/definettiwasright.pdf

"In the conception we follow and sustain here, only subjective probabilities exist – i.e., the degree of belief in the occurrence of an event attributed by a given person at a given instant and with a given set of information." [de Finetti]

"All three authors proposed essentially the same behavioristic definition of probability, namely that it is a rate at which an individual is willing to bet on the occurrence of an event. Betting rates are the primitive measurements that reveal your probabilities or someone else’s probabilities, which are the only probabilities that really exist." [Nau's commentary on de Finetti]
 
  • #66
atyy said:
it is a rate at which an individual is willing to bet on the occurrence of an event.
This makes it truly subjective, and very restrictive, too. Most people never bet, hence couldn't use de Finetti probabilities.

In any case such a definition is meaningless for the scientific concept of probability. The decay probabilities of nuclear species are constants of nature and had objective values long before people with the ability to bet existed.
 
  • #67
A. Neumaier said:
This makes it truly subjective, and very restrictive, too. Most people never bet, hence couldn't use de Finetti probabilities.

In any case such a definition is meaningless for the scientific concept of probability. The decay probabilities of nuclear species are constants of nature and had objective values long before people with the ability to bet existed.

Hmmmm, very different from my reasons for being a Frequentist. I think it is impractical to be coherent :)
 
  • #68
atyy said:
I think it is impractical to be coherent :)
For many years I have been spending most of my spare time making my view of physics coherent. It may be impractical initially and may seem like a waste of time and effort, but in the end it is very rewarding.
 
  • #69
I personally do not think that frequentism is completely coherent. I would actually call it incoherent. But I don't think that Bayesianism is completely coherent, either. It seems to me that in a lot of applications of Bayesianism there is a role for objective (though unknown) probabilities. So not everything seems to be subjective.

For the simplest example, with coin flips, you assume that the coin is governed by some unknown parameter [itex]q[/itex] that is between 0 and 1, with all values equally likely. Then your subjective probability of "heads" is given by:

[itex]P(heads) = \int dq P(q) P(heads|q) = \int dq 1 \cdot q = \frac{q^2}{2}|_0^1 = \frac{1}{2}[/itex]

If you flip the coin once, and get "heads", then you update the probability distribution on [itex]q[/itex] using Bayes' rule, so instead of the flat distribution [itex]P(q) = 1[/itex], you have a weighted distribution: [itex]P'(q) = 2q[/itex], giving [itex]P'(heads) = 2/3[/itex]. It works out nicely: the probability of heads starts out 1/2, and gradually increases or decreases depending on the history of past coin flips. But it seems to me that the parameter [itex]q[/itex] in this analysis is an unknown objective probability. So this analysis isn't actually treating probability as completely subjective. Similarly, if you apply Bayesianism to quantum mechanics, it seems that you have to treat some probabilities, such as the probability of getting spin-up in the x direction given that the particle was prepared to have spin-up in the z-direction, as objective. So I don't see how Bayesianism really eliminates objective probability, and if it doesn't, then it doesn't give an interpretation of probability, in general.

In the example above, the probabilities that are subjective are in some sense "meta" probabilities--a subjective probability distribution on objective probabilities.
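As a numerical cross-check of the updating above, here is a minimal Python sketch (illustrative names only) that puts the distribution on [itex]q[/itex] on a grid and applies Bayes' rule:

```python
import numpy as np

q = np.linspace(0.0, 1.0, 100001)     # grid over the unknown objective bias q
dq = q[1] - q[0]
density = np.ones_like(q)             # flat prior P(q) = 1 on [0, 1]

def p_heads(density):
    """Predictive probability of heads: the integral of q * P(q) dq."""
    return np.sum(q * density) * dq

def bayes_update(density, outcome):
    """Multiply by the likelihood (q for heads, 1 - q for tails), renormalize."""
    posterior = density * (q if outcome == "H" else 1.0 - q)
    return posterior / (np.sum(posterior) * dq)

print(p_heads(density))               # ~0.5, matching the integral above
density = bayes_update(density, "H")  # one observed head: P'(q) = 2q
print(p_heads(density))               # ~2/3, matching P'(heads) = 2/3
```

The subjective updating all happens at the "meta" level of the distribution over [itex]q[/itex]; the parameter [itex]q[/itex] itself plays the role of the unknown objective probability.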
 
  • #70
A. Neumaier said:
This makes it truly subjective, and very restrictive, too. Most people never bet, hence couldn't use de-Finetti-probabilites.

I disagree. Any time you make a choice to do X or Y based on probability, you're betting in a sense. There is a cost for making the wrong choice. I suppose it's an oversimplification to assume that "costs" can be linearly compared (which is what measuring costs in terms of money assumes).

To say that, because there is no one around to place a bet, then a gambling-based definition of probability is meaningless is being a little bit literalist. There are lots of cases where the closest thing to a "definition" of a physical quantity is operational: the quantity describes what would happen if you were to perform a particular operation. But the quantity exists even if there is nobody around to perform that operation. I suppose in all such cases, you can just let the property be undefined, or primitive, and turn the "definition" into an axiom, rather than a definition, but that's just aesthetics.
 
