Probability via Expectation and Callen's criterion

  • #1
gentzen
TL;DR Summary
The thermal interpretation includes a non-ensemble interpretation of q-expectations which also applies to single systems, not just to ensembles. Callen's criterion is used to provide operational meaning to the state of a system. Trying instead to use Callen's criterion to provide an "operational falsification" interpretation of expectations (and hence probability) might be instructive (and easier than trying to understand the whole thermal interpretation and the way it uses quantum field theory).
In the thermal interpretation, the collection of all q-expectations (and q-correlations) is the state of a system. The interpretation of q-expectations is used only to provide an ontology; the apparent randomness is analysed and explained separately. This may be non-intuitive. Callen's criterion is invoked to provide operational meaning. As an answer to how specific q-expectations can be determined operationally, it occasionally felt unsatisfactory to me.

However, if we limit our goal to "falsification" instead of "measurement"/"determination"/"estimation", then Callen's criterion (section "2.5 Formal definition of the thermal interpretation" [TI], section "6.6 Quantities, states, uncertainties" [CQP]) becomes quite concrete:
Callen’s criterion: Operationally, a system is in a given state if its properties are consistently described by the theory for this state.
So if the available observations of a system seem inconsistent with the system being in a given state, then operationally the system is not in that state. This risks being a bit tautological, especially if we don't properly distinguish between the real system that provided the available observations, our idealized mathematical model of that system, and the given state. The state belongs to the mathematical model, because otherwise there would be no theory for the state. Even though an inconsistency between observations and the theory for the state might also be the fault of the model, I want to limit the "falsification" here explicitly to the state. Operationally, this could mean excluding observations which seem to be caused by effects not included in the model, like cosmic rays, radioactive decay, or human errors of the operator making the observations.

Expectations allow us to have partial information about the state. In the thermal interpretation, ⟨f(A)⟩x would be a q-expectation depending on a space-time point x, and ⟨f(A,B)⟩(x,y) would be a q-correlation depending on two space-time points x and y. It makes sense to assume that the space-time points x and y for which we want to assert values for q-expectations and q-correlations are close to our current position in space and time. (So thinking of the state as the initial state is dangerous, but ignoring the initial state and only thinking about the current state is dangerous too.) And it also makes sense that we only want to assert values for q-expectations up to a certain reasonable precision, especially in cases where additional precision would make no difference in terms of how available (and future) observations might falsify our assertions about the state. More important for falsification is that often the theory also gives a standard deviation (to be interpreted as an uncertainty, see below).
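For concreteness, here is my compressed reminder of the definitions behind this notation, as I understand them from [TI] (nothing beyond the standard density-operator form): for a system in the state described by the density operator ρ,
$$\langle A\rangle := \mathrm{Tr}\,(\rho A), \qquad \sigma_A := \sqrt{\langle A^2\rangle - \langle A\rangle^2}\quad\text{(for Hermitian }A\text{)},$$
and a q-correlation like ⟨f(A,B)⟩(x,y) is the q-expectation of the corresponding function of the operators attached to the space-time points x and y.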

But what is a q-correlation like ⟨f(A,B)⟩(x,y) operationally, and how can it be observed? This is one of those cases where invoking Callen's criterion to provide operational meaning felt unsatisfactory to me. Most q-expectations (and especially q-correlations) are not directly observable. It is rather the opposite: we can make a number of different observations of the real system, and it is part of our idealized mathematical model of that system that those observations correspond to certain q-expectations of our mathematical model in its given state (or more generally to functions of q-expectations and q-correlations). And those observations themselves should not be interpreted probabilistically: for example, when we observe which photosensitive pixels changed their color, then this itself is the observation. And if the change of color is only gradual, then again this is an observation all by itself; there is no probabilistic averaging over atomic constituents of the pixel involved. And the probabilities (that we try to interpret) and the expectations too are just properties of the relation between our simplified mathematical model and the real system, not properties of the real system itself. (Edit: Oh no, talking of "the" real system might be a mistake, since probabilistic models are also used to describe a class of similarly prepared systems.)

Despite all that has been said, and all the limitations that have been pointed out, even the modest goal of "operational falsification" is still far from trivial: the observed deviations of a system from expectations (derived from the theory of a given state) can be more or less significant. If a frequentist (virtual) ensemble interpretation of probability is used to quantify and interpret this significance, then it still remains unclear whether a true operational interpretation of probability has been achieved. The non-ensemble interpretation of q-expectations overcomes this difficulty (section "2.3 Uncertainty" [TI], section "3.1 Uncertainty" [CQP]):
to eliminate any trace of a priori statistics from the terminology, we frequently use the terminology uncertain value instead of q-expectation value, and uncertainty instead of q-standard deviation

[TI] A. Neumaier, Foundations of quantum physics, II. The thermal interpretation (2018), https://arxiv.org/abs/1902.10779
[CQP] A. Neumaier, Coherent Quantum Physics (2019), De Gruyter, Texts and Monographs in Theoretical Physics



The intention of the above was to describe an "operational falsification" interpretation of probability in the hope that it would be easier to understand than the whole thermal interpretation. But even an interpretation of probability is hard to describe and involves many subtle issues, so in the end I often had to reference the thermal interpretation, because I wanted the description to stay reasonably short and avoid "my own original ideas". But the description is different from the thermal interpretation, because it prefers to explicitly say things and give examples, even if they might be slightly wrong, or against the intentions of the thermal interpretation.
gentzen said:
vanhees71 said:
I always thought that's also the explanation of your "thermal interpretation" until you told me that your expectation values must not be interpreted in the usual statistical sense but as something abstractly defined in the mathematical formalism without relation to an operational realization by a measurement device.
Based on the current discussion, it occurred to me that the non-ensemble interpretation of q-expectations of the thermal interpretation could be combined with Callen's criterion to arrive at an "operational falsification" interpretation of expectations (probability). That interpretation would be closely related to the frequentist interpretation, but would fix its problem related to the assumption/requirement of "virtual" ensembles that allow identical experiments to be repeated arbitrarily often (which makes the frequentist interpretation non-operational and inapplicable to many practically relevant scenarios).

In order not to hijack this thread, I will open a separate thread with more explanations when I find the time.
Even though I know that my explanations are less polished than A. Neumaier's papers and book, I hope that they help vanhees71 to see how the thermal interpretation might be different from the minimal statistical interpretation, and what it could mean that it also applies to single observations of macroscopic systems and not just to ensembles of similarly prepared systems. And I hope A. Neumaier can live with the intentional explicit differences from the "thermal interpretation". After all, this is intended as an interpretation of probability, not as a faithful description of the thermal interpretation.
 
  • #2
I don't understand why the (virtual) ensemble interpretation is supposed not to be operational. On the contrary, it precisely describes what is done in the lab: one measures the observable many times on equally prepared systems and performs an "error analysis" (estimating both statistical (easy) and systematic (an art of its own) errors). That's what you learn in the introductory lab from day 1 of your physics studies (at least at universities).
 
  • #3
vanhees71 said:
I don't understand why the (virtual) ensemble interpretation is supposed not to be operational. On the contrary, it precisely describes what is done in the lab:
The ensemble interpretation can be operational, namely in the cases where it is a good idealization of what is actually done. So if you really do measure many times on equally prepared systems, then it is a good idealization. Even if you didn't actually do the measurements, as long as it would at least in principle be possible for somebody to do them, it could still be a valid idealization.

But there are many cases where this idealization is invalid, and for those cases the virtual ensemble interpretation is not operational. A typical example is weather prediction.

But the thermal interpretation also worries about cases where the ensemble interpretation could be used, but is not really necessary. If one or two observations already suffice to establish the value of some observable with sufficient accuracy, then there is no need to invoke ensembles. However, there is a catch: The model might also predict a standard deviation for the observable (an uncertainty), and even in cases where there is no need to invoke ensembles for the observable itself, establishing an estimate for the standard deviation might need ensembles, i.e. many repeated measurements on "independent" identically prepared systems.
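To make this concrete, here is a minimal numerical sketch (my own toy numbers, not taken from the thermal interpretation): for a low-uncertainty macroscopic observable, one or two readings already fix the value to the stated precision, but a comparably accurate estimate of the standard deviation itself needs many repetitions, since for roughly Gaussian noise its relative error only shrinks like 1/sqrt(2(n-1)).

```python
import numpy as np

rng = np.random.default_rng(0)
true_value, true_sigma = 1.000, 1e-6  # toy low-uncertainty macroscopic observable

for n in (2, 10, 1000):
    samples = rng.normal(true_value, true_sigma, size=n)
    mean_hat = samples.mean()         # already accurate to ~ sigma/sqrt(n) for tiny n
    sigma_hat = samples.std(ddof=1)   # relative error ~ 1/sqrt(2(n-1)): needs an ensemble
    print(f"n={n:5d}  mean={mean_hat:.7f}  estimated sigma={sigma_hat:.2e}")
```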
 
  • #4
Of course, a single measurement is nearly as good as no measurement at all.

The weather forecast is of course a good example, but also here, what it means to say "This weather model predicts rain in Frankfurt with 30% probability" is that, if we observe the weather on the next day many times, given the weather conditions today being the same (within the uncertainties in defining the precise weather conditions today), then in 30 of 100 cases we observe rain on the next day. There's no other way to check the accuracy of the weather model than that. From a single observation you cannot test the weather model at all, and if it rains tomorrow I cannot blame the weather forecaster for spoiling my barbecue party, because I thought a 30% risk of rain was small enough to schedule the party ;-).
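A minimal sketch of this frequentist check, with made-up numbers (the real record would of course be observed, not simulated): collect the days on which the model said "30% rain" and compare the observed rain frequency with 0.3, including the binomial scatter expected for that many cases.

```python
import numpy as np

rng = np.random.default_rng(1)

forecast_p = 0.30   # the model's stated rain probability
n_days = 100        # days on which exactly this forecast was issued (made-up number)

# stand-in for the real record: simulate outcomes as if the forecast were calibrated
rained = rng.random(n_days) < forecast_p

observed_freq = rained.mean()
binomial_scatter = np.sqrt(forecast_p * (1 - forecast_p) / n_days)  # ~0.046 for 100 days

print(f"observed frequency: {observed_freq:.2f}  expected: {forecast_p:.2f} +/- {binomial_scatter:.2f}")
```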
 
  • #5
vanhees71 said:
Of course, a single measurement is nearly as good as no measurement at all.

The weather forecast is of course a good example, but also here, what it means to say "This weather model predicts rain in Frankfurt with 30% probability" is that, if we observe the weather on the next day many times, given the weather conditions today being the same (within the uncertainties in defining the precise weather conditions today), then in 30 of 100 cases we observe rain on the next day. There's no other way to check the accuracy of the weather model than that. From a single observation you cannot test the weather model at all, and if it rains tomorrow I cannot blame the weather forecaster for spoiling my barbecue party, because I thought a 30% risk of rain was small enough to schedule the party ;-).

Well, choosing the "weather forecast" as an example is not great, as it doesn't assume repeated "measurement", but instead interpreting some numerical values returned by certain computer software as a probability: https://Earth'science.stackexchange...tion-and-predicted-amount-of-rain/22553#22553
 
  • #6
But these weather models are tested by using the ensemble interpretation of probabilities, as detailed in my previous posting.
 
  • #7
[...] if we observe the weather on the next day many times, given the weather conditions today being the same (within the uncertainties in defining the precise weather conditions today), then in 30 of 100 cases we observe rain on the next day [...]

Can't be right. Weather is observed only once in each place (on a map). Say I have 100 places within an area of 20,000 km^2 in Central Germany. If I measure rain >= 0.1 mm in 30 of them, then saying "well, the 30% rain probability for Frankfurt shown by my Accuweather smartphone weather application yesterday was correct" is not right.
 
  • #8
AFAIK that's not what is done, but you take weather data from several decades and check your weather models against those data.
 
  • #9
vanhees71 said:
Of course, a single measurement is nearly as good as no measurement at all.
But only because your measurement equipment (and experimental setup and preparation) also has its own uncertainties, not because the measured physical quantity would always include a significant fundamental uncertainty.

vanhees71 said:
The weather forecast is of course a good example, but also here, what it means to say: "This weather model predicts
Strictly speaking, the "weather model" is not a part of the (virtual) ensemble interpretation. On the other hand, the measured input data (including their uncertainty and limited spatial resolution) of the "weather model" would be a valid part of it.

However, I simply used weather prediction as an example, because it was the example Christian Kredler once used in his course on stochastics to sensitize us to the fact that it is not always clear what is meant by a probability. The most recent article I read about that subject was about earthquake prediction. In section "2.5. Earthquake Forecasts and Weather Forecasts," they argue that the interpretational situation is even worse for earthquakes, because they are so rare. So if you prefer, use earthquake forecasts instead as an example where the (virtual) ensemble interpretation no longer provides an adequate operational interpretation of probability.

(I would have to reread the article to learn again how they interpreted it. I am not saying that it has anything to do with the thermal interpretation, or with a subjectivist Bayesian interpretation. But I say that the specific interpretation for earthquake forecasts given in that article is well argued, and that it should be preferred over superficial attempts to force this situation into an ensemble interpretation.)
 
  • #10
gentzen said:
The intention of the above was to describe an "operational falsification" interpretation of probability in the hope that it would be easier to understand than the whole thermal interpretation. But even an interpretation of probability is hard to describe and involves many subtle issues, so in the end I often had to reference the thermal interpretation, because I wanted the description to stay reasonably short and avoid "my own original ideas".
I will try below to give a self-contained description of a "falsification based interpretation" of typical uses of probability. My intention is to follow the advice to "use your own words and speak in your own authority". Because "I prefer being concrete and understandable over being unobjectionable," this will include objectionable details that could be labeled as "my own original ideas".

In the section "A3.4 How meaningful are probabilities of single events?" of A. Neumaier's theoretical physics FAQ, it is questioned whether probability assignments to single events are meaningful:
Probability assignments to single events can be neither verified nor falsified. Indeed, suppose we intend to throw a coin exactly once.
Person A claims 'the probability of the coin coming out head is 50%'.
Person B claims 'the probability of the coin coming out head is 20%'.
Person C claims 'the probability of the coin coming out head is 80%'.
Now we throw the coin and find 'head'. Who was right? It is undecidable.

Thus there cannot be objective content in the statement 'the probability of the coin coming out head is p', when applied to a single case. Subjectively, of course, every person may feel (and is entitled to feel) right about their probability assignment. But for use in science, such a subjective view (where everyone is right, no matter which statement was made) is completely useless.
...
Thus probabilities are meaningful ... only as a property of the ensemble under consideration.
But the important question for a "falsification based interpretation" is "who was wrong?", not "who was right?". Now 20% is not wrong, and even 10% is not yet wrong. At 2.3% it starts to get a bit wrong, 0.1% would be even more wrong, but at some point degrees of wrongness are not definite enough anymore. A reasonable bound is at odds smaller than 1 : 3.5 million. The way to interpret this is that any odds smaller than that bound are just as wrong (including the claim that 'the coin coming out head is impossible'), but 1 : 3 million would be slightly less wrong. Those bounds (1/42 and 1/3500000) should not be interpreted as arbitrary or subjective, but as based on existing practice and past experience.
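To make these numbers a bit more tangible, here is a small sketch of where they sit on the usual Gaussian scale (my own illustration, using the one-sided tail of a standard normal distribution): 2.3% is roughly the two-sigma tail (and 97.7/2.3 is about 42), 0.1% roughly three sigma, and 1 : 3.5 million roughly five sigma.

```python
from scipy.stats import norm

for p in (0.20, 0.10, 0.023, 0.001, 1 / 3.5e6):
    n_sigma = norm.isf(p)  # Gaussian sigmas whose one-sided tail probability equals p
    print(f"p = {p:.2e}   odds 1 : {1/p:>12,.0f}   ~ {n_sigma:.2f} sigma (one-sided)")
```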

Even though it is nice to be able to say at least some words with respect to a single event, this by itself is not yet enough for an interpretation. Those words do not yet explain the claim that the thermal interpretation "also applies to single observations of macroscopic systems and not just to ensembles of similarly prepared systems". Such a "single observation" is typically a number that can be compared to an expectation value and its variance, but it doesn't directly correspond to a single event. A single system is yet another concept, as highlighted by the claim in section "A3.5 Statistics of single systems" that: "While probabilities of single events are meaningless, it is meaningful to do statistics on single systems, if the statistics of interest is that of the system's behavior in time."

We want to talk about probabilistic models, their states, about expectation values and correlations, about single systems, and about classes of similarly prepared systems. The starting point is
Callen’s criterion: Operationally, a system is in a given state if its properties are consistently described by the theory for this state.
The idea is to reinterpret frequentism as a way to reject the state of a model. We distinguish between the real system that provided the available observations, our idealized mathematical model of that system, and the given state. The state belongs to the mathematical model, because otherwise there would be no theory for the state. The intention is to try to limit "falsification" to the state. This could mean excluding observations which seem to be caused by effects not included in the model, like cosmic rays, radioactive decay, or human errors of the operator making the observations. "The" real system could be a class of similarly prepared systems; it could be a system exhibiting stochastic features; or it could also be just some object with complicated features that we want to omit in our idealized model.

The absence of stochastic (and dynamic) features for this last case makes it well suited for clarifying the role of expectation values for the interpretation, and for highlighting the differences to ensemble interpretations.
As an example, we might want to describe the position and extent of Earth by the position and radius of a solid sphere. Because Earth is not a perfect sphere, there is no exactly correct radius. Therefore we use a simple probability distribution for the radius in the model, say a uniform distribution between a minimal and a maximal radius. We can compare our probabilistic model to the real system by a great variety of different observations; however, only very few of them are "intended". (This is independent of the observations that are actually possible on the real system with reasonable effort.) At this point it really gets arbitrary or subjective in a way that cannot be fixed by appeals to existing practice and past experience.

Let us look at the indicator function of an object and at its Fourier transform. The indicator function is one if a point is inside the object, and zero otherwise. Finding a minimal and maximal radius and a position for the solid sphere such that the model won't be falsified by comparing the indicator function at arbitrary points should be possible, without even requiring an especially accurate position. However, comparing the Fourier transform of the indicator function for arbitrary spatial frequency vectors should falsify the model for all its possible states, because the smooth surface of the perfect sphere will lead to too small Fourier amplitudes for high spatial frequencies, compared to the rough surface of Earth.

Correlations between the indicator function at two arbitrary points will also falsify the model. Take for example two points at the same distance from the center of the solid sphere such that one is inside and the other is outside of Earth. This falsification could be avoided by also using a probability distribution for the position, however correlations between three points will always be able to falsify the model (for three points on a straight line where the outer points are inside and the inner point is outside of Earth, which exist because Earth is not convex), and it would have been sort of cheating anyway. On the other hand, if comparing only the low spatial frequencies (low compared to the inverse radius) of the Fourier transform would falsify the model, then also using a probability distribution for the position would be fine; it would even be the "intended" fix in a certain way.
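Here is a minimal numerical sketch of the two-point falsification (my own toy numbers: the "real Earth" is approximated as an oblate spheroid with equatorial radius 6378 km and polar radius 6357 km, and the model sphere is assumed to be centered at Earth's center): a point in the equatorial plane and a point on the polar axis at the same distance from the center are inside and outside the real Earth respectively, while any sphere assigns both points the same indicator value.

```python
import numpy as np

R_EQ, R_POLE = 6378.0, 6357.0  # km, rough equatorial and polar radii of Earth

def inside_earth(p):
    """Indicator function of the toy 'real Earth': an oblate spheroid."""
    x, y, z = p
    return (x / R_EQ) ** 2 + (y / R_EQ) ** 2 + (z / R_POLE) ** 2 <= 1.0

def inside_sphere(p, R):
    """Indicator function of the model: a solid sphere of radius R at the origin."""
    return float(np.dot(p, p)) <= R ** 2

r = 6370.0                           # same distance from the center for both points
p_equator = np.array([r, 0.0, 0.0])  # inside the real Earth
p_pole = np.array([0.0, 0.0, r])     # outside the real Earth

print("Earth indicator:", inside_earth(p_equator), inside_earth(p_pole))
for R in (R_POLE, r, R_EQ):
    # every admissible radius gives both points the same value, so the observed
    # pair (True, False) is inconsistent with every state of this sphere model
    print(f"sphere R={R:.0f} km:", inside_sphere(p_equator, R), inside_sphere(p_pole, R))
```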

The above details of the role of the model and the related role of observation can give the impression that this interpretation is not a valid interpretation of probability. But probability is not just the abstract mathematical theory; it is also related to how uncertainty is communicated and handled in actual practice. One reason why I wanted to write down something like the text above was that I had become uncertain about the exact role of the model, and I wanted to clarify it a bit before I had forgotten everything. My conclusion from the above is that observations also play an important role. That role is somewhat linked to the model, but not completely determined by it.
 
  • #11
gentzen said:
Now 20% is not wrong, and even 10% is not yet wrong. At 2.3% it starts to get a bit wrong, 0.1% would be even more wrong, but at some point degrees of wrongness are not definite enough anymore. A reasonable bound is at odds smaller than 1 : 3.5 million. The way to interpret this is that any odds smaller than that bound are just as wrong (including the claim that 'the coin coming out head is impossible'), but 1 : 3 million would be slightly less wrong. Those bounds (1/42 and 1/3500000) should not be interpreted as arbitrary or subjective, but as based on existing practice and past experience.
Where are you getting all this from? Why is 1 : 3.5 million a "reasonable bound"? What "existing practice and past experience" is this based on?
 
  • #12
PeterDonis said:
Where are you getting all this from? Why is 1 : 3.5 million a "reasonable bound"? What "existing practice and past experience" is this based on?
The 1 : 3.5 million are the odds corresponding to five-sigma according to many independent sources, for example here:
In short, five-sigma corresponds to a p-value, or probability, of 3×10⁻⁷, or about 1 in 3.5 million.
I also tried to find out the odds corresponding to Six Sigma, but surprisingly those were less extreme, namely only 3.4 : 1 million (because an empirically based 1.5 sigma shift is introduced into the calculation). So the 1 : 3.5 million odds turned out to be the most extreme measure used in practice.
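For reference, both numbers can be checked with the one-sided Gaussian tail (the Six Sigma figure assumes the conventional 1.5 sigma long-term shift mentioned above, i.e. effectively a 4.5 sigma tail):

```python
from scipy.stats import norm

p_5sigma = norm.sf(5.0)            # one-sided tail beyond 5 sigma
print(f"5 sigma: p = {p_5sigma:.2e}  ~ 1 : {1 / p_5sigma:,.0f}")

# industrial 'Six Sigma' quotes the tail beyond 6 sigma after a 1.5 sigma shift
p_six_sigma = norm.sf(6.0 - 1.5)
print(f"Six Sigma (with 1.5 sigma shift): {p_six_sigma * 1e6:.1f} per million")
```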
The "past experience" refers to "war stories" particle physicists told in connection to the five-sigma bound, like the following:
A similar thing happened in 2011, when the CDF collaboration at Fermilab saw an excess of events at about 4 sigma. These were not fluctuations, but they required better understanding of the background.

The 1/42 bound and the 2.3% are related by 97.7 / 2.3 = 42.478 and come from my faulty memory of something Christian Kredler mentioned in his course on stochastics, namely the question of the minimal sample size that still allows reasonable statistics. He said something like: "We would probably still have 3-4 categories, so 20 would be too small. Something between 30 and 40 could make sense, perhaps a bit more. Don't dismiss this; you might believe now that you will always work with large sample sizes, say >200, but when you have to do the interviews, it will consume time and money. The temptation to do only very few will be high."
 
  • #13
gentzen said:
The 1 : 3.5 million are the odds corresponding to five-sigma
Ok.

gentzen said:
I also tried to find out the odds corresponding to Six Sigma, but surprisingly those were less extreme
That's because "six sigma" is a misnomer, at least if you're looking at one-tailed probabilities (i.e., deviations to just one side of the distribution). In the terminology your using, it would be only three sigma--three to each side of the distribution, or six total.

gentzen said:
The "past experience" refers to "war stories" particle physicists told in connection to the five-sigma bound
In other words, you're referring to a bound that particle physicists have adopted because it seems to be validated by experience with which apparent discoveries turn out to be "real" and which ones turn out to be statistical flukes. Correct?
 
  • #14
PeterDonis said:
In other words, you're referring to a bound that particle physicists have adopted because it seems to be validated by experience with which apparent discoveries turn out to be "real" and which ones turn out to be statistical flukes. Correct?
I am referring to the most extreme bound for which I could be convinced that it is actually practiced beyond mere lip service, and practiced despite costing significant amounts of time and money. It is a bound where the lip service revolves around why it is actually required, where the bound is used by the community itself for its own decisions, and is not just used to mislead external actors.
 
  • #15
Regarding this passage, which gentzen quoted from what Neumaier wrote:

"Now we throw the coin and find 'head'. Who was right? It is undecidable. Thus there cannot be objective content in the statement 'the probability of the coin coming out head is p', when applied to a single case. Subjectively, of course, every person may feel (and is entitled to feel) right about their probability assignment. But for use in science, such a subjective view (where everyone is right, no matter which statement was made) is completely useless."

I agree that whether an agent (to use my preferred term) is right or wrong is not decidable, and I would even argue it does not even matter!

What matters is the action of the agent, and its dependence on the agent's expectations. The causal relation between this action and the expectation must be independent of its absolute truth value, as it is relative. And this causal relation is what should be "tested". Even if the weather forecast does not determine tomorrow's weather, it will (to a large extent, even if not perfectly) determine how we act tomorrow. Evolution will influence the steady-state population of agents, but as this implies that agents learn and evolve, the "falsification" events will correspond to the death of agents with certain inference traits etc., and those that stick around are "corroborated". The final objective scientific value would then be about whether this can add explanatory value to interactions or not.

I think what one may rightfully object to is the vague link between this and physical matter systems. I.e., how can an elementary system having only basic properties like mass, charge etc. "implement" such game-theoretic interactions, unless one pictures particles as having brains? Is there a way to make this information-theoretic gambling puzzle work out? That is of course the big open challenge, and it's why this IMO is inseparable from the quest for unification of forces.

/Fredrik
 

