- #1
Killtech
- TL;DR Summary
- How do people get the idea Bell's inequalities invalidate probability theory?
For some reason I have found some odd opinions on these forums that Bell's inequalities somehow disqualify classical probability theory in general, rather than merely showing the limits of locality. I do not understand where that misunderstanding comes from, so I decided to find out.
Let's start by looking at Bell's formulation of the problem in terms of classical probability theory. While his choice of probability space is generic, his random variables on the other hand are very special. The concept of locality is translated into what those random variables may or may not depend on, and this constraint is crucial in proving the classical limit. In the literature this condition on the probabilities is called Bell locality, or factorizability, and it is shared by all inequalities of this type. Conversely, dropping that constraint breaks the inequality, since probability theory itself knows no such limits.
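To make that condition explicit (my notation, not copied verbatim from Bell's or the CHSH paper): with detector settings ##x, y##, outcomes ##a, b = \pm 1## and hidden variable ##\lambda##, Bell locality demands
$$P(a,b\mid x,y,\lambda) = P(a\mid x,\lambda)\,P(b\mid y,\lambda),$$
and averaging over any distribution of ##\lambda## then forces the correlators ##E(x,y) = \sum_{a,b} ab\,P(a,b\mid x,y)## to satisfy the CHSH bound
$$|S| = |E(x_1,y_1) + E(x_1,y_2) + E(x_2,y_1) - E(x_2,y_2)| \le 2.$$
Without the factorization there is nothing in probability theory itself that caps ##|S|## below the algebraic maximum of 4.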
It is also very enlightening to check the counterexample which shows how QT violates the inequality, e.g. in the CHSH case. It is a very easy derivation describing how the correlations come about... so let's check what happens there from a probability theory perspective. And indeed it's pretty standard stuff. Probabilities are calculated from an object of the generic form ##|\langle a|i\rangle|^2##, and in order for this to produce valid probabilities it must be part of a stochastic matrix - one of the classics of probability theory (if we expand the ##|i\rangle## states into a basis). Stochastic matrices are normally introduced for Markov chains and stochastic processes. So this implicitly defined discrete-time Markov process is used to model the measurements in CHSH, and it is indeed all that is needed to violate the inequality. It is important to note that the crucial difference to the classical limit is that here the process isn't local: it effectively allows the settings of both detectors to communicate with the underlying state during the transition - i.e. it bluntly ignores Bell locality.
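To see this concretely, here is a small numerical sketch (my own illustration with the singlet state and the standard CHSH angles, not taken from any particular paper). It builds the outcome distributions from the Born-rule entries ##|\langle a|i\rangle|^2##, checks that each of them is a valid probability distribution, and arrives at ##|S| = 2\sqrt{2} > 2##:

```python
import numpy as np

# Singlet state |psi> = (|01> - |10>)/sqrt(2), components ordered as |00>,|01>,|10>,|11>
psi = np.array([0.0, 1.0, -1.0, 0.0]) / np.sqrt(2)

def eigvec(theta, outcome):
    """Eigenvector of cos(theta)*Z + sin(theta)*X for the given outcome (+1 or -1)."""
    if outcome == +1:
        return np.array([np.cos(theta / 2), np.sin(theta / 2)])
    return np.array([-np.sin(theta / 2), np.cos(theta / 2)])

def joint_probs(tA, tB):
    """Born-rule outcome distribution P(a,b) for detector settings tA, tB."""
    P = {}
    for a in (+1, -1):
        for b in (+1, -1):
            amp = np.kron(eigvec(tA, a), eigvec(tB, b)) @ psi
            P[(a, b)] = abs(amp) ** 2
    return P

def correlator(tA, tB):
    P = joint_probs(tA, tB)
    assert abs(sum(P.values()) - 1) < 1e-12   # a valid row of the stochastic matrix
    return sum(a * b * p for (a, b), p in P.items())

# standard CHSH settings
a1, a2 = 0.0, np.pi / 2
b1, b2 = np.pi / 4, -np.pi / 4

S = (correlator(a1, b1) + correlator(a1, b2)
     + correlator(a2, b1) - correlator(a2, b2))
print(abs(S))   # ~2.828 = 2*sqrt(2) > 2: the quantum correlations break the CHSH bound
```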
It may be noteworthy to mention that in this case the stochastic matrix is still restricted by the rules of QT, whereas for a general Markov process any stochastic matrix is valid. So even without QT we can see that the theory of Markov processes is not only able to violate Bell's inequality but can do so maximally, well beyond Tsirelson's bound.
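As an illustration of how far an unrestricted stochastic matrix can go, here is the well-known PR-box correlation (my example, not something QT can produce): one valid outcome distribution per setting pair, and the CHSH value reaches the algebraic maximum of 4.

```python
def pr_box(x, y):
    """One stochastic-matrix row per setting pair (x, y): outcomes a, b = +/-1 are
    perfectly correlated unless x = y = 1, in which case they are anti-correlated."""
    P = {}
    for a in (+1, -1):
        for b in (+1, -1):
            anti = (x == 1 and y == 1)
            P[(a, b)] = 0.5 if (a == b) != anti else 0.0
    return P

def correlator(x, y):
    P = pr_box(x, y)
    assert abs(sum(P.values()) - 1) < 1e-12   # each row is a proper distribution
    return sum(a * b * p for (a, b), p in P.items())

S = correlator(0, 0) + correlator(0, 1) + correlator(1, 0) - correlator(1, 1)
print(S)   # 4.0 -- the algebraic maximum, well beyond Tsirelson's bound of 2*sqrt(2)
```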
The matrix taken as it is, however, is a little untypical. It does not transition the entire ensemble space onto itself (as one would expect for a usual Markov process) but instead acts only on a discrete subset of the state space (only ensembles composed of the states ##|i\rangle## and no superpositions thereof) and transitions these onto ensembles composed of ##|a\rangle## states. However, QT guarantees that post-measurement only eigenstates of the observable operator survive, while the initial basis can be chosen at will, so we have such a stochastic matrix for every choice of basis. Both together uniquely specify the full Markov kernel (of which the stochastic matrix is just an extract) of a process transitioning any initial ensemble onto the ensemble after measurement (the entire state space is too large to allow a matrix depiction).
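One way to write down that full kernel explicitly (my own notation, meant as a sketch of the idea rather than a textbook formula): for an observable with eigenbasis ##\{|a\rangle\}## and an arbitrary initial ensemble given by a density matrix ##\rho##,
$$K(\rho,\cdot) = \sum_a \langle a|\rho|a\rangle\,\delta_{|a\rangle\langle a|}(\cdot),$$
i.e. the kernel sends ##\rho## to the mixture that puts weight ##\langle a|\rho|a\rangle## on the post-measurement eigenstate ##|a\rangle\langle a|##. Restricting ##\rho## to the basis states ##|i\rangle\langle i|## recovers exactly the stochastic matrix ##|\langle a|i\rangle|^2## from above.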
Anyhow, that process is peculiar in that it reaches detailed balance (equilibrium) after a single step: a second application does not change the resulting ensemble any further - so it looks like the limit of some underlying process. Generally it is indeed very unusual to express measurement via a process as opposed to a simple random variable (like Bell did). The latter are always compatible with each other, while the former usually are not - i.e. different processes transitioning the very same system and its state normally do not commute.
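Both properties are easy to check numerically. A minimal sketch (a qubit toy example of my own; the particular ##\rho## and the tilted basis are arbitrary choices): the measurement kernel is idempotent, i.e. a second application changes nothing, while two such kernels in different bases do not commute.

```python
import numpy as np

def measure_channel(rho, basis):
    """Markov kernel of a projective measurement: rho -> sum_a <a|rho|a> |a><a|."""
    out = np.zeros_like(rho, dtype=float)
    for a in basis.T:                        # columns of `basis` are the eigenvectors |a>
        out += (a @ rho @ a) * np.outer(a, a)
    return out

Z = np.eye(2)                                              # computational basis
t = np.pi / 6                                              # a second, tilted measurement basis
B = np.array([[np.cos(t), -np.sin(t)], [np.sin(t), np.cos(t)]])

rho = np.array([[0.7, 0.3], [0.3, 0.3]])                   # some initial ensemble (density matrix)

once  = measure_channel(rho, Z)
twice = measure_channel(once, Z)
print(np.allclose(once, twice))                            # True: equilibrium after a single step

zb = measure_channel(measure_channel(rho, Z), B)
bz = measure_channel(measure_channel(rho, B), Z)
print(np.allclose(zb, bz))                                 # False: the two processes don't commute
```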
So how exactly do people get the idea that any of this invalidates classical probability theory?