Does Busch's Theorem Offer a Simplified Proof of Gleason's Theorem?

  • Thread starter Fredrik
In summary: I'm having a hard time following this. On page 27, they finally define ##\mathcal E(\mathcal H)## as the set of all bounded positive operators with a spectrum that's a subset of [0,1]. The theorem is about finding all the probability measures on the set of all bounded positive operators. This set is equivalent to the set of all effects, as defined on page 25. There is a bijective correspondence between probability measures on the set of all effects and probability measures on the lattice of projectors, as claimed in the article. Every normal linear functional on the vector space of positive bounded operators is of the form ##A\mapsto\operatorname{Tr}(\rho A)## for some trace-class ##\rho##.
  • #36
Effects are non-negative linear operators below the identity; they map the unit ball into itself, so effects are compact operators: ##\mathcal{E}(H) \subset \mathcal{K}(H)##.
Wikipedia tells us that the dual of the compact operators is the set of trace-class operators: ##\mathcal{K}^*(H) = C_1(H)##.
There is a bijective linear map from a trace-class operator ##\rho## to a functional ##v## over ##\mathcal{K}(H)## defined by ##v(K) = \operatorname{Tr}(\rho K)##.
Read the proof here.
Busch's theorem says that if ##v## is a functional over effects such that
##v(E) \in [0,1]##,
##v(\mathrm{Id}) = 1##,
##v## is ##\sigma##-additive,
then ##\rho## is a Hermitian non-negative trace-class operator below ##\mathrm{Id}## with ##\operatorname{Tr}(\rho) = 1##.

This is because:
##\operatorname{Tr}(\rho) = \operatorname{Tr}(\rho\,\mathrm{Id}) = v(\mathrm{Id}) = 1##.
Non-negativity and below the identity: take a unit vector ##u##; then ##\langle u|\rho u \rangle = \operatorname{Tr}(|u\rangle\langle u|\rho) = v(|u\rangle\langle u|) \in [0,1]## (projectors are effects).

But I do not see why ##\rho## has to be Hermitian.
 
  • #37
All positive operators are self-adjoint. Self-adjoint operators are normal operators with a spectrum that's a subset of ##\mathbb R##. Positive operators are normal operators with a spectrum that's a subset of ##[0,\infty)##. Effects are normal operators with a spectrum that's a subset of ##[0,1]##. Projections are normal operators with a spectrum that's a subset of ##\{0,1\}##. Since ##\{0,1\}\subseteq[0,1]\subseteq[0,\infty)\subseteq\mathbb R##, we have $$\text{Projections}\subseteq \text{Effects}\subseteq\text{Positive operators}\subseteq\text{Self-adjoint operators}$$
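If it helps to see this chain concretely, here's a small numpy sketch (my own finite-dimensional illustration, not part of the argument) that classifies a self-adjoint matrix by where its spectrum lands:
Code:
import numpy as np

def classify(A, tol=1e-10):
    """Classify a matrix by its spectrum (finite-dimensional sketch)."""
    if np.linalg.norm(A - A.conj().T) > tol:
        return "not self-adjoint"
    ev = np.linalg.eigvalsh(A)           # real spectrum of a self-adjoint matrix
    if np.all(np.abs(ev * (1 - ev)) < tol):
        return "projection"              # spectrum inside {0,1}
    if ev.min() > -tol and ev.max() < 1 + tol:
        return "effect"                  # spectrum inside [0,1]
    if ev.min() > -tol:
        return "positive"                # spectrum inside [0,inf)
    return "self-adjoint"

print(classify(np.diag([0.0, 1.0])))     # projection
print(classify(np.diag([0.3, 0.9])))     # effect
print(classify(np.diag([0.5, 2.0])))     # positive
print(classify(np.diag([-1.0, 2.0])))    # self-adjoint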
 
  • #38
Thank you Fredrik.

It seems that this gives the missing end of the proof in Busch's paper.
 
  • #39
Sorry about abandoning this thread for so long. I have studied some C*-algebra stuff, very slowly, so it took me a lot of time, and I've been distracted by a bunch of other stuff as well. I have started to prepare a series of posts where I present my thoughts on Busch's theorem and proof. I thought I'd break it up in three parts:

1. Elementary stuff about C*-algebras.
2. Busch's definition of "probability measure".
3. Statement and proof of the theorem.

I understand these things much better now than when I started the thread, but I still don't fully understand the theorem or its proof, so writing part 3 is a bit of a problem. I would like to discuss the main issue here.

The only thing that Busch proves in his article is that a probability measure on ##\mathcal E(\mathcal H)## (the set of effects) can be extended to a positive linear functional on the set of self-adjoint bounded linear operators. Then he says that it follows from the σ-additivity that this functional is normal, and that it's "well known" that "any such functional is obtained from a density operator". So to reach Busch's conclusion, there are two things we must prove:

(i) ##\nu## is normal.
(ii) That last claim about a density operator.

Micromass suggested that we rely on theorem 46.4 in Conway ("A course in operator theory", p. 259) for part (ii). Unfortunately I haven't made it that far in my studies yet. I think I will read the chapter on von Neumann algebras (and maybe the whole book) from the beginning, but I haven't done that yet, so I don't fully understand the theorem. This is what it says:

Theorem: (Conway 46.4). Let M be a von Neumann algebra. Let ##\psi## be a positive linear functional on M. The following statements are equivalent.

(a) ##\psi## is normal.
(b) If ##\{E_i|i\in I\}## is a pairwise orthogonal family of projections in M, then ##\psi\big(\sum_{i\in I}E_i\big)=\sum_{i\in I}\psi(E_i)##.
(c) ##\psi## is weak*-continuous.
(d) There's a positive trace class operator C such that ##\psi(A)=\operatorname{Tr}(AC)## for all ##A\in M##.

This is a theorem about von Neumann algebras, so we will need the definitions of positive and normal linear functionals on von Neumann algebras. We will also need to figure out what von Neumann algebra to apply the theorem to. It seems like it should be the set of self-adjoint elements of ##\mathcal B(\mathcal H)##, but I haven't even verified that it's a von Neumann algebra yet. The set of effects is not a von Neumann algebra. (It's not closed under addition, so it's not even a vector space). That's why the probability measure must be extended to a larger set before we can apply Conway 46.4.

A linear map between von Neumann algebras is said to be positive if it takes positive elements to positive elements. Since ##\mathbb C## is a 1-dimensional Hilbert space that can be identified with the von Neumann algebra ##\mathcal B(\mathbb C)##, this definition also tells us what a positive linear functional on a von Neumann algebra is. It's a linear map ##\phi:M\to\mathbb C## (where M is a von Neumann algebra) such that ##\phi(A)\geq 0## for all positive A in M.

Let M and N be von Neumann algebras. A positive linear map ##\phi:M\to N## is said to be normal if ##\phi(A_i)\to\phi(A)## SOT, for all increasing nets ##(A_i)## in M such that ##A_i\to A## SOT. (SOT = with respect to the strong operator topology).

Convergence in the strong operator topology is just pointwise convergence. So ##A_i\to A## SOT means that ##A_ix\to Ax## for all ##x\in\mathcal H##. For sequences of complex numbers, this is equivalent to the usual convergence with respect to the metric. So now we know what a normal linear functional on a von Neumann algebra is. It's a linear map ##\phi:M\to\mathbb C## that takes positive elements to positive elements, and is such that ##\phi(A_i)\to \phi(A)## for all increasing nets ##(A_i)_{i\in I}## in M such that ##A_i\to A## SOT.

The above is Conway's definition. Blackadar's definition is slightly different. Let M be a von Neumann algebra. A (bounded) linear functional ##\phi:M\to\mathbb C## is said to be normal if for all bounded increasing nets ##(A_i)_{i\in I}## in ##M_+##, we have ##\phi(A_i)\to\phi(A)##, where ##A=\sup_{i\in I} A_i##.

This isn't a verbatim quote, but he did put the word "bounded" in parentheses like that. I'm not sure what it means. I don't know if the definitions are equivalent; I haven't given it any serious thought yet. It looks suspicious that Blackadar talks about nets in ##M_+## (the positive elements) while Conway talks about nets in ##M##.

Is the ##\nu## in Busch's article normal? Busch says that it follows from σ-additivity. My first thought is that the sequence of partial sums must be an increasing net that converges to the sum. Ah, I think I get it. The sum of two effects is not always an effect, but effects are positive, and the sum of two positive operators is positive. So the sequence of partial sums is an increasing net of positive operators, and by assumption it converges SOT to an effect. If we denote the sum by ##E## and the nth partial sum by ##S_n##, σ-additivity, i.e. the assumption ##\nu\big(\sum_{i=1}^\infty E_i\big)=\sum_{i=1}^\infty \nu(E_i)##, is telling us that ##\nu(E)=\lim_n\nu(S_n)##. So ##\nu## is normal.

The claim that "any such functional is obtained from a density operator" should mean that there's a positive linear operator ##\rho## such that ##\operatorname{Tr}\rho=1## and ##\nu(E)=\operatorname{Tr}(\rho E)## for all self-adjoint bounded linear operators E. We just found that the ##\nu## in Busch's article satisfies statement (a) in Conway 46.4, and Busch's conclusion is essentially statement (d). The only thing missing is the result ##\operatorname{Tr}\rho=1##. This follows from the assumption ##\nu(I)=1##.
$$1=\nu(I)=\operatorname{Tr}(\rho I)=\operatorname{Tr}\rho.$$ OK, this turned out better than I thought when I started writing it. I haven't proved that the set of self-adjoint linear operators is a von Neumann algebra, and I'm a bit confused by the fact that Blackadar's definition of "normal" is slightly different, but everything else turned out OK.
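As a finite-dimensional sanity check of the conclusion (just a sketch with a made-up ##\rho##, not part of the proof), we can verify numerically that ##\nu(A)=\operatorname{Tr}(\rho A)## behaves like a probability assignment:
Code:
import numpy as np

rng = np.random.default_rng(0)

# A made-up density operator: positive with trace 1.
M = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
rho = M @ M.conj().T
rho = rho / np.trace(rho).real

def nu(A):
    return np.trace(rho @ A)

print(np.isclose(nu(np.eye(3)).real, 1.0))    # nu(I) = Tr(rho) = 1

# A random effect: self-adjoint with spectrum squashed into [0,1].
H = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
H = (H + H.conj().T) / 2
ev, U = np.linalg.eigh(H)
ev = (ev - ev.min()) / (ev.max() - ev.min())
E = U @ np.diag(ev) @ U.conj().T

p = nu(E).real
print(0.0 <= p <= 1.0)                        # effects get probabilities in [0,1]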
 
  • #40
Fredrik said:
I haven't proved that the set of self-adjoint linear operators is a von Neumann algebra,

It's not. A von Neumann algebra is a special kind of C*-algebra. In particular it must be a ##\mathbb{C}##-vector space. So the self-adjoint linear operators are not a von Neumann algebra.
The von Neumann algebra you should consider is ##\mathcal{B}(\mathcal{H})##.

and I'm a bit confused by the fact that Blackadar's definition of "normal" is slightly different, but everything else turned out OK.

Yes, but the idea is that an increasing net ##(A_i)_{i\in I}## converges if and only if it is bounded. And in that case, the net converges to ##\mathrm{sup}_{i\in I} A_i##. This is much like the situation of sequences in ##\mathbb{R}##, where an increasing sequence converges iff it is bounded.
 
  • #41
Elementary stuff about C*-algebras.

Fredrik said:
I have started to prepare a series of posts where I present my thoughts on Busch's theorem and proof. I thought I'd break it up in three parts:

1. Elementary stuff about C*-algebras.
2. Busch's definition of "probability measure".
3. Statement and proof of the theorem.
I wrote parts 1 and 2 before the post above, so I might as well post them. This is part 1. These are all things I needed to work out to understand some detail in Busch's proof.

In this post, ##\mathcal A## is an arbitrary unital C*-algebra, and ##\mathcal H## is an arbitrary Hilbert space. The set ##\mathcal B(\mathcal H)## of bounded linear operators on ##\mathcal H## is a unital C*-algebra.

Theorem: (I edited this statement after seeing micromass' comment in post #43).
Let x be an arbitrary normal element of ##\mathcal A##. The following statements are equivalent.

(a) ##\sigma(x)\subseteq[0,1]##.
(b) ##0\leq x\leq 1##.
(c) ##0\leq 1-x\leq 1##.

The proof is easy if you're familiar with the spectral mapping theorem. For example, the implication (a) ##\Rightarrow## (b) is proved like this:
\begin{align}
&\sigma(x)\subseteq[0,1]\subseteq[0,\infty)\ \Rightarrow\ x\geq 0.\\
&\sigma(1-x) =\{1-\lambda|\lambda\in\sigma(x)\} \subseteq\{1-\lambda|\lambda\in[0,1]\} = [0,1]\subseteq[0,\infty)\ \Rightarrow\ 1-x\geq 0.
\end{align}
Definition: An element ##x\in\mathcal A## is said to be an effect if it satisfies the equivalent conditions above.

Note that x is an effect if and only if 1-x is an effect. I will use the notation ##\mathcal E(\mathcal A)## for the set of effects in ##\mathcal A##, and the notation ##\mathcal E(\mathcal H)## for the set of effects in ##\mathcal B(\mathcal H)##.

Theorem:

(a) For all ##x\in\mathcal E(\mathcal A)## and all ##k\in[0,1]##, we have ##kx\in\mathcal E (\mathcal A)##.
(b) For all ##x,y\in\mathcal E(\mathcal A)##, if ##x\leq y## then ##y-x\in\mathcal E(\mathcal A)##.

Proof:

(a): ##\sigma(kx)=\{k\lambda|\lambda\in\sigma(x)\} \subseteq\{k\lambda|\lambda\in[0,1]\} = [0,k]\subseteq[0,1]##.

(b): Let x and y be arbitrary effects such that ##x\leq y##. We have ##y-x\geq 0##. We will prove that ##y-x\leq 1##.
$$0\leq 1-y= (1-x)-(y-x)\ \Rightarrow\ y-x\leq 1-x\leq 1.$$ It's not obvious that ≤ is transitive, but it's easy to prove.
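A quick numerical illustration of (a) and (b) for matrices (a sketch with diagonal examples, so the order relation is easy to see):
Code:
import numpy as np

def is_effect(X, tol=1e-10):
    """Self-adjoint with spectrum inside [0,1]?"""
    ev = np.linalg.eigvalsh(X)
    return ev.min() > -tol and ev.max() < 1 + tol

x = np.diag([0.2, 0.7])
y = np.diag([0.4, 0.9])       # x <= y, since y - x is positive

print(is_effect(0.5 * x))     # (a): kx is an effect for k in [0,1]
print(is_effect(y - x))       # (b): y - x is an effect when x <= y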

Theorem: Every self-adjoint element can be written as a difference of two positive elements.

I've been denoting elements of ##\mathcal A## by x and y, but I will prove the less fancy version of this theorem, where the C*-algebra is specifically ##\mathcal B(\mathcal H)##. So I will denote a typical element by A rather than x. (The fancy version for arbitrary unital C*-algebras requires Gelfand-Naimark theory. I typed up that proof before I came across this one, so I can post it if anyone is interested).

Proof: Suppose that A is self-adjoint. We will prove that ##\|A\|-A## and ##\|A\|+A## are positive. (Here ##\|A\|## is shorthand for ##\|A\|I##.) It follows from the spectral mapping theorem that B/2 is positive for all positive B, so if we can prove that claim I just made, we can write A as a difference of positive elements in the following way:
$$A=\frac{\|A\|+A}{2}-\frac{\|A\|-A}{2}.$$ Here's the proof that those numerators are positive: For all ##x\in\mathcal H##, we have
$$\langle x,(\|A\|\pm A)x\rangle =\|A\|\|x\|^2\pm\langle x,Ax\rangle \geq \|A\|\|x\|^2-\underbrace{|\langle x,Ax\rangle|}_{\leq\|x\|\|A\|\|x\|} \geq 0.$$ Note that the first inequality makes sense because the assumption that A is self-adjoint implies that ##\langle x,Ax\rangle## is a real number.
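Here's a small numpy check of this decomposition (a sketch; as above, ##\|A\|## means the operator norm times the identity):
Code:
import numpy as np

rng = np.random.default_rng(1)
B = rng.standard_normal((4, 4))
A = (B + B.T) / 2                              # a real self-adjoint matrix

norm_A = np.linalg.norm(A, 2)                  # operator norm
P = (norm_A * np.eye(4) + A) / 2               # (||A|| + A)/2
N = (norm_A * np.eye(4) - A) / 2               # (||A|| - A)/2

print(np.allclose(A, P - N))                   # A is the difference of the parts
print(np.linalg.eigvalsh(P).min() >= -1e-12)   # P is positive
print(np.linalg.eigvalsh(N).min() >= -1e-12)   # N is positive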
 
  • #42
Fredrik said:
Theorem: (Conway 46.4). Let M be a von Neumann algebra. Let ##\psi## be a positive linear functional on M. The following statements are equivalent.

(a) ##\psi## is normal.
(b) If ##\{E_i|i\in I\}## is a pairwise orthogonal family of projections in M, then ##\psi\big(\sum_{i\in I}E_i\big)=\sum_{i\in I}\psi(E_i)##.
(c) ##\psi## is weak*-continuous.
(d) There's a positive trace class operator C such that ##\psi(A)=\operatorname{Tr}(AC)## for all ##A\in M##.

Also, check out Theorem 54.10 in Conway for a more general version. Note that a positive linear functional is always bounded (Theorem 7.3 in Conway), so it really is a more general version.
 
  • #43
Fredrik said:
Theorem: Let ##x\in\mathcal A## be arbitrary. The following statements are equivalent.

(a) ##\sigma(x)\subseteq[0,1]##.
(b) ##0\leq x\leq 1##.
(c) ##0\leq 1-x\leq 1##.

You need that ##x## is normal in this theorem.
 
  • #44
micromass said:
It's not. A von Neumann algebra is a special kind of C*-algebra. In particular it must be a ##\mathbb{C}##-vector space. So the self-adjoint linear operators are not a von Neumann algebra.
Ah of course, it's not closed under (complex) scalar multiplication. If A is self-adjoint, iA is not.

micromass said:
The von Neumann algebra you should consider is ##\mathcal{B}(\mathcal{H})##.
OK that makes sense. I was confused by the fact that Busch only extended his ##\nu## to the set of self-adjoint elements. But we can extend it further with these tricks:
\begin{align}
&A=\frac{A+A^*}{2}+i\frac{A-A^*}{2i}.\\
&\nu(A)=\nu\left(\frac{A+A^*}{2}\right)+i\nu\left(\frac{A-A^*}{2i}\right)
\end{align}
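A sketch of this extension in finite dimensions, with ##\operatorname{Tr}(\rho\,\cdot)## standing in for ##\nu## on the self-adjoint operators (the ##\rho## below is just an example):
Code:
import numpy as np

rho = np.diag([0.5, 0.3, 0.2])                 # an example density operator

def nu_sa(A):
    """nu as given on self-adjoint operators only."""
    return np.trace(rho @ A)

def nu_ext(A):
    """Extension to arbitrary A via the decomposition above."""
    re = (A + A.conj().T) / 2
    im = (A - A.conj().T) / 2j
    return nu_sa(re) + 1j * nu_sa(im)

rng = np.random.default_rng(2)
A = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
print(np.isclose(nu_ext(A), np.trace(rho @ A)))   # agrees with Tr(rho A)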

micromass said:
Yes, but the idea is that an increasing net ##(A_i)_{i\in I}## converges if and only if it is bounded. And in that case, the net converges to ##\mathrm{sup}_{i\in I} A_i##. This is much like the situation of sequences in ##\mathbb{R}##, where an increasing sequence converges iff it is bounded.
Yes, I thought it might be essentially the same idea, and I think I saw a theorem about it at the start of the von Neumann chapter in Conway, but I haven't studied it yet. I will have to do that, or at least take some time to think about how the least upper bound thing works here.
 
  • #45
Busch's definition of "probability measure".

I took a break to eat some sandwiches and watch the pilot of Outlander. (Not bad, I will have to watch at least one more). These are the comments I wrote earlier about Busch's definition of "probability measure".

Busch's definition of probability measure looks different from definitions I've seen in other places, e.g. Varadarajan ("Geometry of quantum theory", p. 50). So a discussion about that is in order. It's useful to know the following theorem.

Theorem: Let ##\{M_i|i\in I\}## be a pairwise orthogonal family of closed linear subspaces of ##\mathcal H##. For each ##i\in I##, let ##P_i## be the projection onto ##M_i##. Let M be the closure of the linear subspace spanned by ##\bigcup_{i\in I}M_i##. Let ##P## be the projection onto M. We have ##\sum_{i\in I}P_i=P##.

I'm not going to prove this here (unless someone asks). But I'll add a couple of comments about that sum. Let ##\mathcal F## be the set of finite subsets of ##I##. This set is partially ordered by inclusion. For each ##F\in\mathcal F##, define ##P_F=\sum_{i\in F}P_i##. The map ##F\mapsto P_F## with domain ##\mathcal F## is a net. ##\sum_{i\in I}P_i## is defined as the limit of this net in the strong operator topology.

The strong operator topology (SOT) sounds fancy, but when you apply the definition, you see that SOT convergence is just pointwise convergence. ##A_i\to A## SOT if and only if ##A_ix\to Ax## for all ##x\in\mathcal H##.
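To make the "pointwise convergence" point concrete, here's a finite-dimensional sketch (in finite dimensions all these topologies coincide, so this only illustrates the definition): partial sums of orthogonal rank-one projections, applied to a fixed vector, converge to that vector.
Code:
import numpy as np

N = 50
rng = np.random.default_rng(3)
x = rng.standard_normal(N)

S = np.zeros((N, N))                  # running partial sum of projections
for n in range(N):
    S[n, n] = 1.0                     # add P_{n+1} = |e_{n+1}><e_{n+1}|
    if (n + 1) % 10 == 0:
        print(n + 1, np.linalg.norm(S @ x - x))   # ||S_n x - x|| -> 0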

The following definition is equivalent to Varadarajan's.

Definition: Let ##\mathcal L## be the set of closed linear subspaces of ##\mathcal H##. A function ##\mu:\mathcal L\to[0,1]## is said to be a probability measure if the following statements are true.
(a) ##\mu(\{0\})=0##, ##\mu(\mathcal H)=1##.
(b) If ##(M_i)_{i=1}^\infty## is a pairwise orthogonal sequence in ##\mathcal L##, and M is the closure of the linear subspace spanned by ##\bigcup_{i=1}^\infty M_i##, then ##\mu(M)=\sum_{i=1}^\infty \mu(M_i)##.

Because of the bijective correspondence between closed linear subspaces of ##\mathcal H## and projections in ##\mathcal B(\mathcal H)##, this can be translated into a definition of a probability measure on the set ##\mathcal P(\mathcal H)## of projections in ##\mathcal B(\mathcal H)##.

Definition: A function ##\mu:\mathcal P(\mathcal H)\to[0,1]## is said to be a probability measure if the following statements are true.
(a) ##\mu(0)=0##, ##\mu(I)=1##.
(b) If ##(P_i)_{i=1}^\infty## is a pairwise orthogonal sequence in ##\mathcal P(\mathcal H)##, then ##\mu\big(\sum_{i=1}^\infty P_i\big)=\sum_{i=1}^\infty \mu(P_i)##.

If you have seen how Busch defines a probability measure on the set of effects, you might expect a condition like ##P_1+P_2+\dots\leq I## to appear here. We need to know why there's no such condition here. The answer is that ##\sum_{i=1}^\infty P_i## always converges to a projection ##P## and every projection satisfies ##P\leq I##. (This is equivalent to ##I-P\geq 0##, which is equivalent to ##\sigma(I-P)\subseteq [0,\infty)##, and this holds because ##I-P## is a projection too, so we actually have ##\sigma(I-P)\subseteq\{0,1\}##). So the "Busch-like" condition ##P_1+P_2+\dots\leq I## holds for all pairwise orthogonal sequences of projections ##(P_i)_{i=1}^\infty## because the corresponding sequence of partial sums always converges to a projection.

To generalize this to the set ##\mathcal E(\mathcal H)## of effects, we have to replace the orthogonality condition with something that means the same thing for projections, but can also be stated as a condition on effects. I suggest the following:

(b') If ##(P_i)_{i=1}^\infty## is a sequence in ##\mathcal P(\mathcal H)## such that its sequence of partial sums converges to an element of ##\mathcal P(\mathcal H)##, then ##\mu\big(\sum_{i=1}^\infty P_i\big)=\sum_{i=1}^\infty \mu(P_i)##.

This gives us a way to generalize the definition:

Definition: A function ##\mu:\mathcal E(\mathcal H)\to[0,1]## is said to be a probability measure if the following statements are true.
(a) ##\mu(0)=0##, ##\mu(I)=1##.
(b) If ##(E_i)_{i=1}^\infty## is a sequence in ##\mathcal E(\mathcal H)## such that its sequence of partial sums converges to an element of ##\mathcal E(\mathcal H)##, then ##\mu\big(\sum_{i=1}^\infty E_i\big)=\sum_{i=1}^\infty \mu(E_i)##.

I prefer this to Busch's version. It took me a lot of time to see the significance of his condition ##E_1+E_2+\cdots \leq I##. This holds automatically when we use my version. His condition implies that the sequence of partial sums converges (in the sense that the condition only makes sense if that's the case), and says that the limit is less than or equal to I. If he had said that the limit, let's call it E, satisfies ##0\leq E\leq I##, then his condition is certainly equivalent to mine. Perhaps there's no need to mention that ##E\geq 0##. I haven't tried it, but perhaps we can prove that the limit of a convergent sequence of positive operators is positive. But then, wouldn't we also be able to prove that the limit of a convergent sequence of effects is an effect? If that's the case, then Busch's condition serves no real purpose.
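For what it's worth, the standard example ##\mu(E)=\operatorname{Tr}(\rho E)## with ##\rho## a density operator satisfies my version of the definition; here's a finite-dimensional sketch (condition (b) reduces to linearity of the trace):
Code:
import numpy as np

rho = np.diag([0.6, 0.3, 0.1])        # an example density operator

def mu(E):
    return np.trace(rho @ E).real

E1 = np.diag([1.0, 0.0, 0.0])         # effects whose sum is an effect
E2 = np.diag([0.0, 1.0, 0.0])         # (here: projections summing to I)
E3 = np.diag([0.0, 0.0, 1.0])

print(np.isclose(mu(np.zeros((3, 3))), 0.0))                   # mu(0) = 0
print(np.isclose(mu(np.eye(3)), 1.0))                          # mu(I) = 1
print(np.isclose(mu(E1 + E2 + E3), mu(E1) + mu(E2) + mu(E3)))  # additivity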

OK, here's a thought that I got just now. What if we assume not only that ##E\leq I##, but also that each partial sum is ##\leq I##? A finite sum of effects may not be an effect, but it's always a positive operator, so this condition would imply that each partial sum is an effect. That sounds like something we might want to include in the definition. On the other hand, it also seems likely that if the sequence of partial sums converges to an effect, then each partial sum will be an effect anyway. (I haven't really thought that through). That would make the condition superfluous.

Edit: I was going to type up my version of Busch's argument for why the extension is possible, but I will skip that. It's not too hard to follow his argument if you know the elementary results mentioned in my previous posts. If anyone has a question about a detail, ask it. I think I will be able to answer it.
 
  • #46
micromass said:
Also, check out Theorem 54.10 in Conway for a more general version. Note that a positive linear functional is always bounded (Theorem 7.3 in Conway), so it really is a more general version.
Interesting. If we use the more general theorem, we can skip the step where we prove that Busch's ##\nu## is normal.

I saw a nice proof of the fact that positive linear functionals are bounded in one of the other books, but now I can't find it. I'd better write it down here before I forget. :smile:

It's based on the fact that positive linear functionals preserve order (##A\leq B\Rightarrow\phi(A)\leq\phi(B)##) and that ##\|A\|\pm A## is positive when A is self-adjoint. So for all self-adjoint A,
\begin{align}
&-\|A\|\leq A\leq\|A\|\\
&-\|A\|\phi(1)\leq\phi(A)\leq\|A\|\phi(1)\\
&|\phi(A)|\leq\|A\||\phi(1)|
\end{align} This implies that for all A (where B and C are the self-adjoint operators such that A=B+iC),
$$|\phi(A)|=|\phi(B)+i\phi(C)| \leq|\phi(B)|+|\phi(C)| =\|B\||\phi(1)|+\|C\||\phi(1)| =\big(\|B\|+\|C\|\big)|\phi(1)| \leq 2|\phi(1)|\|A\|.$$ That last inequality above follows from the following result for B and the similar result for C:
$$\|B\|=\left\|\frac{A+A^*}{2}\right\| \leq\frac 1 2 \big(\|A\|+\|A^*\|\big) =\|A\|.$$
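A quick numerical check of the final bound, with ##\phi=\operatorname{Tr}(\rho\,\cdot)## as a stand-in positive functional (so ##\phi(1)=1##; the ##\rho## is just an example):
Code:
import numpy as np

rho = np.diag([0.7, 0.2, 0.1])                # stand-in: phi(A) = Tr(rho A)

def phi(A):
    return np.trace(rho @ A)

rng = np.random.default_rng(4)
for _ in range(5):
    A = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
    lhs = abs(phi(A))
    rhs = 2 * abs(phi(np.eye(3))) * np.linalg.norm(A, 2)
    print(lhs <= rhs)                         # |phi(A)| <= 2|phi(1)| ||A||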
 
  • #47
bhobba said:
OK guys here is the proof I came up with.
...
Now its out there we can pull it to pieces and see exactly what's going on.
I meant to get started on this a long time ago, but I've been distracted. Better late than never I suppose.

bhobba said:
Just for completeness let's define a POVM. A POVM is a set of positive operators Ei ∑ Ei =1 from, for the purposes of QM, an assumed complex vector space.
This is the definition that's appropriate for a finite-dimensional space, right? So is your entire proof for finite-dimensional spaces?

bhobba said:
Elements of POVM's are called effects and its easy to see a positive operator E is an effect iff Trace(E) <= 1.
I don't see how the trace enters the picture.

Is the start of that sentence your definition of "effect"?

bhobba said:
First let's start with the foundational axiom the proof uses as its starting point.
I would prefer if you could make a full statement of the theorem. You should at least explain where you're going with this.

bhobba said:
An observation/measurement with possible outcomes i = 1, 2, 3 ... is described by a POVM Ei such that the probability of outcome i is determined by Ei, and only by Ei, in particular it does not depend on what POVM it is part of.
OK, this suggests that there's a function that takes effects to numbers in the interval [0,1], and that this function should have properties similar to those of a probability measure on a σ-algebra.

bhobba said:
I will let f(Ei) be the probability of Ei. Obviously f(I) = 1 from the law of total probability. Since I + 0 = I f(0) = 0.
This is not so obvious unless you know exactly what the 0 operator and the identity operator represent: Yes-no measurements that always give you the answer "no" and "yes" respectively.

If you want to make this argument without using the "theorem + proof" structure, you should explain such things. If you want to make it in the form of a theorem, you should state the theorem in a way that includes a definition of a probability measure on the set of effects.

bhobba said:
First additivity of the measure for effects.

Let E1 + E2 = E3 where E1, E2 and E3 are all effects. Then there exists an effect E E1 + E2 + E = E3 + E = I. Hence f(E1) + f(E2) = f(E3)
By your definitions, if ##E_1## and ##E_2## are effects, there exist positive operators ##F_i## and ##G_j## (for each i in some set I, and each j in some set J) such that ##\sum_i F_i=I=\sum_j G_j##, and indices i and j such that ##F_i=E_1## and ##G_j=E_2##. But why should ##F_i+G_j## be an effect?

The set of effects as defined in my post #41 is not closed under addition. If your definition (or its generalization to spaces that may be infinite-dimensional) is equivalent to mine, then you can't assume that ##E_3## is an effect.

bhobba said:
Next linearity wrt the rationals - its the usual standard argument from additivity from linear algebra but will repeat it anyway.

f(E) = f(n E/n) = f(E/n + ... + E/n) = n f(E/n) or 1/n f(E) = f(E/n). f(m E/n) = f(E/n + ... E/n) or m/n f(E) = f(m/n E) if m <= n to ensure we are dealing with effects.
OK. My version: For all ##n\in\mathbb Z^+## (that's positive integers), we have
$$f(E)=f\left(n\frac{E}{n}\right) =nf\left(\frac{E}{n}\right),$$ and therefore $$f\left(\frac{E}{n}\right)=\frac{1}{n}f(E).$$
This implies that for all ##n,m\in\mathbb Z^+## such that ##n\geq m##, we have
$$f\left(\frac m n E\right)=m f\left(\frac 1 n E\right)=\frac m n f(E).$$ You should probably mention that this argument relies on a theorem that says that if E is an effect and ##\lambda\in[0,1]##, then λE is an effect.

bhobba said:
If E is a positive operator a n and an effect E1 exists E = n E1 as easily seen by the fact effects are positive operators with trace <= 1.
It took me several minutes to understand this sentence. It's very strangely worded. How about something like this instead: For each positive operator E, there's an effect ##E_1## and a positive integer n such that ##E=nE_1##.

bhobba said:
f(E) is defined as nf(E1). To show well defined suppose nE1 = mE2. n/n+m E1 = m/n+m E2. f(n/n+m E1) = f(m/n+m E1). n/n+m f(E1) = m/n+m f(E2) so nf(E1) = mf(E2).
I don't understand what you're doing. Did you mean multiplication when you wrote +? Are there parentheses missing or something? You really should start using LaTeX.

My version: The assumption implies that ##E_1=mE_2/n##. So we have
$$nf(E_1)=nf\left(\frac m n E_2\right) =n\frac m n f(E_2) =mf(E_2).$$

bhobba said:
From the definition its easy to see for any positive operators E1, E2 f(E1 + E2) = f(E1) + f(E2).
It doesn't follow from the definition. We have to do something like this: ##E_1=nE_1'## and ##E_2=mE_2'##, where ##E_1'## and ##E_2'## are effects such that ##E_1'+E_2'## is an effect. (This can be accomplished by choosing m and n large). If ##n\geq m##, we have
\begin{align}
&f(E_1+E_2)=f(nE_1'+mE_2') =f\left(n\left(E_1'+\frac m n E_2'\right)\right) =n f\left(E_1'+\frac m n E_2'\right) = n\left( f(E_1')+\frac{m}{n}f(E_2')\right)\\
&= nf(E_1')+mf(E_2') =f(E_1)+f(E_2).
\end{align}

bhobba said:
Then similar to effects show for any rational m/n f(m/n E) = m/n f(E).
If you had shown that for all effects E and all ##p\in[0,1]##, we have ##f(pE)=pf(E)##, you wouldn't have had to do the thing with rational numbers twice. By the way, a comma after m/n would make that sentence more readable. A comma followed by words like "we have" would be even better, because a comma sometimes means "and".

I'm going to take a break here, and do the rest later.
 
  • #48
Fredrik said:
I meant to get started on this a long time ago, but I've been distracted. Better late than never I suppose.

Mate - this is tricky stuff that will take a while to fully sort out.

Only doing it when the mood strikes, and time permits, is what I fully expect.

Also you are using the language of rigorous analysis. I certainly studied such and spoke that language once upon a daydream but tend to shy away from it these days.

Fredrik said:
This is the definition that's appropriate for a finite-dimensional space, right? So is your entire proof for finite-dimensional spaces?

For simplicity I will restrict myself to finite dimensional spaces.

My personal view of QM is always to do foundational issues in finite spaces then extend it via Rigged Hilbert Spaces.

Fredrik said:
I don't see how the trace enters the picture.

By definition an effect is an element of some POVM.

Given any effect E you have another effect U (it may be zero) such that E+U = 1, or E = 1-U. Take the trace of both sides; since E and U are positive, Trace E <= 1. Conversely, suppose E is a positive operator with Trace E <= 1. Let U = 1-E; then Trace U <= 1 and U is positive, so E and U are positive operators with U + E = 1 - hence E is an effect.

Fredrik said:
I would prefer if you could make a full statement of the theorem. You should at least explain where you're going with this.

I chose this method because it seems more direct - eg there is no question of showing the resultant formula is a probability because I have already hypothesised it to be.

Fredrik said:
OK, this suggests that there's a function that takes effects to numbers in the interval [0,1], and that this function should have properties similar to those of a probability measure on a σ-algebra.

There is no suggestion - I stated it outright - the probability of outcome i is determined by Ei - and only by Ei - which means it must be a function of Ei.

Fredrik said:
This is not so obvious unless you know exactly what the 0 operator and the identity operator represent: Yes-no measurements that always give you the answer "no" and "yes" respectively.

From my stated axiom, if the POVM has one element that element must be I. One element means one outcome - ie probability must be 1. The law of total probability is actually overkill - its from Kolmogorov's axioms. Since the probability can't depend on what POVM its in, consider the POVM I + 0 which has two elements; since the probability of I, ie f(I), is one, again from basic probability, the probability of the effect 0, f(0), must be 0.

Fredrik said:
If you want to make this argument without using the "theorem + proof" structure, you should explain such things. If you want to make it in the form of a theorem, you should state the theorem in a way that includes a definition of a probability measure on the set of effects.

I thought I was pretty clear - but of course we, and that most definitely includes me, can always improve how we explain things. Maybe its my applied math background, I am not sure, but if you mention probability, then I assume the reader understands basic probability such as the Kolmogorov axioms and what they imply.

Its 2.00 am where I am so I will take a break as well.

Its a long tricky proof so I will do it a bit at a time - will do a bit more tomorrow when I have had a bit of a sleep and maybe answer a few easier questions - I can see this one will likely take a while.

Thanks
Bill
 
  • #49
OK - had a good rest so can do a bit more

bhobba said:
Let E1 + E2 = E3 where E1, E2 and E3 are all effects. Then there exists an effect E E1 + E2 + E = E3 + E = I. Hence f(E1) + f(E2) = f(E3)

Fredrik said:
By your definitions, if ##E_1## and ##E_2## are effects, there exist positive operators ##F_i## and ##G_j## (for each i in some set I, and each j in some set J) such that ##\sum_i F_i=I=\sum_j G_j##, and indices i and j such that ##F_i=E_1## and ##G_j=E_2##. But why should ##F_i+G_j## be an effect?

An effect, by definition, is a positive operator that belongs to a POVM. If E3 is an effect then another effect E (it may be zero) must exist such that E3 + E = I by this definition. Then of course, by the fact they are mapped to probability - ie since E3 + E is a two element POVM it has two outcomes, ie its a two element event space - we have f(E3) + f(E) = 1, again from the Kolmogorov axioms, or, if you want to use measure theory language, its a measure space of total measure 1 - which of course is exactly what the Kolmogorov axioms are. Similarly f(E1) + f(E2) + f(E) = 1. Equating the two you end up with f(E1) + f(E2) = f(E3) by cancelling f(E).
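If it helps, here is a small numerical illustration of that cancellation - the rho below is just a stand-in state so that f has concrete values; nothing depends on which one you pick:
Code:
import numpy as np

E1 = np.diag([0.2, 0.1])
E2 = np.diag([0.3, 0.4])
E3 = E1 + E2                          # an effect, spectrum inside [0,1]
E = np.eye(2) - E3                    # the completing effect: E3 + E = I

rho = np.diag([0.5, 0.5])             # stand-in state, f(X) = Tr(rho X)
def f(X):
    return np.trace(rho @ X).real

print(np.isclose(f(E3) + f(E), 1.0))            # two-outcome POVM {E3, E}
print(np.isclose(f(E1) + f(E2) + f(E), 1.0))    # three-outcome POVM {E1, E2, E}
print(np.isclose(f(E1) + f(E2), f(E3)))         # cancel f(E)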

bhobba said:
Will extend the definition to any positive operator E. If E is a positive operator a n and an effect E1 exists E = n E1 as easily seen by the fact effects are positive operators with trace <= 1. f(E) is defined as nf(E1). To show well defined suppose nE1 = mE2. n/n+m E1 = m/n+m E2. Note m/n+m and n/n+m are rationals less than 1. f(n/n+m E1) = f(m/n+m E1). n/n+m f(E1) = m/n+m f(E2) so nf(E1) = mf(E2).

Fredrik said:
I don't understand what you're doing

All I am doing is invoking the observation I made at the beginning - that a positive operator E is an effect iff Trace(E) <= 1. If E is any positive operator, Trace(E) is then a positive number, so of course a natural number n and a positive operator E1 exist (ie E1 = (1/n)*E) with Trace(E1) <= 1 and E = n*E1 (I have inserted the star to ensure the meaning of multiplication is understood). This means E1 is an effect. Now I want to extend the definition of f from effects to any positive operator. I do this by defining f(E) = n*f(E1). But the n and E1 are not unique - all sorts of n and E1 are valid. For the definition to make sense I must show it leads to exactly the same f(E). So if n*E1 = m*E2, then n/(n+m)*E1 = m/(n+m)*E2. Again from the trace observation this means n/(n+m)*E1 and m/(n+m)*E2 are effects, hence f(n/(n+m)*E1) = f(m/(n+m)*E2). Thus n/(n+m)*f(E1) = m/(n+m)*f(E2), so n*f(E1) = m*f(E2).

Whew - that took me a bit - so will take a break - more to follow.

Regarding LaTeX - LaTeX is truth, but I find it far too time consuming and cumbersome so I try to avoid it.

Thanks
Bill
 
  • #50
Answer to post #48 only.

bhobba said:
For simplicity I will restrict myself to finite dimensional spaces.

My personal view of QM is always to do foundational issues in finite spaces...
OK. That leads to a much less impressive theorem, but honestly, if the goal is just to use mathematics to improve our intuitive understanding about QM, it will do.

bhobba said:
Given any effect E you have another effect U (it may be zero) such that E+U = 1, or E = 1-U. Take the trace of both sides; since E and U are positive, Trace E <= 1.
I don't quite follow this. I'm OK with the beginning, which says that if E is an effect, then so is 1-E. If we define U=1-E, we have E=1-U and Tr E = Tr(1-U) = Tr 1 - Tr U ≤ Tr 1 = dim H. But I don't see how we get Tr E ≤ 1. Indeed, the identity already shows that it can fail: I is an effect (it's the element of the one-element POVM {I}), but in ##\mathbb C^2## its trace is 2.
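A two-line check of that counterexample (a sketch):
Code:
import numpy as np

I2 = np.eye(2)
print(np.linalg.eigvalsh(I2))   # spectrum {1} is inside [0,1], so I is an effect
print(np.trace(I2))             # but Trace(I) = 2 > 1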

bhobba said:
Conversely, suppose E is a positive operator with Trace E <= 1. Let U = 1-E; then Trace U <= 1 and U is positive, so E and U are positive operators with U + E = 1 - hence E is an effect.
I don't think your U will always be positive. If E is a positive operator that isn't an effect, then its spectrum contains a real number r such that r>1. The spectrum of 1-E will then contain 1-r. Since 1-r<1-1=0, this implies that 1-E isn't positive.

bhobba said:
There is no suggestion - I stated it outright - the probability of outcome i is determined by Ei - and only by Ei - which means it must be a function of Ei.
OK, the mere mention of the word "probability" tells us a lot.

bhobba said:
From my stated axiom if the POVM has one element that element must be I. One element means one outcome - ie probability must be 1.
OK.
 
  • #51
bhobba said:
An effect, by definition, is a positive operator that belongs to a POVM. If E3 is an effect then another effect E (it may be zero) must exist such that E3 + E = I by this definition. Then of course, by the fact they are mapped to probability - ie since E3 + E is a two element POVM it has two outcomes, ie its a two element event space - we have f(E3) + f(E) = 1, again from the Kolmogorov axioms, or, if you want to use measure theory language, its a measure space of total measure 1 - which of course is exactly what the Kolmogorov axioms are. Similarly f(E1) + f(E2) + f(E) = 1. Equating the two you end up with f(E1) + f(E2) = f(E3) by cancelling f(E).
I don't see why the E in f(E3) + f(E) = 1 should be the same as the E in f(E1) + f(E2) + f(E) =1. Also, to get that second equality, you must have used f(E1+E2)=f(E1)+f(E2), which is fine if E1 and E2 are part of the same POVM and therefore correspond to mutually exclusive outcomes, but what if they're not?

bhobba said:
Regarding LaTeX - LaTeX is truth, but I find it far to time consuming and cumbersome so try to avoid it.
It's up to you, but it's really very easy. It takes a little bit of time the first and second time, but then it doesn't slow you down, except when you're using it to say things that you otherwise wouldn't. I think it would take you less time to learn these LaTeX codes than it did to write that last post above.
Code:
x^y
x^{yz}
E_1
E_{12}
\sin\theta
\cos{3x}
\sum_{k=1}^\infty x_k
\sqrt{1-v^2}
\frac{u+v}{1+uv}
\int_a^b f(x) dx
\mathbb R
\mathbb C
$$\begin{align}
&x^y\\
&x^{y+z}\\
&E_1\\
&E_{12}\\
&\sin\theta\\
&\cos{3x}\\
&\sum_{k=1}^\infty x_k\\
&\sqrt{1-v^2}\\
&\frac{u+v}{1+uv}\\
&\int_a^b f(x) dx\\
&\mathbb R\\
&\mathbb C
\end{align}$$ We have a FAQ post on LaTeX if you decide to give it a try. You can type something into a reply and just preview it if you want practice. https://www.physicsforums.com/showpost.php?p=3977517&postcount=3
 
  • #52
Fredrik said:
I don't see why the E in f(E3) + f(E) = 1 should be the same as the E in f(E1) + f(E2) + f(E) =1.

We assume E1 + E2 = E3 and all three are effects. Since E3 is an effect, by definition it must be part of a POVM, ie some ∑Ui = 1 with the Ui positive operators. WLOG (without loss of generality) we can assume E3 = U1, so E3 + ∑Ui = 1 where the Ui are summed from 2. We let E = ∑Ui so E3 + E = 1. E is obviously also an effect since E3 + E is a POVM. Now since E3 = E1 + E2 we have E1 + E2 + E = 1.

Fredrik said:
Also, to get that second equality, you must have used f(E1+E2)=f(E1)+f(E2), which is fine if E1 and E2 are part of the same POVM and therefore correspond to mutually exclusive outcomes, but what if they're not?

In my basic assumption I have assumed the probability of outcome i depends only on the Ei its mapped to, in particular I have assumed it does not depend on what POVM it is part of ie this is the assumption of non-contextuality which is the real key to Gleason in either variant. This means f(E), which I have defined as the probability of E, is the same whether E is part of the POVM E1 + E2 + E = 1 or E is part of the E3 + E = 1 POVM.

Now since the E1 + E2 + E POVM has three outcomes and one of those outcomes must occur, f(E1) + f(E2) + f(E) = 1, and similarly for the POVM E3 + E we have f(E3) + f(E) = 1. Equating and cancelling f(E) we have f(E1) + f(E2) = f(E3).

Fredrik said:
It's up to you, but it's really very easy.

Must get around to it when I get a bit of time.

Have few things to do today so will leave it to a bit later to look at your other issues in the main thread.

Thanks
Bill
 
  • #53
To Fredrik,

Remember that Busch's theorem needs no choice of topology on Effects to be true. Only sigma additivity.
 
  • #54
naima said:
Remember that Busch's theorem needs no choice of topology on Effects to be true. Only sigma additivity.
When we're dealing with classical probability measures, the σ-additivity condition looks like this: ##\mu\big(\bigcup_{i=1}^\infty E_i\big)=\sum_{i=1}^\infty\mu(E_i)##. Here ##(E_i)_{i=1}^\infty## is a pairwise disjoint sequence of sets. When we're dealing with probability measures on the set of effects, the σ-additivity condition looks like this: ##\mu\big(\sum_{i=1}^\infty E_i\big)=\sum_{i=1}^\infty\mu(E_i)##. The sum on the left is defined as the limit of the sequence of partial sums, and the limit is defined using a topology.

I don't think this is a big issue, since ##\mathcal E(\mathcal H)## is a subset of ##\mathcal B(\mathcal H)##, which has several useful topologies. The word "several" in that sentence is perhaps something to be concerned about. I haven't really thought that through.
 
