On Jensen's inequality for conditional expectation

  • #1
psie
TL;DR Summary
I am reading a proof of Jensen's inequality for conditional expectation in Le Gall's book Measure Theory, Probability and Stochastic Processes. I am a bit surprised that this inequality does not simply follow from the measure theoretic form that has been previously established, but requires a new, somewhat technical proof. I have some questions about the proof.
Theorem. Let ##\varphi:\mathbb R\to\mathbb R_+## be a convex function and let ##X\in L^1##. Then, a.s., $$E[\varphi(X)\mid\mathcal B]\geq\varphi(E[X\mid\mathcal B]).$$

Proof: Set $$\mathcal E_\varphi=\{(a,b)\in\mathbb R^2:\forall x\in\mathbb R,\ \varphi(x)\geq ax+b\}.$$ Then by convexity of ##\varphi##, $$\varphi(x)=\underbrace{\sup_{(a,b)\in \mathcal E_{\varphi}}(ax+b)}_{g(x)}=\underbrace{\sup_{(a,b)\in \mathcal E_{\varphi}\cap \mathbb Q^2}(ax+b)}_{h(x)}.$$ We can take advantage of the fact that ##\mathbb Q^2## is countable to disgard [I think it should be discard] a countable collection of sets of probability zero and to get that, a.s., \begin{align*} E[\varphi(X)\mid \mathcal B]&=E\left[\sup_{(a,b)\in \mathcal E_{\varphi}\cap \mathbb Q^2}(aX+b)\Bigm\vert \mathcal B\right] \\ &\geq \sup_{(a,b)\in \mathcal E_{\varphi}\cap \mathbb Q^2}E[aX+b\mid\mathcal B] \\ &=\varphi(E[X\mid\mathcal B]).\end{align*}

Questions:

1. I am a bit unsure why ##g(x)=h(x)##. Clearly ##g(x)\geq h(x)##, but why is ##g(x)\leq h(x)##? Here's my explanation, which is kind of lengthy, but maybe you have a better one.

If ##(a,b)\in\mathcal E_{\varphi}## is such that ##\varphi(x)>ax+b## for all ##x\in\mathbb R##, then pick a number ##q_x## in between. Let ##q_x=a'x+b'## where by denseness we choose ##(a',b')## sufficiently close to ##(a,b)## so that ##q_x## satisfies the inequality ##\varphi(x)>q_x>ax+b## for all ##x\in\mathbb R##. Then ##(a',b')\in\mathcal E_\varphi\cap\mathbb Q^2##, and since ##(a,b)## was arbitrary, this shows that ##g(x)\leq h(x)## when ##\varphi(x)>ax+b##. If ##(a,b)\in\mathcal E_{\varphi}## is such that ##\varphi(x)=ax+b##, then we approximate ##(a,b)## from below by rational pairs, and the supremum will give that ##g(x)=h(x)##. Does this make sense?

2. I do not understand what the author means by "We can take advantage of the fact that ##\mathbb Q^2## is countable to [discard] a countable collection of sets of probability zero...". I am also a bit unsure about the inequality in the proof. Is it simply an application of monotonicity, i.e. $$\sup(aX+b)\geq aX+b\implies E[\sup(aX+b)\mid\mathcal B]\geq E[aX+b\mid\mathcal B],$$ so that taking the supremum over ##(a,b)## on the right-hand side gives the inequality in the proof? If my reasoning is correct, I don't see why we need to consider ##\mathcal E_\varphi\cap\mathbb Q^2##.
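
One way to unpack both points, offered as a sketch rather than as Le Gall's own argument (the sample space ##\Omega## and the null sets ##N_{a,b}## and ##N## are names introduced here for illustration):

For question 1, assume first that ##\varphi## is not constant. A nonnegative affine function on ##\mathbb R## is constant, so ##\varphi## is not affine and therefore has supporting lines with two different slopes. The set ##\mathcal E_\varphi## is convex and contains ##(a,b-t)## whenever it contains ##(a,b)## and ##t\geq 0##, so it has nonempty interior, and every point of ##\mathcal E_\varphi## is a limit of interior points. Rational pairs are dense in the interior, so for fixed ##x## and any ##(a,b)\in\mathcal E_\varphi##, $$ax+b=\lim_{n\to\infty}(a_nx+b_n)\quad\text{for some }(a_n,b_n)\in\mathcal E_\varphi\cap\mathbb Q^2,$$ which gives ##g(x)\leq h(x)##. If ##\varphi\equiv c##, then ##\mathcal E_\varphi=\{0\}\times(-\infty,c]## and the pairs ##(0,q)## with rational ##q\uparrow c## suffice.

For question 2, conditional expectations are only defined up to sets of probability zero, so for each fixed ##(a,b)\in\mathcal E_\varphi\cap\mathbb Q^2## monotonicity and linearity give the inequality only outside a null set ##N_{a,b}##: $$E[\varphi(X)\mid\mathcal B]\geq E[aX+b\mid\mathcal B]=aE[X\mid\mathcal B]+b\quad\text{on }\Omega\setminus N_{a,b}.$$ The set ##N=\bigcup_{(a,b)\in\mathcal E_\varphi\cap\mathbb Q^2}N_{a,b}## is a countable union of null sets, hence ##P(N)=0##, and outside ##N## all of these inequalities hold simultaneously, so the supremum can be taken pointwise: $$E[\varphi(X)\mid\mathcal B]\geq\sup_{(a,b)\in\mathcal E_\varphi\cap\mathbb Q^2}\bigl(aE[X\mid\mathcal B]+b\bigr)=h(E[X\mid\mathcal B])=\varphi(E[X\mid\mathcal B])\quad\text{a.s.}$$ With the uncountable index set ##\mathcal E_\varphi##, the union of the exceptional sets need not have probability zero (or even be measurable), which is where ##\mathbb Q^2## is used.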
 
  • #2
1. I do not see the problem. ##\mathbb{Q}^2 \subseteq \mathbb{R}^2## is dense, and for fixed ##x## the value ##g(x) \in \mathbb{R}_+## is a single real number. Therefore there is always a sequence of pairs in ##\mathbb{Q}^2## whose values ##ax+b## converge to that number. Density is the key here, with the supremum acting like a topological closure.

2. As soon as the index set is countable, we do not need to worry about the exceptional sets of probability zero. I do not see why that wouldn't also be true for an uncountable index set if we add "a.s.", though. Maybe countability is what guarantees that their union still has measure zero.

3. The expectation of the supremum includes the real limits, whereas the supremum of the expectations is restricted to rational pairs, so it can be smaller simply because the supremum is taken over a smaller set. That would be my interpretation, but I would also like to see an example where the strict inequality holds.
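
Regarding the request in point 3 for a case of strict inequality, here is a minimal sketch on a finite probability space, where conditioning on a partition is just block-wise averaging. The four-point sample space, the values of `X`, the partition `blocks`, and the helper `cond_exp` are all made up for illustration. The function ##\varphi(x)=|x|## is written as the supremum of the two lines ##x## and ##-x##, i.e. the rational pairs ##(1,0)## and ##(-1,0)##.

```python
from fractions import Fraction


def cond_exp(values, probs, blocks):
    """Conditional expectation given the sigma-algebra generated by a finite partition.

    Returns one number per block: the probability-weighted average of `values` on it.
    """
    out = []
    for block in blocks:
        mass = sum(probs[w] for w in block)
        out.append(sum(values[w] * probs[w] for w in block) / mass)
    return out


# Hypothetical toy setup: four equally likely outcomes 0, 1, 2, 3.
probs = [Fraction(1, 4)] * 4
X = [Fraction(-1), Fraction(1), Fraction(2), Fraction(4)]
blocks = [[0, 1], [2, 3]]  # the partition generating B

# phi(x) = |x| = max(x, -x), i.e. the sup over the pairs (a, b) = (1, 0), (-1, 0).
pairs = [(Fraction(1), Fraction(0)), (Fraction(-1), Fraction(0))]

# Left-hand side: E[ sup_(a,b) (aX + b) | B ]
lhs = cond_exp([max(a * x + b for a, b in pairs) for x in X], probs, blocks)

# Right-hand side: sup_(a,b) E[ aX + b | B ], taken block by block
per_pair = [cond_exp([a * x + b for x in X], probs, blocks) for a, b in pairs]
rhs = [max(col) for col in zip(*per_pair)]

for i, (l, r) in enumerate(zip(lhs, rhs)):
    print(f"block {i}: E[sup | B] = {l}, sup E[. | B] = {r}")
```

On the block where ##X## takes the values ##\pm 1## the two lines each win on part of the block, so averaging the maximum gives strictly more than the maximum of the averages. On the block where ##X>0## the same line wins everywhere and the two sides agree.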
 