Question about convex property in Jensen's inequality

  • #1
psie
TL;DR Summary
I am reading a proof of Jensen's inequality. I am getting stuck on an "elementary property" of convex functions.
I am reading a proof of Jensen's inequality. The proof goes like this.

Theorem 4.3: Let ##(\Omega, \mathcal A,\mu)## be a probability space and let ##\varphi:\mathbb R\to\mathbb R_+## be a convex function. Then for every ##f\in L^1(\Omega, \mathcal A,\mu)##, $$\int_\Omega \varphi\circ f\, d\mu\geq\varphi\left(\int_\Omega f\, d\mu\right).$$ Proof: Set $$\mathcal E_\varphi=\{(a,b)\in\mathbb R^2:\forall x\in\mathbb R,\varphi(x)\geq ax+b\}.$$ Then by elementary properties of convex functions, $$\varphi(x)=\sup_{(a,b)\in\mathcal E_\varphi}(ax+b).\tag1$$ ... ... ...
I do not know much about convex functions, but why does (1) hold?

The definition of convex I'm using is that $$\varphi(tx+(1-t)y)\leq t\varphi(x)+(1-t)\varphi(y)$$ holds for all ##x,y\in\mathbb R## and all ##t\in[0,1]##.
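For a concrete instance of ##(1)##, take ##\varphi(x)=x^2##: a line ##ax+b## satisfies ##x^2\geq ax+b## for all ##x## exactly when the discriminant of ##x^2-ax-b## is nonpositive, i.e. ##a^2+4b\leq 0##, i.e. ##b\leq -a^2/4##, and then $$\sup_{(a,b)\in\mathcal E_\varphi}(ax+b)=\sup_{a\in\mathbb R}\left(ax-\frac{a^2}{4}\right)=x^2,$$ with the supremum attained at ##a=2x##. But I do not see how to argue this for a general convex ##\varphi##.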
 
  • #2
From here, I found the answer. However, I still have some questions:
If ##\phi## is convex, for each point ##(\alpha, \phi(\alpha))##, there exists an affine function ##f_\alpha(x) = a_\alpha x + b_\alpha## such that
- the line ##L_\alpha## corresponding to ##f_\alpha## passes through ##(\alpha, \phi(\alpha))##;
- the graph of ##\phi## lies above ##L_\alpha##.
Let ##A = \{f_\alpha: \alpha \in \mathbb{R}\}## be the set of all such functions. We have
- ##\sup_{f_\alpha \in A} f_\alpha(x) \geq f_x(x) = \phi(x)## because ##f_x## passes through ##(x, \phi(x))##;
- ##\sup_{f_\alpha \in A} f_\alpha(x) \leq \phi(x)## because all ##f_\alpha## lie below ##\phi##.
How does one show that there exists such an affine function ##f_\alpha(x) = a_\alpha x + b_\alpha## with those properties, given the definition I gave above?
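(For a concrete case: at a point where ##\phi## is not differentiable, e.g. ##\phi(x)=|x|## and ##\alpha=0##, any ##f_0(x)=ax## with ##a\in[-1,1]## works, since ##f_0(0)=0=\phi(0)## and ##|x|\geq ax## for all ##x##. So such a line can exist, and need not be unique, even at a kink; it is the general existence argument that I am missing.)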
 
  • #3
A function is convex iff its epigraph is convex. In the case of differentiability, I assume we can simply use the tangent at ##(a, \varphi(a))##. The difficulty lies with the points that do not have one. My next assumption is that such points have at least one-sided tangents that will do. These ideas are based on the functions ##x^2## and ##|x|##, and I think they are typical.
Wikipedia said:
(Translated from German) A convex or concave function defined on an open interval is locally Lipschitz continuous and therefore, by Rademacher's theorem, differentiable almost everywhere. It is left- and right-differentiable at every point.

Looks like my intuition is correct but the proof needs some consideration.
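To make that precise, here is the standard chord-slope (three-chord) inequality, derived directly from the definition in post #1: for ##x_1<x_2<x_3##, write ##x_2=tx_1+(1-t)x_3## with ##t=\frac{x_3-x_2}{x_3-x_1}\in(0,1)##; convexity gives ##\varphi(x_2)\leq t\varphi(x_1)+(1-t)\varphi(x_3)##, and rearranging yields
$$\frac{\varphi(x_2)-\varphi(x_1)}{x_2-x_1}\leq\frac{\varphi(x_3)-\varphi(x_1)}{x_3-x_1}\leq\frac{\varphi(x_3)-\varphi(x_2)}{x_3-x_2}.$$
In particular, at any point the left difference quotients increase and the right difference quotients decrease as the step shrinks, and every left quotient is below every right quotient, so both one-sided derivatives exist and any slope between them should give a supporting line.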
 
  • #4
Here is Durrett's proof of the inequality (in his book Probability: Theory and Examples). He uses ##\phi## to denote the function ##\varphi## in the theorem in my original post:

Proof. Let ##c=\int f \,d \mu## and let ##l(x)=ax+b## be a linear function that has ##l(c)= \phi(c)## and ##\phi(x) \geq l(x)##. To see that such a function exists, recall that convexity implies
$$\lim_{h \to 0^+} \frac{\phi(c)−\phi(c−h)}{h} \leq \lim_{h \to 0^+} \frac{\phi(c+h)−\phi(c)}{h}\tag2$$(The limits exist since the sequences are monotone.)

If we let ##a## be any number between the two limits and let ##l(x) = a(x − c) + \phi(c)##, then ##l## has the desired properties. With the existence of ##l## established, the rest is easy. From the fact that if ##g \leq f## a.e., then ##\int g\, d\mu \leq \int f \,d\mu##, we have
$$ \int \phi(f) \,d\mu \geq \int (af + b) \,d\mu = a \int f \,d\mu + b = l\left(\int f \,d\mu \right)= \phi\left(\int f \,d\mu \right),$$ since ##c = \int f \,d \mu## and ##l(c) = \phi(c)##.
I can see why ##(2)## holds, but I do not see why the function ##l(x) = a(x − c) + \phi(c)## satisfies ##\phi(x) \geq l(x)##. Is this clear to someone? Also, when Durrett says "The limits exist since the sequences are monotone", which sequences does he mean?
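(For concreteness, and just as a sanity check rather than part of Durrett's proof: with ##\phi(x)=x^2## and ##c=1##, the quotients in ##(2)## are ##\frac{\phi(1)-\phi(1-h)}{h}=2-h## and ##\frac{\phi(1+h)-\phi(1)}{h}=2+h##, both monotone in ##h## with common limit ##2##, and ##a=2## gives ##l(x)=2(x-1)+1=2x-1\leq x^2##, since ##x^2-2x+1=(x-1)^2\geq 0##.)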
 
  • #5
psie said:
I can see why ##(2)## holds, but I do not see why the function ##l(x) = a(x − c) + \phi(c)## satisfies ##\phi(x) \geq l(x)##. Is this clear to someone? Also, when Durrett says "The limits exist since the sequences are monotone", which sequences does he mean?
Isn't ##l(x)=a(x − c) + \phi(c) =ax + \underbrace{(\phi(c)-ac)}_{=b}\leq \phi(x)## simply the definition of ##\mathcal{E}_\varphi =\mathcal{E}_\phi\;##?

By "sequences" Durrett probably means something like
$$
\lim_{h \to 0^+} f(h) = \lim_{n \to \infty} f(n^{-1}),
$$
i.e. the difference quotients in ##(2)## evaluated along ##h_n=1/n##, which are monotone in ##n## by convexity.
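Spelling that out with the chord-slope inequality from post #3: for ##x>c## and ##0<h<x-c## we have ##\frac{\phi(c+h)-\phi(c)}{h}\leq\frac{\phi(x)-\phi(c)}{x-c}##, so letting ##h\to 0^+##,
$$\frac{\phi(x)-\phi(c)}{x-c}\geq\lim_{h\to 0^+}\frac{\phi(c+h)-\phi(c)}{h}\geq a,$$
hence ##\phi(x)\geq a(x-c)+\phi(c)=l(x)##. For ##x<c##, the symmetric argument with the left-hand quotients gives ##\frac{\phi(c)-\phi(x)}{c-x}\leq\lim_{h\to 0^+}\frac{\phi(c)-\phi(c-h)}{h}\leq a##, which again yields ##\phi(x)\geq l(x)##. So ##(a,\phi(c)-ac)\in\mathcal E_\phi## indeed.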
 
