Question about convex property in Jensen's inequality

In summary, the discussion revolves around the convex property in Jensen's inequality, which states that for a convex function, the function's value at the average of points is less than or equal to the average of the function's values at those points. The thread focuses on one step of the proof: why every convex function is the pointwise supremum of the affine functions lying below it, and how the monotonicity of one-sided difference quotients yields a supporting line at each point.
  • #1
psie
TL;DR Summary
I am reading a proof of Jensen's inequality. I am getting stuck on an "elementary property" of convex functions.
I am reading a proof of Jensen's inequality. The proof goes like this.

Theorem 4.3: Let ##(\Omega, \mathcal A,\mu)## be a probability space and let ##\varphi:\mathbb R\to\mathbb R_+## be a convex function. Then for every ##f\in L^1(\Omega, \mathcal A,\mu)##, $$\int_\Omega \varphi\circ f\, d\mu\geq\varphi\left(\int_\Omega f\, d\mu\right).$$ Proof: Set $$\mathcal E_\varphi=\{(a,b)\in\mathbb R^2:\forall x\in\mathbb R,\varphi(x)\geq ax+b\}.$$ Then by elementary properties of convex functions, $$\varphi \left(x\right)=\sup_{\left(a{,}b\right)\in \mathcal E_{\varphi }}\left(ax+b\right).\tag1$$ ... ... ...
I do not know much about convex functions, but why does (1) hold?

The definition of convex I'm using is that $$\varphi(tx+(1-t)y)\leq t\varphi(x)+(1-t)\varphi(y)$$ holds for all ##x,y\in\mathbb R## and all ##t\in[0,1]##.
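For what it's worth, here is a quick numerical sanity check of ##(1)## that I tried (my own sketch, not part of the proof): for the differentiable convex function ##\varphi(x)=e^x##, each tangent line lies below the graph, so its slope/intercept pair belongs to ##\mathcal E_\varphi##, and the supremum over tangent lines should recover ##\varphi##.

```python
import numpy as np

# Sanity check of (1) for the convex function phi(x) = exp(x).
# For a differentiable convex phi, the tangent line at t,
#   l_t(x) = phi(t) + phi'(t) * (x - t),
# lies below phi, so (phi'(t), phi(t) - phi'(t)*t) is in E_phi.
phi = np.exp

ts = np.linspace(-5, 5, 2001)   # tangency points t
a = np.exp(ts)                  # slopes phi'(t) = exp(t)
b = phi(ts) - a * ts            # intercepts phi(t) - phi'(t)*t

xs = np.linspace(-2, 2, 101)
# supremum over the sampled family of lines, evaluated at each x
sup_lines = np.max(a[None, :] * xs[:, None] + b[None, :], axis=1)

err = np.max(np.abs(sup_lines - phi(xs)))
print(err)  # small: the envelope of tangents recovers phi on the grid
```

Of course this only samples finitely many lines and only works because ##e^x## happens to be differentiable, but it at least makes ##(1)## plausible.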
 
  • #2
From here, I found the answer. However I still have some questions:
If ##\phi## is convex, for each point ##(\alpha, \phi(\alpha))##, there exists an affine function ##f_\alpha(x) = a_\alpha x + b_\alpha## such that
- the line ##L_\alpha## corresponding to ##f_\alpha## passes through ##(\alpha, \phi(\alpha))##;
- the graph of ##\phi## lies above ##L_\alpha##.
Let ##A = \{f_\alpha: \alpha \in \mathbb{R}\}## be the set of all such functions. We have
- ##\sup_{f_\alpha \in A} f_\alpha(x) \geq f_x(x) = \phi(x)## because ##f_x## passes through ##(x, \phi(x))##;
- ##\sup_{f_\alpha \in A} f_\alpha(x) \leq \phi(x)## because all ##f_\alpha## lie below ##\phi##.
How does one show that there exists such an affine function ##f_\alpha(x) = a_\alpha x + b_\alpha## with those properties, given the definition I gave above?
 
  • #3
A function is convex iff its epigraph is convex. In the differentiable case, I assume we can simply use the tangent at ##(a, \varphi (a)).## The difficulty is the points that do not have one. My next assumption is that such points have at least one-sided tangents that will do. These ideas are based on the functions ##x^2## and ##|x|##, and I think they are typical.
Wikipedia (translated from the German) said:
A convex or concave function defined on an open interval is locally Lipschitz continuous and hence, by Rademacher's theorem, differentiable almost everywhere. It is left- and right-differentiable at every point.

Looks like my intuition is correct but the proof needs some consideration.
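To illustrate the kink case numerically (my own sketch, purely for illustration): take ##\varphi(x)=|x|## at ##x=0##, where the left derivative is ##-1## and the right derivative is ##+1##. Any slope ##a\in[-1,1]## between them should give a supporting line through the kink.

```python
import numpy as np

# phi(x) = |x| has a kink at 0: left derivative -1, right derivative +1.
# Claim: every slope a in [-1, 1] yields a supporting line through (0, 0),
# i.e. |x| >= a*x for all x.
xs = np.linspace(-3, 3, 601)
for a in np.linspace(-1, 1, 21):
    assert np.all(np.abs(xs) >= a * xs - 1e-12)
print("every a in [-1, 1] gives a supporting line at the kink")
```

Slopes outside ##[-1,1]## fail on one side, so the one-sided derivatives really do delimit the admissible supporting slopes.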
 
  • #4
Here is Durrett's proof of the inequality (in his book Probability: Theory and Examples). He uses ##\phi## to denote the function ##\varphi## in the theorem in my original post:

Proof. Let ##c=\int f \,d \mu## and let ##l(x)=ax+b## be a linear function that has ##l(c)= \phi(c)## and ##\phi(x) \geq l(x)##. To see that such a function exists, recall that convexity implies
$$\lim_{h \to 0^+} \frac{\phi(c)−\phi(c−h)}{h} \leq \lim_{h \to 0^+} \frac{\phi(c+h)−\phi(c)}{h}\tag2$$(The limits exist since the sequences are monotone.)

If we let ##a## be any number between the two limits and let ##l(x) = a(x − c) + \phi(c)##, then ##l## has the desired properties. With the existence of ##l## established, the rest is easy. From the fact that if ##g \leq f## a.e., then ##\int g\, d\mu \leq \int f \,d\mu##, we have
$$ \int \phi(f ) \,d\mu \geq \int (af + b) \,d\mu = a \int f \,d\mu + b = l\left(\int f \,d\mu \right)= \phi\left(\int f \,d\mu \right),$$ since ##c = \int f \,d \mu## and ##l(c) = \phi(c)##.
I can see why ##(2)## holds, but I do not see why the function ##l(x) = a(x − c) + \phi(c)## satisfies ##\phi(x) \geq l(x)##. Is this clear to someone? Also, when Durrett says "The limits exist since the sequences are monotone", which sequences does he mean?
 
  • #5
psie said:
I can see why ##(2)## holds, but I do not see why the function ##l(x) = a(x − c) + \phi(c)## satisfies ##\phi(x) \geq l(x)##. Is this clear to someone? Also, when Durrett says "The limits exist since the sequences are monotone", which sequences does he mean?
Isn't ##l(x)=a(x − c) + \phi(c) =ax + \underbrace{(\phi(c)-ac)}_{=b}\leq \phi(x)## exactly the statement that ##(a,b)\in\mathcal{E}_\varphi =\mathcal{E}_\phi\;##?
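One can also verify ##\phi(x)\geq l(x)## directly from the monotonicity of the difference quotients (a sketch). Pick any ##a## between the two limits in ##(2)##. For ##x>c##, convexity makes the difference quotients monotone, so
$$\frac{\phi(x)-\phi(c)}{x-c}\;\geq\;\lim_{h\to0^+}\frac{\phi(c+h)-\phi(c)}{h}\;\geq\;a,$$
and multiplying by ##x-c>0## gives ##\phi(x)\geq a(x-c)+\phi(c)=l(x)##. For ##x<c##, the left-hand quotient satisfies
$$\frac{\phi(c)-\phi(x)}{c-x}\;\leq\;\lim_{h\to0^+}\frac{\phi(c)-\phi(c-h)}{h}\;\leq\;a,$$
so ##\phi(c)-\phi(x)\leq a(c-x)##, which rearranges to ##\phi(x)\geq l(x)## again.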

By "sequences" Durrett probably means the difference quotients in ##(2)## evaluated along ##h = n^{-1}##, i.e. something like
$$
\lim_{h \to 0^+} f(h) = \lim_{n \to \infty} f(n^{-1}),
$$
where the values ##f(n^{-1})## form a monotone sequence by convexity, so the limit exists.
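That monotonicity is easy to check numerically (my sketch, with the hypothetical example ##\phi(x)=x^4## at ##c=0.5##): for a convex ##\phi##, the right-hand difference quotients ##h \mapsto (\phi(c+h)-\phi(c))/h## should be nondecreasing in ##h##, so along ##h=1/n## they form a monotone (here: nonincreasing, since ##h## decreases) sequence.

```python
import numpy as np

# For convex phi, h -> (phi(c+h) - phi(c)) / h is nondecreasing in h > 0,
# so along h = 1/n the difference quotients form a monotone sequence and
# the one-sided limit in (2) exists. Quick check for phi(x) = x**4, c = 0.5.
phi = lambda x: x**4
c = 0.5

hs = 1.0 / np.arange(1, 200)     # h = 1, 1/2, 1/3, ... (decreasing)
q = (phi(c + hs) - phi(c)) / hs  # right-hand difference quotients
# hs is decreasing, so q should be nonincreasing along the array,
# converging down to the right derivative 4*c**3 = 0.5
assert np.all(np.diff(q) <= 1e-12)
print("right difference quotients are monotone in h")
```

The same check with the left-hand quotients gives a sequence increasing up to the left derivative, which is exactly why both limits in ##(2)## exist.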
 