- #1
TackyBoar
- 1
- 0
I read Iterative methods for optimization by C. Kelley (PDF) and I'm struggling to understand proof of
Notes on notation: \(\displaystyle S \) is a simplex with vertices \(\displaystyle x_1 \) to \(\displaystyle x_{N+1} \) (order matters), some edges \(\displaystyle v_j = x_j - x_1 \) that make matrix \(\displaystyle V = (v_1, \dots, v_n) \) and \(\displaystyle \sigma_+(S) = \max_j \lVert x_1 - x_j \rVert_2 \). Simplex "gradient" is \(\displaystyle D(f : S) = V^{-T} (f(x_2) - f(x_1), \dots, f(x_{N+1}) - f(x_1)). \)
Proof starts with the following statement:
Using Taylor expansion about \(\displaystyle x_1 \) gives
\(\displaystyle f(x_j) = f(x_1) + \nabla f(x_1)(x_j - x_1)^T + R(x_j - x_1) = f(x_1) + \nabla f(x_1)v_j^T + R(v_j) \)
and using Mean Value Theorem on \(\displaystyle R(v_i) \) leads to
\(\displaystyle
|R(v_j)|
= |R(v_j)| - |R(0)|
\leq \lVert v_j \rVert \cdot \sup_{\theta \in [0, 1]} \lVert \nabla R(\theta v_j) \rVert
\leq \lVert v_j \rVert \cdot K_f \lVert v_j \rVert.
\)
It seems that the constant could be just \(\displaystyle 1 \cdot K_f \). Is there a reasoning error in what I wrote? Later in the text I spotted a similar Lemma with a different constant:
It uses similar concepts and I think the constant may mean something here.
Lemma 6.2.1. Let \(\displaystyle S\) be a simplex. Let \(\displaystyle \nabla f \) be Lipschitz continuous in a neighborhood of \(\displaystyle S\) with Lipschitz constant \(\displaystyle 2K_f\). Then there is \(\displaystyle K>0\), depending only on \(\displaystyle K_f\) such that
\(\displaystyle \left\lVert \nabla f(x_1) - D(f : S) \right\rVert \leq K \kappa(V) \sigma_+(S).\)
Notes on notation: \(\displaystyle S \) is a simplex with vertices \(\displaystyle x_1 \) to \(\displaystyle x_{N+1} \) (order matters), some edges \(\displaystyle v_j = x_j - x_1 \) that make matrix \(\displaystyle V = (v_1, \dots, v_n) \) and \(\displaystyle \sigma_+(S) = \max_j \lVert x_1 - x_j \rVert_2 \). Simplex "gradient" is \(\displaystyle D(f : S) = V^{-T} (f(x_2) - f(x_1), \dots, f(x_{N+1}) - f(x_1)). \)
Proof starts with the following statement:
Why is it true?Our smoothness assumptions on \(\displaystyle f\) and Taylor’s theorem imply that for all \(\displaystyle 2 \leq j \leq N+1 \)
\(\displaystyle \left|f\left(x_{1}\right)-f\left(x_{j}\right)+v_{j}^{T} \nabla f\left(x_{1}\right)\right| \leq K_{f}\left\|v_{j}\right\|^{2}\)
Using Taylor expansion about \(\displaystyle x_1 \) gives
\(\displaystyle f(x_j) = f(x_1) + \nabla f(x_1)(x_j - x_1)^T + R(x_j - x_1) = f(x_1) + \nabla f(x_1)v_j^T + R(v_j) \)
and using Mean Value Theorem on \(\displaystyle R(v_i) \) leads to
\(\displaystyle
|R(v_j)|
= |R(v_j)| - |R(0)|
\leq \lVert v_j \rVert \cdot \sup_{\theta \in [0, 1]} \lVert \nabla R(\theta v_j) \rVert
\leq \lVert v_j \rVert \cdot K_f \lVert v_j \rVert.
\)
It seems that the constant could be just \(\displaystyle 1 \cdot K_f \). Is there a reasoning error in what I wrote? Later in the text I spotted a similar Lemma with a different constant:
Lemma 6.2.5. Let \(\displaystyle S\) be a nonsingular simplex and let \(\displaystyle \nabla^2 f \) be Lipschitz continuous in a neighborhood of \(\displaystyle S \cup R(S) \) with Lipschitz constant \(\displaystyle 3K_C \). Then there is \(\displaystyle K>0 \) such that
\(\displaystyle \left\lVert \nabla f(x_1) - D_C(f : S) \right\rVert \leq K \kappa(V) \sigma_+(S)^2. \)
It uses similar concepts and I think the constant may mean something here.