Effect of Perturbation on Gradient Descent Sequence

Vulture1991 · Sep 22, 2019

Consider a function $f\in\mathcal{C}^2$ with Lipschitz continuous gradient (with constant $L$). We also assume the function is bounded below and has at least one minimizer. Let $\{x^k\}_k$ be the sequence generated by the Gradient Descent algorithm with initial point $x^0$ and step-size $0<\alpha<2/L$:
\begin{equation}
x^{k+1}=x^k-\alpha\nabla f (x^k).
\end{equation}
We know that $\nabla f(x^k)\to 0$, so every limit point of the sequence is a critical point.

Now consider the new function $\tilde{f}(x)=f(x)+x'Ax$ for some $A\succeq\mathbf{0}$. Let $\{\tilde{x}^k\}_k$ be the sequence generated by the Gradient Descent algorithm with **the same** initial point $\tilde{x}^0=x^0$ and the same step-size $0<\alpha<2/L$:
\begin{equation}
\tilde{x}^{k+1}=\tilde{x}^k-\alpha\nabla \tilde{f} (\tilde{x}^k).
\end{equation}
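
To make the setup concrete, here is a minimal numerical sketch of the two iterations. The test function $f(x)=\frac{L}{2}|x-c|^2$, the diagonal matrices $A$, and the helper names `gradient_descent` and `sequence_distances` are illustrative assumptions, not part of the question. The sketch also assumes $A$ is symmetric, so that $\nabla(x'Ax)=2Ax$.

```python
import numpy as np

def gradient_descent(grad, x0, alpha, num_steps):
    """Return the iterates x^0, ..., x^num_steps of x^{k+1} = x^k - alpha * grad(x^k)."""
    xs = [np.asarray(x0, dtype=float)]
    for _ in range(num_steps):
        xs.append(xs[-1] - alpha * grad(xs[-1]))
    return np.array(xs)

# Hypothetical test problem: f(x) = (L/2) * |x - c|^2, so grad f(x) = L * (x - c).
L = 1.0
c = np.array([1.0, -2.0])
alpha = 0.5                     # satisfies 0 < alpha < 2/L
x0 = np.zeros(2)

def sequence_distances(A, num_steps=50):
    """Distances |x^k - x~^k| between the original and perturbed iterates."""
    grad_f = lambda x: L * (x - c)
    # Gradient of f(x) + x'Ax, assuming A is symmetric.
    grad_f_tilde = lambda x: grad_f(x) + 2.0 * A @ x
    xs = gradient_descent(grad_f, x0, alpha, num_steps)
    xs_tilde = gradient_descent(grad_f_tilde, x0, alpha, num_steps)
    return np.linalg.norm(xs - xs_tilde, axis=1)

small = sequence_distances(np.diag([0.25, 0.0]))  # alpha * (L + 2*0.25) = 0.75 < 2
large = sequence_distances(np.diag([2.0, 0.0]))   # alpha * (L + 2*2.0)  = 2.5  > 2
print(small[-1], large[-1])
```

In this particular example the distance settles near $1/3$ for the smaller $A$, while for the larger $A$ the perturbed iteration is no longer a contraction at the same step-size and the distance grows with $k$.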

**Can we prove that $\mathrm{dist}\left(\{\tilde{x}^k\}_k,\{{x}^k\}_k\right)$ is uniformly bounded, independent of $A$ and step-size $\alpha$?**

I tried to prove it by assuming the existence of a compact sublevel set $\mathcal{L}=\{x:f(x)\leq f(x^0)\}$ and using the fact that Gradient Descent generates a decreasing sequence of objective values (implying that the sequence remains in the compact sublevel set). However, I cannot prove the existence of such a set that is independent of both $A$ and $\alpha$.
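
For reference, the monotone decrease used above follows from the standard descent lemma for functions with $L$-Lipschitz gradient. As a sketch for the unperturbed iteration:
\begin{equation}
f(x^{k+1})\leq f(x^k)-\alpha\left(1-\frac{\alpha L}{2}\right)|\nabla f(x^k)|^2,
\end{equation}
which is a strict decrease whenever $\nabla f(x^k)\neq 0$ and $0<\alpha<2/L$. Assuming $A$ is symmetric, the gradient of $\tilde{f}$ is Lipschitz with constant at most $L+2|A|$ (operator norm), so the same argument only guarantees descent for the perturbed sequence when $\alpha<2/(L+2|A|)$.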

jvicens · Sep 22, 2019

I would like to point out that the statement as it is currently written is not entirely clear. It is not specified what is meant by the distance between the two sequences, and it is not clear what is meant by "uniformly bounded". Is it meant to be bounded by a constant, or by a function of $k$? Furthermore, the notation used for the two sequences is inconsistent, as one uses $\{x^k\}_k$ and the other uses $\{\tilde{x}^k\}_k$.

Assuming that the distance between the two sequences means the Euclidean distance between corresponding iterates, and that "uniformly bounded" means bounded by a constant, I would start from the Lipschitz continuity of the gradient of the original function $f$. Since $\nabla f$ is Lipschitz continuous with constant $L$, for any two points $x,y$ the following inequality holds:
\begin{equation}
|\nabla f(x)-\nabla f(y)| \leq L|x-y|.
\end{equation}
The perturbation also changes the gradient: for symmetric $A$, $\nabla\tilde{f}(x)=\nabla f(x)+2Ax$. Let $d_k=|x^k-\tilde{x}^k|$, so that $d_0=0$. Then we have:
\begin{align}
d_{k+1} &= |x^{k+1}-\tilde{x}^{k+1}| \\
&= |x^k-\alpha\nabla f(x^k)-\tilde{x}^k+\alpha\nabla \tilde{f}(\tilde{x}^k)| \\
&= |(x^k-\tilde{x}^k)-\alpha(\nabla f(x^k)-\nabla f(\tilde{x}^k))+2\alpha A\tilde{x}^k| \\
&\leq d_k+\alpha|\nabla f(x^k)-\nabla f(\tilde{x}^k)|+2\alpha|A\tilde{x}^k| \\
&\leq (1+\alpha L)d_k+2\alpha|A\tilde{x}^k|.
\end{align}
Unrolling this recursion from $d_0=0$ gives $d_k\leq 2\alpha\sum_{j=0}^{k-1}(1+\alpha L)^{k-1-j}|A\tilde{x}^j|$. This estimate grows with $k$ and depends on $A$ through the terms $|A\tilde{x}^j|$, so on its own it does not give a bound that is uniform in $k$ or independent of $A$ and $\alpha$.
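
As a numerical sanity check, the sketch below tests a one-step estimate that accounts for the gradient of the perturbation term, $d_{k+1}\leq(1+\alpha L)d_k+2\alpha|A\tilde{x}^k|$, on a non-quadratic example. The function $f(x)=\sum_i\log\cosh(x_i-c_i)$ (whose gradient $\tanh(x-c)$ is 1-Lipschitz), the vector `c`, and the matrix `A` are all assumptions chosen for illustration, and $A$ is taken symmetric so that $\nabla(x'Ax)=2Ax$.

```python
import numpy as np

L = 1.0                                  # tanh is 1-Lipschitz, so grad f is L-Lipschitz with L = 1
c = np.array([1.0, -2.0, 0.5])           # hypothetical minimizer of f
A = np.diag([0.3, 0.1, 0.0])             # symmetric positive semidefinite perturbation
alpha = 1.0                              # satisfies 0 < alpha < 2/L

grad_f = lambda x: np.tanh(x - c)        # gradient of f(x) = sum(log(cosh(x - c)))
grad_f_tilde = lambda x: grad_f(x) + 2.0 * A @ x

x = np.zeros(3)
x_tilde = np.zeros(3)                    # same initial point, so d_0 = 0
bound_holds = True
for k in range(200):
    d_k = np.linalg.norm(x - x_tilde)
    # Right-hand side of the one-step estimate at iteration k.
    rhs = (1.0 + alpha * L) * d_k + 2.0 * alpha * np.linalg.norm(A @ x_tilde)
    x = x - alpha * grad_f(x)
    x_tilde = x_tilde - alpha * grad_f_tilde(x_tilde)
    d_next = np.linalg.norm(x - x_tilde)
    # Small tolerance for floating-point rounding.
    bound_holds = bound_holds and bool(d_next <= rhs + 1e-12)

print(bound_holds, np.linalg.norm(x - x_tilde))
```

The check only confirms the one-step inequality along this particular run; it says nothing about whether the distance stays bounded for other choices of $A$ or $\alpha$.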
