=== Assistant: Append the following section (it clarifies why plugging γ=0 into the γ>0 bound is invalid, and gives …) ===

Append the following section (it clarifies why plugging γ=0 into the γ>0 bound is invalid, and gives the correct "fallback" runtime bound that always holds).

open-problem-fountoulakis22a

<syntaxhighlight lang="latex">
% ============================================================
\section{When the margin vanishes: why the $1/\gamma^2$ term blows up
         (and what it does \emph{not} mean)}
\label{sec:gamma_zero}

A potential point of confusion is the appearance of the term $1/\gamma^2$ in bounds such as
\begin{equation}\label{eq:workeps_ppr_repeat}
  \mathrm{Work}(N_\varepsilon)
  = O\!\left( \frac{1}{\rho\sqrt{\alpha}}\log\frac{1}{\varepsilon}
      \;+\; \frac{1}{\gamma^2\sqrt{\alpha}} \right),
\end{equation}
and what happens on instances where $\gamma=0$ (e.g., the star/path breakpoint examples).

\subsection{The key point: \eqref{eq:workeps_ppr_repeat} is a conditional bound (it assumes $\gamma>0$)}

Recall the definition of the (strict complementarity) margin:
\[
  \gamma := \min_{i\in I^\star}\left(\lambda_i - |\nabla f(x^\star)_i|\right),
  \qquad
  \lambda_i=\rho\alpha\sqrt{d_i}.
\]
All of the ``finite identification'' and ``$\varepsilon$-independent spurious-support'' bounds were proved \emph{under the assumption $\gamma>0$}. The factor $1/\gamma^2$ arises by dividing by the minimum slack in the inactive KKT inequalities.

If $\gamma=0$, then there exists an inactive index $i\in I^\star$ with
\[
  |\nabla f(x^\star)_i|=\lambda_i,
\]
i.e., that coordinate lies \emph{exactly} on the soft-thresholding boundary at the optimum. In this case, there is no neighborhood of $x^\star$ in which the proximal-gradient update is guaranteed to keep all inactive coordinates at zero: arbitrarily small perturbations of the extrapolated point $y$ can flip such a tight coordinate between $0$ and a tiny nonzero value. Consequently, one cannot in general guarantee \emph{finite-time identification} of the optimal inactive set, and the proofs yielding $\varepsilon$-independent spurious-support bounds do not apply.

\paragraph{Conclusion.}
Setting $\gamma=0$ in \eqref{eq:workeps_ppr_repeat} does \emph{not} imply the algorithm requires infinite work; it only means that \eqref{eq:workeps_ppr_repeat} becomes \emph{vacuous} when used outside its assumptions.

\subsection{What remains true when $\gamma=0$: finite work for any fixed accuracy}

Even when $\gamma=0$, accelerated proximal-gradient methods still enjoy the standard accelerated \emph{iteration} complexity for strongly convex composite optimization. In particular, under $\mu$-strong convexity of $F$ and $L$-smoothness of $f$, there exists $q\in(0,1)$ (typically $q\approx 1-\sqrt{\mu/L}$) such that
\[
  F(x_k)-F(x^\star)\le \Delta_0\,q^k,
  \qquad
  \Delta_0:=F(x_0)-F(x^\star).
\]
Therefore, for any target $\varepsilon>0$, an $\varepsilon$-accurate iterate is obtained after
\[
  N_\varepsilon
  = O\!\left(\frac{\log(\Delta_0/\varepsilon)}{\log(1/q)}\right)
  = O\!\left(\sqrt{\frac{L}{\mu}}\log\frac{\Delta_0}{\varepsilon}\right)
\]
iterations. However, without a positive margin, we cannot bound the supports during the transient independently of $n$. A worst-case bound is obtained by charging each iteration the full dimension.
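% --- Illustrative addition (not in the original message): a worked
% instance of the iteration count above, using the assumed PageRank
% values L = 1 and mu = alpha, with alpha = 10^{-2} picked purely as
% an example. ---
\paragraph{Worked example (illustrative).}
With $L=1$ and $\mu=\alpha=10^{-2}$ we get $\sqrt{L/\mu}=10$ and
$q\approx 1-\sqrt{\mu/L}=0.9$. Reaching accuracy
$\varepsilon=10^{-6}\,\Delta_0$ therefore takes on the order of
\[
  N_\varepsilon \approx \sqrt{\tfrac{L}{\mu}}\,\ln\tfrac{\Delta_0}{\varepsilon}
  = 10\cdot\ln 10^{6} \approx 138
\]
iterations, regardless of $\gamma$; only the \emph{per-iteration work} accounting degrades when $\gamma=0$.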
\begin{corollary}[Always-valid fallback work bound (no margin needed)]\label{cor:fallback_work}
Assume only that $f$ is $L$-smooth and $F=f+g$ is $\mu$-strongly convex, and run an accelerated proximal-gradient method for $N_\varepsilon$ iterations. Then for any $\varepsilon>0$,
\[
  \mathrm{Work}(N_\varepsilon)
  \;\le\; O\!\left(n\,N_\varepsilon\right)
  = O\!\left(n\sqrt{\frac{L}{\mu}}\log\frac{\Delta_0}{\varepsilon}\right),
\]
independent of the margin $\gamma$. In the $\ell_1$-regularized PageRank setting (undirected graph), $L\le 1$ and $\mu=\alpha$, hence
\[
  \mathrm{Work}(N_\varepsilon)
  = O\!\left(\frac{n}{\sqrt{\alpha}}\log\frac{\Delta_0}{\varepsilon}\right).
\]
\end{corollary}

\begin{remark}[How to interpret the role of $\gamma$]
The margin $\gamma>0$ is precisely what upgrades the fallback bound above (which scales with $n$) to a \emph{local} bound that depends on $1/\rho$ and is independent of $n$. When $\gamma=0$, locality cannot be certified by this approach: tight inactive coordinates can keep ``flickering'' and may prevent $\varepsilon$-independent bounds on the cumulative spurious supports.
\end{remark}
% ============================================================
</syntaxhighlight>
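To make the "flickering" concrete, here is a minimal numerical sketch (not part of the appended LaTeX; the threshold value λ = 0.5, the unit step size, and the gradient value are assumptions chosen purely for illustration). It shows that the soft-thresholding update maps a coordinate whose gradient sits exactly at the threshold (γ = 0) to zero, while an arbitrarily small perturbation of the extrapolated point yields a nonzero value:

<syntaxhighlight lang="python">
import numpy as np

def soft_threshold(z, t):
    """Proximal operator of t*|.|: shrink z toward zero by t."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

# Hypothetical values for illustration only.
lam  = 0.5    # threshold lambda_i = rho * alpha * sqrt(d_i)
grad = -0.5   # |grad f(x*)_i| = lam, i.e. the margin gamma is exactly 0
step = 1.0    # unit step size

# At the optimum, the tight coordinate is mapped to exactly zero ...
print(soft_threshold(0.0 - step * grad, step * lam))        # 0.0

# ... but an arbitrarily small perturbation of the extrapolated point y
# flips it to a tiny nonzero value, so the optimal inactive set is never
# identified in finite time.
for eps in (1e-12, 1e-9, 1e-6):
    print(soft_threshold(eps - step * grad, step * lam))    # ~eps, not 0.0
</syntaxhighlight>

Conversely, with any strictly positive margin γ, the same update absorbs every perturbation smaller than γ times the step size back to exactly zero, which is the mechanism the finite-identification arguments rely on.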