=== Assistant ===

Below is a self-contained LaTeX note that proves a finite active-set identification result for standard accelerated proximal gradient (APG / FISTA-style momentum) on the ℓ1-regularized PageRank objective (3) from your COLT’22 open problem note (open-problem-fountoulakis22a), under a strict-complementarity “margin” assumption. The proof shows that once the accelerated iterates are close enough to the optimum (with a radius proportional to the margin), no coordinates outside the optimal support can ever be activated again; it also gives an explicit bound on the identification time in terms of the margin and the accelerated convergence rate.

<syntaxhighlight lang="latex">
\documentclass[11pt]{article}
\usepackage{amsmath,amssymb,amsthm,mathtools}
\usepackage[margin=1in]{geometry}
\usepackage{hyperref}

% --------------------
% Theorem environments
% --------------------
\newtheorem{theorem}{Theorem}
\newtheorem{lemma}{Lemma}
\newtheorem{corollary}{Corollary}
\newtheorem{assumption}{Assumption}
\theoremstyle{remark}
\newtheorem{remark}{Remark}

% --------------------
% Macros
% --------------------
\newcommand{\R}{\mathbb{R}}
\newcommand{\norm}[1]{\left\lVert #1 \right\rVert}
\newcommand{\ip}[2]{\left\langle #1,#2 \right\rangle}
\newcommand{\prox}{\operatorname{prox}}
\newcommand{\supp}{\operatorname{supp}}
\newcommand{\diag}{\operatorname{diag}}

\title{Finite Active-Set Identification for Accelerated Proximal Gradient on $\ell_1$-Regularized PageRank Under a Margin Condition}
\author{}
\date{}

\begin{document}
\maketitle

\begin{abstract}
We consider the $\ell_1$-regularized PageRank objective
\[
\min_{x\in\R^n}\; F(x) \;:=\; f(x) + g(x),
\qquad
g(x) := \rho\alpha \norm{D^{1/2}x}_1,
\]
where $f$ is the smooth quadratic arising from personalized PageRank. We prove a \emph{finite identification} statement for standard accelerated proximal-gradient methods: under a strict-complementarity \emph{margin} condition at the unique minimizer $x^\star$, all coordinates outside $\supp(x^\star)$ become and remain identically zero after finitely many iterations. An explicit identification-time bound is given in terms of the margin and the accelerated convergence rate.
\end{abstract}

\section{Problem setup: $\ell_1$-regularized PageRank as a composite strongly convex problem}

Let $G=(V,E)$ be an undirected graph with $n=|V|$ nodes. Let $A\in\R^{n\times n}$ denote the adjacency matrix and $D=\diag(d_1,\dots,d_n)$ the degree matrix. Fix $\alpha\in(0,1]$ and a seed vector $s\in\R^n$ (e.g.\ an indicator of a seed node). Define the (symmetric) normalized Laplacian
\[
L := I - D^{-1/2} A D^{-1/2}.
\]
It is standard that $L$ is positive semidefinite and $\lambda(L)\subseteq[0,2]$ for undirected graphs. Following the $\ell_1$-regularized PageRank formulation, define the matrix
\[
Q := \alpha I + \frac{1-\alpha}{2}L,
\]
and the smooth function
\begin{equation}\label{eq:f-def}
f(x) := \frac12 x^\top Q x - \alpha\, x^\top D^{-1/2}s.
\end{equation}
Let the nonsmooth regularizer be
\begin{equation}\label{eq:g-def}
g(x) := \rho\alpha\, \norm{D^{1/2}x}_1 = \sum_{i=1}^n \lambda_i |x_i|,
\qquad
\lambda_i := \rho\alpha\sqrt{d_i}.
\end{equation}
We study
\begin{equation}\label{eq:main-problem}
\min_{x\in\R^n} F(x) := f(x)+g(x).
\end{equation}

\subsection{Smoothness and strong convexity constants for $f$ and $F$}

Since $\nabla^2 f(x)=Q$ is constant, we can read smoothness and strong convexity off the spectrum of $Q$. Because $\lambda(L)\subseteq[0,2]$, we have
\[
\lambda(Q) = \left\{\alpha + \frac{1-\alpha}{2}\lambda: \lambda\in\lambda(L)\right\}
\subseteq [\alpha,\, \alpha+(1-\alpha)] = [\alpha,1].
\]
Therefore:
\begin{itemize}
\item $\nabla f$ is $L_f$-Lipschitz with $L_f=\norm{Q}_2\le 1$.
\item $f$ is $\mu$-strongly convex with $\mu=\lambda_{\min}(Q)\ge \alpha$.
\end{itemize}
In what follows we write $L:=L_f$ and $\mu:=\alpha$ for simplicity (reusing the symbol $L$, since the Laplacian itself is no longer needed); the argument works for general $L,\mu$. Since $g$ is convex, the sum $F=f+g$ is also $\mu$-strongly convex. Hence $F$ has a unique minimizer $x^\star$.

\section{Prox operator and optimality conditions}

\subsection{Weighted soft-thresholding proximal operator}

Because $g(x)=\sum_i \lambda_i|x_i|$ is separable, its proximal operator acts coordinatewise.

\begin{lemma}[Weighted soft-thresholding]\label{lem:prox}
Let $t>0$. For any $u\in\R^n$,
\[
\prox_{t g}(u)_i \;=\; \operatorname{sign}(u_i)\,\max\{ |u_i| - t\lambda_i,\; 0\},
\qquad i=1,\dots,n.
\]
In particular, $\prox_{t g}(u)_i=0$ if and only if $|u_i|\le t\lambda_i$.
\end{lemma}

\begin{proof}
This is standard: minimize $\frac{1}{2t}\norm{x-u}_2^2 + \sum_i \lambda_i|x_i|$, which separates over coordinates. For each coordinate one obtains the soft-thresholding solution with threshold $t\lambda_i$.
\end{proof}

\subsection{KKT conditions and the active/inactive sets}

Let $x^\star$ denote the unique minimizer of \eqref{eq:main-problem}. The first-order optimality condition is
\[
0 \in \nabla f(x^\star) + \partial g(x^\star).
\]
Coordinatewise, since $\partial|x_i|=\{\operatorname{sign}(x_i)\}$ for $x_i\neq 0$ and $\partial|0|=[-1,1]$, we obtain:

\begin{lemma}[Coordinatewise optimality conditions]\label{lem:kkt}
For each $i=1,\dots,n$:
\begin{itemize}
\item If $x_i^\star\neq 0$, then $\nabla f(x^\star)_i + \lambda_i\,\operatorname{sign}(x_i^\star)=0$. Equivalently, $|\nabla f(x^\star)_i|=\lambda_i$.
\item If $x_i^\star=0$, then $|\nabla f(x^\star)_i|\le \lambda_i$.
\end{itemize}
\end{lemma}

Define the \emph{active set} $S^\star := \supp(x^\star)=\{i:x_i^\star\neq 0\}$ and the \emph{inactive set} $I^\star := \{1,\dots,n\}\setminus S^\star = \{i:x_i^\star=0\}$.

\section{A margin (strict complementarity) assumption}

The key idea is to assume that the inactive coordinates satisfy \emph{strict} inequality in the KKT condition.

\begin{assumption}[Margin / strict complementarity]\label{ass:margin}
Define the margin parameter
\begin{equation}\label{eq:margin}
\gamma \;:=\; \min_{i\in I^\star}\bigl(\lambda_i - |\nabla f(x^\star)_i|\bigr).
\end{equation}
Assume $\gamma>0$. Equivalently, for every inactive coordinate $i\in I^\star$,
\[
|\nabla f(x^\star)_i|\le \lambda_i - \gamma.
\]
\end{assumption}

\begin{remark}
This assumption is standard in manifold-identification results for $\ell_1$-type regularizers. It may fail (i.e.\ $\gamma=0$) in degenerate instances; without a positive margin, one cannot in general guarantee finite-time identification.
\end{remark}

\section{Accelerated proximal-gradient method}

Fix a step size $t\in(0,1/L]$.
Consider the standard accelerated proximal-gradient iteration (Nesterov / FISTA-style momentum):
\begin{equation}\label{eq:apg}
\begin{aligned}
y_k &= x_k + \beta_k (x_k - x_{k-1}),\\
x_{k+1} &= \prox_{t g}\bigl( y_k - t \nabla f(y_k)\bigr),
\end{aligned}
\qquad k\ge 0,
\end{equation}
with initialization $x_{-1}=x_0$ (so $y_0=x_0$). For strongly convex objectives one often uses a constant $\beta_k\equiv\beta\in[0,1)$, e.g.\ $\beta=\frac{\sqrt{L}-\sqrt{\mu}}{\sqrt{L}+\sqrt{\mu}}$, but the identification argument below only needs that $y_k\to x^\star$ and (for explicit bounds) a known convergence rate.

\section{Key stability lemma: the margin prevents activation of inactive coordinates}

\begin{lemma}[Inactivity is stable in a neighborhood]\label{lem:stable-zero}
Assume $f$ has $L$-Lipschitz gradient in $\ell_2$, and Assumption~\ref{ass:margin} holds. Let $t\in(0,1/L]$ and define the radius
\begin{equation}\label{eq:radius}
r \;:=\; \frac{t\gamma}{1+tL}.
\end{equation}
Then for any $y\in\R^n$ satisfying $\norm{y-x^\star}_2 \le r$, the proximal-gradient update
\[
x^+ := \prox_{t g}\bigl(y - t\nabla f(y)\bigr)
\]
satisfies $x^+_i = 0$ for all $i\in I^\star$.
\end{lemma}

\begin{proof}
Fix an inactive coordinate $i\in I^\star$, so $x_i^\star=0$. Let
\[
u := y - t\nabla f(y).
\]
By Lemma~\ref{lem:prox}, it suffices to show $|u_i|\le t\lambda_i$. Write
\[
u_i = (y_i-x_i^\star) - t\bigl(\nabla f(y)_i - \nabla f(x^\star)_i\bigr) \;-\; t\nabla f(x^\star)_i.
\]
Using the triangle inequality and $x_i^\star=0$,
\begin{align*}
|u_i| &\le |y_i-x_i^\star| + t|\nabla f(y)_i - \nabla f(x^\star)_i| + t|\nabla f(x^\star)_i| \\
&\le \norm{y-x^\star}_2 + t\norm{\nabla f(y)-\nabla f(x^\star)}_2 + t|\nabla f(x^\star)_i|.
\end{align*}
Since $\nabla f$ is $L$-Lipschitz in $\ell_2$, $\norm{\nabla f(y)-\nabla f(x^\star)}_2 \le L\norm{y-x^\star}_2$, hence
\[
|u_i|\le (1+tL)\norm{y-x^\star}_2 + t|\nabla f(x^\star)_i|.
\]
By the margin assumption, $|\nabla f(x^\star)_i|\le \lambda_i - \gamma$, so
\[
|u_i|\le (1+tL)\norm{y-x^\star}_2 + t(\lambda_i-\gamma).
\]
If $\norm{y-x^\star}_2\le r=\frac{t\gamma}{1+tL}$, then
\[
|u_i|\le (1+tL)\cdot \frac{t\gamma}{1+tL} + t(\lambda_i-\gamma) = t\lambda_i.
\]
Therefore $|u_i|\le t\lambda_i$, and Lemma~\ref{lem:prox} gives $x^+_i=0$. Since $i\in I^\star$ was arbitrary, $x^+_{I^\star}=0$.
\end{proof}

\section{Main result: finite identification for accelerated proximal gradient}

\subsection{Existence of a finite identification time}

\begin{theorem}[Finite active-set identification (qualitative)]\label{thm:finite-id-qual}
Suppose Assumption~\ref{ass:margin} holds (i.e.\ $\gamma>0$). Consider the accelerated proximal-gradient iteration \eqref{eq:apg} with any step $t\in(0,1/L]$. If $y_k\to x^\star$ in $\ell_2$, then there exists a finite index $K$ such that for all $k\ge K$,
\[
x_{k+1,i}=0 \quad \text{for every } i\in I^\star.
\]
Equivalently, after finitely many iterations, all iterates satisfy $\supp(x_k)\subseteq S^\star$.
\end{theorem}

\begin{proof}
By Lemma~\ref{lem:stable-zero}, any iterate $y$ in the ball $\{y:\norm{y-x^\star}_2\le r\}$ (with $r$ as in \eqref{eq:radius}) produces a proximal-gradient update $x^+$ with $x^+_{I^\star}=0$. Since $y_k\to x^\star$, there exists $K$ such that $\norm{y_k-x^\star}_2\le r$ for all $k\ge K$. Applying Lemma~\ref{lem:stable-zero} at each such $k$ yields $x_{k+1,I^\star}=0$ for all $k\ge K$.
\end{proof}

\subsection{An explicit identification-time bound using a standard accelerated rate}

To make Theorem~\ref{thm:finite-id-qual} quantitative, we combine it with a standard accelerated convergence rate for strongly convex composite minimization. We state the rate as a lemma without proof; it is standard content on accelerated algorithms.

\begin{lemma}[Standard accelerated rate (strongly convex composite)]\label{lem:apg-rate}
Assume $F=f+g$ is $\mu$-strongly convex and $\nabla f$ is $L$-Lipschitz. Consider an accelerated proximal-gradient scheme \eqref{eq:apg} with step $t=1/L$ and a strongly-convex momentum choice (e.g.\ constant $\beta=\frac{\sqrt{L}-\sqrt{\mu}}{\sqrt{L}+\sqrt{\mu}}$). Then there exist constants $C>0$ and $q\in(0,1)$ such that
\begin{equation}\label{eq:rate-gap}
F(x_k)-F(x^\star) \le C q^k \qquad \text{for all }k\ge 0,
\end{equation}
where one may take $q=1-\sqrt{\mu/L}$ (up to standard constant-factor variants).
\end{lemma}

\begin{remark}
Lemma~\ref{lem:apg-rate} is a standard result for accelerated proximal-gradient methods on $\mu$-strongly convex composite objectives. It is proved, for example, via estimate sequences or by analyzing a suitable Lyapunov/potential function. The precise constant $C$ is not important for the identification argument; only geometric decay is used.
\end{remark}

\begin{theorem}[Identification time bound]\label{thm:finite-id-quant}
Suppose Assumption~\ref{ass:margin} holds with margin $\gamma>0$, and suppose the accelerated method satisfies Lemma~\ref{lem:apg-rate} with parameters $(C,q)$ and step $t=1/L$. Let $\mu$ be the strong convexity parameter of $F$. Define
\[
r := \frac{t\gamma}{1+tL} = \frac{\gamma}{2L}.
\]
Then for every $k\ge 1$,
\[
\norm{y_k-x^\star}_2 \le \sqrt{\frac{2C}{\mu}}\; q^{\frac{k-1}{2}} \left( (1+\beta)\sqrt{q} + \beta \right),
\]
and therefore it suffices that
\begin{equation}\label{eq:K-bound}
k \;\ge\; 1 + \frac{2\log\!\left(\displaystyle \sqrt{\frac{2C}{\mu}}\, \frac{( (1+\beta)\sqrt{q} + \beta )}{r} \right)}{\log(1/q)}
\end{equation}
to guarantee $x_{k+1,i}=0$ for all $i\in I^\star$. In particular, the identification time is $O\!\left(\sqrt{\frac{L}{\mu}}\log\frac{1}{\gamma}\right)$ up to problem-dependent constants.
\end{theorem}

\begin{proof}
From $\mu$-strong convexity of $F$ we have, for all $x$,
\[
F(x)-F(x^\star) \ge \frac{\mu}{2}\norm{x-x^\star}_2^2,
\]
hence \eqref{eq:rate-gap} implies
\[
\norm{x_k-x^\star}_2 \le \sqrt{\frac{2C}{\mu}}\, q^{k/2}.
\]
Now bound $y_k$ using the definition $y_k=x_k+\beta(x_k-x_{k-1})$:
\begin{align*}
\norm{y_k-x^\star}_2
&= \norm{(1+\beta)(x_k-x^\star) - \beta(x_{k-1}-x^\star)}_2 \\
&\le (1+\beta)\norm{x_k-x^\star}_2 + \beta\norm{x_{k-1}-x^\star}_2 \\
&\le \sqrt{\frac{2C}{\mu}}\left( (1+\beta)q^{k/2} + \beta q^{(k-1)/2}\right) \\
&= \sqrt{\frac{2C}{\mu}}\; q^{(k-1)/2} \left( (1+\beta)\sqrt{q} + \beta\right).
\end{align*}
If the right-hand side is at most $r$, then Lemma~\ref{lem:stable-zero} applies and gives $x_{k+1,I^\star}=0$. Solving for $k$ yields \eqref{eq:K-bound}. The stated big-$O$ form follows from the standard choice $q\approx 1-\sqrt{\mu/L}$, for which $\log(1/q) \asymp \sqrt{\mu/L}$.
\end{proof}

\section{Specialization to $\ell_1$-regularized PageRank}

For the PageRank objective \eqref{eq:f-def}--\eqref{eq:g-def} we have $L\le 1$ and $\mu=\alpha$. Taking $t=1$ (valid since $L\le 1$), the stability radius in Lemma~\ref{lem:stable-zero} becomes
\[
r = \frac{\gamma}{1+L}\ge \frac{\gamma}{2}.
\]
Thus, once the extrapolated points $y_k$ of an accelerated method stay within distance $\gamma/2$ of $x^\star$, the inactive coordinates $I^\star=\{i:x_i^\star=0\}$ are never activated again.

\begin{remark}[What this does and does not prove]
Theorems~\ref{thm:finite-id-qual} and~\ref{thm:finite-id-quant} provide a clean way to control \emph{eventual} sparsity using a \emph{margin parameter} $\gamma$. However:
\begin{itemize}
\item The margin $\gamma$ can be arbitrarily small (or zero) in worst-case instances, so this does not by itself yield a worst-case runtime bound depending only on $(\alpha,\rho)$.
\item The results do not bound how many coordinates might become active during an initial transient phase (before the iterates enter the $\gamma$-neighborhood).
\end{itemize}
Nonetheless, under a positive margin, the accelerated method identifies the optimal zero pattern in finite time.
\end{remark}

\section*{Acknowledgment of standard ingredients}

The only nontrivial ingredients beyond basic convex analysis are: (i) the proximal map of the weighted $\ell_1$ norm (Lemma~\ref{lem:prox}), (ii) Lipschitz continuity of $\nabla f$, and (iii) a standard accelerated linear convergence rate for strongly convex composite minimization (Lemma~\ref{lem:apg-rate}). The identification argument itself (Lemma~\ref{lem:stable-zero} and Theorems~\ref{thm:finite-id-qual}--\ref{thm:finite-id-quant}) is self-contained.

\end{document}
</syntaxhighlight>
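To see the statement concretely, here is a minimal NumPy sketch (not part of the note above, and only one possible implementation): it assembles Q, b, and the weights λ_i for a small hand-coded graph, runs the accelerated proximal-gradient iteration with constant momentum and the weighted soft-thresholding prox, estimates the margin γ at a high-accuracy reference solution, and reports the first index after which every iterate's support is contained in supp(x*). The toy graph, the parameters α = 0.25 and ρ = 0.25, and all function names are illustrative choices, not taken from the note.

<syntaxhighlight lang="python">
# Illustrative sketch (assumptions, not the note's code): APG / FISTA-style momentum
# on F(x) = 1/2 x'Qx - b'x + sum_i lam_i |x_i| with Q = alpha*I + (1-alpha)/2 * L_sym,
# b = alpha * D^{-1/2} s, lam_i = rho*alpha*sqrt(d_i). Graph and parameters are arbitrary.
import numpy as np

def build_problem(A, alpha, rho, seed):
    d = A.sum(axis=1)                                         # node degrees
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    L_sym = np.eye(len(d)) - D_inv_sqrt @ A @ D_inv_sqrt      # normalized Laplacian
    Q = alpha * np.eye(len(d)) + 0.5 * (1.0 - alpha) * L_sym  # Hessian of f
    b = alpha * (D_inv_sqrt @ seed)                           # linear term of f
    lam = rho * alpha * np.sqrt(d)                            # per-coordinate l1 weights
    return Q, b, lam

def prox_weighted_l1(u, t, lam):
    """Weighted soft-thresholding: prox of x -> t * sum_i lam_i |x_i|."""
    return np.sign(u) * np.maximum(np.abs(u) - t * lam, 0.0)

def apg(Q, b, lam, alpha, iters):
    """Accelerated proximal gradient with the constant strongly-convex momentum choice."""
    Lf, mu, t = 1.0, alpha, 1.0                               # L <= 1, mu = alpha, step t = 1/L
    beta = (np.sqrt(Lf) - np.sqrt(mu)) / (np.sqrt(Lf) + np.sqrt(mu))
    x_prev = x = np.zeros(len(b))
    supports = []
    for _ in range(iters):
        y = x + beta * (x - x_prev)                           # extrapolation step
        x_prev, x = x, prox_weighted_l1(y - t * (Q @ y - b), t, lam)
        supports.append(frozenset(np.flatnonzero(x)))
    return x, supports

if __name__ == "__main__":
    # Toy graph: a 5-cycle with one chord plus a pendant node; seed mass on node 0.
    A = np.zeros((6, 6))
    for i, j in [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0), (0, 2), (4, 5)]:
        A[i, j] = A[j, i] = 1.0
    alpha, rho = 0.25, 0.25
    seed = np.zeros(6); seed[0] = 1.0
    Q, b, lam = build_problem(A, alpha, rho, seed)

    x_star, _ = apg(Q, b, lam, alpha, iters=20000)            # high-accuracy reference solution
    S_star = frozenset(np.flatnonzero(np.abs(x_star) > 1e-12))
    inactive = [i for i in range(len(b)) if i not in S_star]
    grad_star = Q @ x_star - b
    gamma = min(lam[i] - abs(grad_star[i]) for i in inactive) if inactive else float("inf")
    print("supp(x*):", sorted(S_star), "  margin gamma:", gamma)

    _, supports = apg(Q, b, lam, alpha, iters=200)
    ident = next((k for k in range(len(supports))
                  if all(s <= S_star for s in supports[k:])), None)
    print("supp(x_k) contained in supp(x*) for all k >=",
          None if ident is None else ident + 1)
</syntaxhighlight>

On instances where the computed margin comes out positive, one expects the reported index to be finite, in line with the qualitative theorem; with a degenerate (zero) margin the note makes no such guarantee.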