== gpt-5-2-pro ==

=== User: Here is an open problem of mine. ===

Here is an open problem of mine. I want you to tell me whether it is possible to prove it using standard accelerated algorithms. Avoid solutions like that of Rubio, which relies on an expensive subproblem.

=== Assistant: From your COLT'22 note, the "dream" guarantee is: ===

From your COLT'22 note, the "dream" guarantee is:
* use a standard accelerated proximal-gradient method (FISTA / linear coupling) on the <math>\ell_1</math>-regularized PPR objective (your eq. (3)),
* keep the per-iteration work local/sparse (≈ only <math>O(1/\rho)</math> coordinates touched each iteration, as in the ISTA/PGD analysis),
* and thereby upgrade the total runtime from <math>\tilde O((\alpha\rho)^{-1})</math> to <math>\tilde O((\sqrt{\alpha}\,\rho)^{-1})</math>. open-problem-fountoulakis22a

You also point out the exact bottleneck: we already know accelerated methods reduce the iteration complexity to <math>O(\sqrt{1/\alpha}\log(1/\varepsilon))</math>, but we don't have a worst-case upper bound on how many coordinates get activated/updated per iteration, so the only safe bound is the pessimistic <math>O(n\sqrt{1/\alpha}\log(1/\varepsilon))</math>. open-problem-fountoulakis22a

==== If you mean a theorem of the form ====

My assessment: very unlikely without adding extra safeguards or extra assumptions.

The ISTA/PGD sparsity proof in this setting hinges on very special monotonicity/order properties of the non-accelerated proximal-gradient map for these PageRank/M-matrix quadratics (this is what lets you argue you never need to "touch" most coordinates). Your note explicitly says that analysis "does not apply to the accelerated version." open-problem-fountoulakis22a

Acceleration introduces an extrapolated point (schematically <math>y_k = x_k + \beta_k(x_k - x_{k-1})</math>), which breaks exactly the kind of order-preserving / "push"-style monotonicity that local PageRank proofs exploit. Once you lose that, it becomes extremely difficult to rule out "spurious activations" of many coordinates in a single iteration, because the prox step can turn on any coordinate whose pre-prox value crosses the threshold.

There is also a broader, well-known phenomenon in composite/proximal methods:
* Finite-time support (manifold) identification is typically provable only under a nondegeneracy / strict-complementarity margin at the solution.
* Without such a margin (degenerate cases), many coordinates can be "almost active," and accelerated methods can oscillate around the threshold.

This is not specific to PageRank; it is a general limitation noted in the manifold-identification literature.<ref>{{cite web|title=Optimization Online|url=https://optimization-online.org/wp-content/uploads/2019/03/7109.pdf|publisher=Optimization Online|access-date=2025-12-19}}</ref> That matters here because your open problem is explicitly worst-case ("it is not even clear if acceleration would not lead to a worse running time complexity in the worst case"). open-problem-fountoulakis22a Worst-case families can easily include symmetric/degenerate graphs where many coordinates sit exactly on (or arbitrarily close to) the activation boundary.

==== The Martínez‑Rubio–Wirth–Pokutta COLT'23 paper essentially answers your question in the affirmative by changing the algorithmic object ====

They build an active-set accelerated method that maintains the invariants that vanilla acceleration breaks.
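Before going into their construction, the threshold-crossing mechanism described above can be watched end to end on a toy instance. This is a minimal sketch, not the PPR objective itself: the values of <code>Q</code>, <code>b</code>, <code>lam</code> are made up so that the minimizer is <math>x^\star=(1,0)</math> with margin <math>\gamma = 0.05</math> on the inactive coordinate.

```python
import numpy as np

def soft_threshold(u, thresh):
    # prox of g(x) = sum_i thresh_i |x_i|: coordinatewise shrinkage toward 0
    return np.sign(u) * np.maximum(np.abs(u) - thresh, 0.0)

# Toy 2-variable quadratic f(x) = 0.5 x^T Q x - b^T x with an l1 penalty.
# Q, b, lam are illustration values (not from the note), chosen so that
# x* = (1, 0) and lam - |grad f(x*)_2| = 0.2 - 0.15 = 0.05 > 0 (the margin).
Q = np.array([[1.0, 0.5], [0.5, 1.0]])
b = np.array([1.2, 0.35])
lam = 0.2

evals = np.linalg.eigvalsh(Q)          # ascending: [mu, L] = [0.5, 1.5]
mu, L = evals[0], evals[-1]
t = 1.0 / L
beta = (np.sqrt(L) - np.sqrt(mu)) / (np.sqrt(L) + np.sqrt(mu))

x_prev = x = np.zeros(2)
history = []                           # support of each iterate
for k in range(200):
    y = x + beta * (x - x_prev)        # extrapolated point
    u = y - t * (Q @ y - b)            # pre-prox value: |u_i| > t*lam activates i
    x_prev, x = x, soft_threshold(u, t * lam)
    history.append(np.flatnonzero(x).tolist())

# Coordinate 1 (inactive at x*) switches on during the transient, then the
# margin forces it back to exactly zero once the iterates are close to x*.
print(history[0], history[-1], x)
```

On this instance the very first prox step already activates the coordinate that is zero at the optimum, which is exactly the transient behavior the worst-case analysis has to control.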
==== Concretely, their Algorithm 4 (ASPR) repeatedly: ====

# defines the current active set <math>S^{(t)}</math> from negative gradient coordinates,
# runs an accelerated projected GD subroutine (APGD) on the restricted set <math>C^{(t)} = \mathrm{span}\{e_i : i \in S^{(t)}\} \cap \mathbb{R}^n_{\ge 0}</math>,
# applies a small "shrink" step <math>x^{(t+1)} \leftarrow \max\{0, \bar{x}^{(t+1)} - \delta_t \mathbf{1}\}</math>,
# and expands <math>S^{(t)}</math> again.<ref>{{cite web|title=Proceedings of Machine Learning Research|url=https://proceedings.mlr.press/v195/martinez-rubio23b/martinez-rubio23b.pdf|publisher=Proceedings of Machine Learning Research|access-date=2025-12-19}}</ref>

That APGD call is exactly the "expensive subproblem" you're referring to. It is not vanilla FISTA on the original full problem; it is accelerated optimization on an evolving restricted space, with additional machinery to guarantee <math>S^{(t)} \subseteq S^\star</math> and <math>x^{(t)} \le x^\star</math> coordinatewise. They explicitly highlight that this is what makes provable sparse updates possible and that this answers your COLT'22 open question.<ref>{{cite web|title=Proceedings of Machine Learning Research|url=https://proceedings.mlr.press/v195/martinez-rubio23b/martinez-rubio23b.pdf|publisher=Proceedings of Machine Learning Research|access-date=2025-12-19}}</ref> So the existence of Rubio's solution is also evidence that the missing ingredient is not a standard acceleration rate proof; it's the active-set control that keeps iterates from "lighting up" irrelevant coordinates.

==== If you relax your goal slightly, there is a more "standard" line of argument you could potentially formalize ====

Assume a margin at the optimum that separates active from inactive coordinates (a strict complementarity / nondegeneracy condition). In <math>\ell_1</math> language this is the usual requirement that, for inactive coordinates, the optimality condition is strict (not tight). Then standard identification results imply that after finitely many iterations the algorithm identifies the correct manifold/support, after which the method behaves like an accelerated method on a fixed low-dimensional subspace (of size <math>|S^\star|</math>, which is ≈ <math>O(1/\rho)</math> in your application regime).

The catch is:
* the identification-time bound typically depends on the margin parameter (often called something like <math>\delta_{\min}</math>), and in degenerate instances <math>\delta_{\min}</math> can be arbitrarily small or zero,<ref>{{cite web|title=Optimization Online|url=https://optimization-online.org/wp-content/uploads/2019/03/7109.pdf|publisher=Optimization Online|access-date=2025-12-19}}</ref>
* so you don't get a clean worst-case <math>\tilde O((\sqrt{\alpha}\rho)^{-1})</math> bound purely in terms of <math>\alpha, \rho</math>.

In other words, you can likely get something like
: <math>\text{time} \;\lesssim\; \underbrace{\frac{1}{\rho}\,\mathrm{polylog}\!\Big(\frac{1}{\delta_{\min}}\Big)}_{\text{identification phase}} \;+\; \underbrace{\frac{1}{\rho\sqrt{\alpha}}\log\!\Big(\frac{1}{\varepsilon}\Big)}_{\text{accelerated phase on } S^\star},</math>
but the <math>\delta_{\min}</math>-dependence is the part that prevents a clean worst-case theorem.
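Whether the two-phase bound is meaningful for a concrete instance hinges on the margin (the <math>\delta_{\min}</math> above, called <math>\gamma</math> below), which can at least be computed post hoc. Here is a minimal sketch, assuming the quadratic form <math>f(x)=\tfrac12 x^\top Q x - b^\top x</math> used throughout this discussion; the helper name <code>margin</code> and the toy values of <code>Q</code>, <code>b</code>, <code>lam</code>, <code>x_star</code> are hypothetical.

```python
import numpy as np

def margin(Q, b, lam, x_star, tol=1e-9):
    """Strict-complementarity margin gamma = min over inactive i of
    lam_i - |grad f(x*)_i|, for f(x) = 0.5 x^T Q x - b^T x.
    Returns 0.0 for degenerate instances (no slack on some inactive coordinate)."""
    grad = Q @ x_star - b
    inactive = np.abs(x_star) <= tol
    if not inactive.any():
        return float("inf")  # no inactive coordinates: identification is vacuous
    slack = lam[inactive] - np.abs(grad[inactive])
    return max(float(slack.min()), 0.0)

# Hypothetical toy instance (not from the note), with minimizer x* = (1, 0):
Q = np.array([[1.0, 0.5], [0.5, 1.0]])
b = np.array([1.2, 0.35])
lam = np.full(2, 0.2)
x_star = np.array([1.0, 0.0])
gamma = margin(Q, b, lam, x_star)
print(gamma)  # ≈ 0.05 (= 0.2 - 0.15), the slack of the single inactive coordinate
```

In practice <code>x_star</code> would itself come from a high-accuracy solve, so the computed margin is only as trustworthy as that solution.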
* If you want a worst-case guarantee over all graphs/instances of the kind stated in your open problem (i.e., a universal <math>\delta</math> such that plain FISTA/linear-coupling updates touch at most <math>\delta/\rho</math> coordinates each iteration), then I don't see a path via "standard acceleration analysis" alone: the whole difficulty is controlling the extrapolated iterates' active set, and generic proximal identification theory tells you that you cannot control it uniformly without a nondegeneracy margin.
* The only provable acceleration-with-sparsity result I'm aware of in this line is essentially active-set acceleration (Rubio et al.), which introduces exactly the kind of "subproblem" / restricted-space solves you want to avoid.<ref>{{cite web|title=Proceedings of Machine Learning Research|url=https://proceedings.mlr.press/v195/martinez-rubio23b/martinez-rubio23b.pdf|publisher=Proceedings of Machine Learning Research|access-date=2025-12-19}}</ref>

===== The most plausible compromise (theorem-wise) is: =====

# prove a finite identification result for your PageRank instance class (this needs a margin condition or a probabilistic model), then
# run a single accelerated method on the identified support.

This avoids repeated expensive restricted solves, but it won't give a clean worst-case <math>\tilde O((\sqrt{\alpha}\rho)^{-1})</math> unless you can lower-bound the margin in terms of <math>\rho</math> for all graphs, which is exactly the kind of statement that seems false in the worst case (symmetries/degeneracy).

If you tell me which "standard accelerated algorithm" you have in mind (plain FISTA, strongly convex APG with fixed <math>\beta</math>, a linear-coupling variant, etc.), I can be more concrete about which invariants one would need to establish (e.g., order bounds <math>0 \le y_k \le c\,x^\star</math>, or a per-iteration certificate that prevents activation outside a small frontier), and where the known theory breaks.
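To make the invariant question concrete, here is a hedged sketch that instantiates the <math>\ell_1</math>-regularized PPR objective (the <math>Q=\alpha I + \tfrac{1-\alpha}{2}L</math> and the weighted penalty <math>\lambda_i=\rho\alpha\sqrt{d_i}</math> from eq. (3)) on a tiny path graph and logs the per-iteration support of a constant-momentum accelerated scheme. The graph, <code>alpha</code>, and <code>rho</code> are illustration choices, and the logged support sizes are exactly the quantity the open problem asks to bound.

```python
import numpy as np

# l1-regularized PPR objective on a tiny path graph (illustration values only).
n, alpha, rho = 8, 0.25, 0.05
A = np.diag(np.ones(n - 1), 1) + np.diag(np.ones(n - 1), -1)  # path graph
d = A.sum(axis=1)
D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
L_sym = np.eye(n) - D_inv_sqrt @ A @ D_inv_sqrt               # normalized Laplacian
Q = alpha * np.eye(n) + 0.5 * (1 - alpha) * L_sym             # spectrum inside [alpha, 1]
s = np.zeros(n); s[0] = 1.0                                   # seed vector
b = alpha * (D_inv_sqrt @ s)                                  # so grad f(x) = Qx - b
lam = rho * alpha * np.sqrt(d)                                # weighted l1 penalty

t = 1.0                                                       # valid step: ||Q||_2 <= 1
beta = (1 - np.sqrt(alpha)) / (1 + np.sqrt(alpha))            # strongly convex momentum
x_prev = x = np.zeros(n)
touched = []                                                  # support size per iteration
for k in range(300):
    y = x + beta * (x - x_prev)
    u = y - t * (Q @ y - b)
    x_prev, x = x, np.sign(u) * np.maximum(np.abs(u) - t * lam, 0.0)
    touched.append(int(np.count_nonzero(x)))
print(touched[0], max(touched), touched[-1])  # transient vs. final support sizes
```

A worst-case theorem would have to bound <code>max(touched)</code> for all graphs; this sketch only measures it on one instance.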
=== User: Provide a self-contained proof which uses the idea of the margin parameter. ===

Provide a self-contained proof which uses the idea of the margin parameter. Provide a self-contained LaTeX document.

=== Assistant: Below is a self-contained LaTeX note that proves a finite active-set identification result for standard accelerated pro… ===

Below is a self-contained LaTeX note that proves a finite active-set identification result for standard accelerated proximal gradient (APG / FISTA-style momentum) on the <math>\ell_1</math>-regularized PageRank objective (3) from your COLT'22 open problem note open-problem-fountoulakis22a, under a strict-complementarity "margin" assumption. The proof shows that once the accelerated iterates are close enough to the optimum (with a radius proportional to the margin), no coordinate outside the optimal support can ever be activated again; it also gives an explicit bound on the identification time in terms of the margin and the accelerated convergence rate.

<syntaxhighlight lang="latex">
\documentclass[11pt]{article}
\usepackage{amsmath,amssymb,amsthm,mathtools}
\usepackage[margin=1in]{geometry}
\usepackage{hyperref}

% Theorem environments
\newtheorem{theorem}{Theorem}
\newtheorem{lemma}{Lemma}
\newtheorem{corollary}{Corollary}
\newtheorem{assumption}{Assumption}
\theoremstyle{remark}
\newtheorem{remark}{Remark}

% Macros
\newcommand{\R}{\mathbb{R}}
\newcommand{\norm}[1]{\left\lVert #1 \right\rVert}
\newcommand{\ip}[2]{\left\langle #1,#2 \right\rangle}
\newcommand{\prox}{\operatorname{prox}}
\newcommand{\supp}{\operatorname{supp}}
\newcommand{\diag}{\operatorname{diag}}

\title{Finite Active-Set Identification for Accelerated Proximal Gradient on $\ell_1$-Regularized PageRank Under a Margin Condition}
\author{}
\date{}

\begin{document}
\maketitle

\begin{abstract}
We consider the $\ell_1$-regularized PageRank objective
\[
\min_{x\in\R^n}\; F(x)
\;:=\; f(x) + g(x),
\qquad
g(x) := \rho\alpha \norm{D^{1/2}x}_1,
\]
where $f$ is the smooth quadratic arising from personalized PageRank. We prove a \emph{finite identification} statement for standard accelerated proximal-gradient methods: under a strict-complementarity \emph{margin} condition at the unique minimizer $x^\star$, all coordinates outside $\supp(x^\star)$ become and remain identically zero after finitely many iterations. An explicit identification-time bound is given in terms of the margin and the accelerated convergence rate.
\end{abstract}

\section{Problem setup: $\ell_1$-regularized PageRank as a composite strongly convex problem}

Let $G=(V,E)$ be an undirected graph with $n=|V|$ nodes. Let $A\in\R^{n\times n}$ denote the adjacency matrix and $D=\diag(d_1,\dots,d_n)$ the degree matrix. Fix $\alpha\in(0,1]$ and a seed vector $s\in\R^n$ (e.g.\ an indicator of a seed node). Define the (symmetric) normalized Laplacian
\[
L := I - D^{-1/2} A D^{-1/2}.
\]
It is standard that $L$ is positive semidefinite and $\lambda(L)\subseteq[0,2]$ for undirected graphs. Following the $\ell_1$-regularized PageRank formulation, define the matrix
\[
Q := \alpha I + \frac{1-\alpha}{2}L,
\]
and the smooth function
\begin{equation}\label{eq:f-def}
f(x) := \frac12 x^\top Q x - \alpha\, x^\top D^{-1/2}s.
\end{equation}
Let the nonsmooth regularizer be
\begin{equation}\label{eq:g-def}
g(x) := \rho\alpha\, \norm{D^{1/2}x}_1 = \sum_{i=1}^n \lambda_i |x_i|,
\qquad \lambda_i := \rho\alpha\sqrt{d_i}.
\end{equation}
We study
\begin{equation}\label{eq:main-problem}
\min_{x\in\R^n} F(x) := f(x)+g(x).
\end{equation}

\subsection{Smoothness and strong convexity constants for $f$ and $F$}

Since $\nabla^2 f(x)=Q$ is constant, we can read smoothness/strong convexity from the spectrum of $Q$. Because $\lambda(L)\subseteq[0,2]$, we have
\[
\lambda(Q) = \left\{\alpha + \frac{1-\alpha}{2}\lambda: \lambda\in\lambda(L)\right\}
\subseteq [\alpha,\, \alpha+(1-\alpha)] = [\alpha,1].
\]
Therefore:
\begin{itemize}
\item $\nabla f$ is $L_f$-Lipschitz with $L_f=\norm{Q}_2\le 1$.
\item $f$ is $\mu$-strongly convex with $\mu=\lambda_{\min}(Q)\ge \alpha$.
\end{itemize}
In what follows we write $L:=L_f$ and $\mu:=\alpha$ for simplicity; the argument works for general $L,\mu$. Since $g$ is convex, the sum $F=f+g$ is also $\mu$-strongly convex. Hence $F$ has a unique minimizer $x^\star$.

\section{Prox operator and optimality conditions}

\subsection{Weighted soft-thresholding proximal operator}

Because $g(x)=\sum_i \lambda_i|x_i|$ is separable, its proximal operator is coordinatewise.

\begin{lemma}[Weighted soft-thresholding]\label{lem:prox}
Let $t>0$. For any $u\in\R^n$,
\[
\prox_{t g}(u)_i \;=\; \operatorname{sign}(u_i)\,\max\{ |u_i| - t\lambda_i,\; 0\},
\qquad i=1,\dots,n.
\]
In particular, $\prox_{t g}(u)_i=0$ if and only if $|u_i|\le t\lambda_i$.
\end{lemma}

\begin{proof}
This is standard: minimize $\frac{1}{2t}\norm{x-u}_2^2 + \sum_i \lambda_i|x_i|$, which separates over coordinates. For each coordinate one obtains the soft-thresholding solution with threshold $t\lambda_i$.
\end{proof}

\subsection{KKT conditions and the active/inactive sets}

Let $x^\star$ denote the unique minimizer of \eqref{eq:main-problem}. The first-order optimality condition is
\[
0 \in \nabla f(x^\star) + \partial g(x^\star).
\]
Coordinatewise, since $\partial|x_i|=\{\operatorname{sign}(x_i)\}$ for $x_i\neq 0$ and $\partial|0|=[-1,1]$, we obtain:

\begin{lemma}[Coordinatewise optimality conditions]\label{lem:kkt}
For each $i=1,\dots,n$:
\begin{itemize}
\item If $x_i^\star\neq 0$, then $\nabla f(x^\star)_i + \lambda_i\,\operatorname{sign}(x_i^\star)=0$. Equivalently, $|\nabla f(x^\star)_i|=\lambda_i$.
\item If $x_i^\star=0$, then $|\nabla f(x^\star)_i|\le \lambda_i$.
\end{itemize}
\end{lemma}

Define the \emph{active set} $S^\star := \supp(x^\star)=\{i:x_i^\star\neq 0\}$ and the \emph{inactive set} $I^\star := \{1,\dots,n\}\setminus S^\star = \{i:x_i^\star=0\}$.

\section{A margin (strict complementarity) assumption}

The key idea is to assume that the inactive coordinates satisfy \emph{strict} inequality in the KKT condition.

\begin{assumption}[Margin / strict complementarity]\label{ass:margin}
Define the margin parameter
\begin{equation}\label{eq:margin}
\gamma \;:=\; \min_{i\in I^\star}\bigl(\lambda_i - |\nabla f(x^\star)_i|\bigr).
\end{equation}
Assume $\gamma>0$. Equivalently, for every inactive coordinate $i\in I^\star$,
\[
|\nabla f(x^\star)_i|\le \lambda_i - \gamma.
\]
\end{assumption}

\begin{remark}
This assumption is standard in manifold-identification results for $\ell_1$-type regularizers. It may fail (i.e.\ $\gamma=0$) in degenerate instances; without a positive margin, one cannot in general guarantee finite-time identification.
\end{remark}

\section{Accelerated proximal-gradient method}

Fix a step size $t\in(0,1/L]$. Consider the standard accelerated proximal-gradient iteration (Nesterov / FISTA-style momentum):
\begin{equation}\label{eq:apg}
\begin{aligned}
y_k &= x_k + \beta_k (x_k - x_{k-1}),\\
x_{k+1} &= \prox_{t g}\bigl( y_k - t \nabla f(y_k)\bigr),
\end{aligned}
\qquad k\ge 0,
\end{equation}
with initialization $x_{-1}=x_0$ (so $y_0=x_0$). For strongly convex objectives one often uses a constant $\beta_k\equiv\beta\in[0,1)$, e.g.\ $\beta=\frac{\sqrt{L}-\sqrt{\mu}}{\sqrt{L}+\sqrt{\mu}}$, but the identification argument below only needs that $y_k\to x^\star$ and (for explicit bounds) a known convergence rate.

\section{Key stability lemma: the margin prevents activation of inactive coordinates}

\begin{lemma}[Inactivity is stable in a neighborhood]\label{lem:stable-zero}
Assume $f$ has $L$-Lipschitz gradient in $\ell_2$, and Assumption~\ref{ass:margin} holds. Let $t\in(0,1/L]$ and define the radius
\begin{equation}\label{eq:radius}
r \;:=\; \frac{t\gamma}{1+tL}.
\end{equation}
Then for any $y\in\R^n$ satisfying $\norm{y-x^\star}_2 \le r$, the proximal-gradient update
\[
x^+ := \prox_{t g}\bigl(y - t\nabla f(y)\bigr)
\]
satisfies $x^+_i = 0$ for all $i\in I^\star$.
\end{lemma}

\begin{proof}
Fix an inactive coordinate $i\in I^\star$, so $x_i^\star=0$. Let
\[
u := y - t\nabla f(y).
\]
By Lemma~\ref{lem:prox}, it suffices to show $|u_i|\le t\lambda_i$. Write
\[
u_i = (y_i-x_i^\star) - t\bigl(\nabla f(y)_i - \nabla f(x^\star)_i\bigr) \;-\; t\nabla f(x^\star)_i.
\]
Using the triangle inequality and $x_i^\star=0$,
\begin{align*}
|u_i| &\le |y_i-x_i^\star| + t|\nabla f(y)_i - \nabla f(x^\star)_i| + t|\nabla f(x^\star)_i| \\
&\le \norm{y-x^\star}_2 + t\norm{\nabla f(y)-\nabla f(x^\star)}_2 + t|\nabla f(x^\star)_i|.
\end{align*}
Since $\nabla f$ is $L$-Lipschitz in $\ell_2$, $\norm{\nabla f(y)-\nabla f(x^\star)}_2 \le L\norm{y-x^\star}_2$, hence
\[
|u_i|\le (1+tL)\norm{y-x^\star}_2 + t|\nabla f(x^\star)_i|.
\]
By the margin assumption, $|\nabla f(x^\star)_i|\le \lambda_i - \gamma$, so
\[
|u_i|\le (1+tL)\norm{y-x^\star}_2 + t(\lambda_i-\gamma).
\]
If $\norm{y-x^\star}_2\le r=\frac{t\gamma}{1+tL}$, then
\[
|u_i|\le (1+tL)\cdot \frac{t\gamma}{1+tL} + t(\lambda_i-\gamma) = t\lambda_i.
\]
Therefore $|u_i|\le t\lambda_i$, and Lemma~\ref{lem:prox} gives $x^+_i=0$. Since $i\in I^\star$ was arbitrary, $x^+_{I^\star}=0$.
\end{proof}

\section{Main result: finite identification for accelerated proximal gradient}

\subsection{Existence of a finite identification time}

\begin{theorem}[Finite active-set identification (qualitative)]\label{thm:finite-id-qual}
Assume Assumption~\ref{ass:margin} (i.e.\ $\gamma>0$). Consider the accelerated proximal-gradient iteration \eqref{eq:apg} with any step $t\in(0,1/L]$. If $y_k\to x^\star$ in $\ell_2$, then there exists a finite index $K$ such that for all $k\ge K$,
\[
x_{k+1,i}=0 \quad \text{for every } i\in I^\star.
\]
Equivalently, after finitely many iterations, all iterates satisfy $\supp(x_k)\subseteq S^\star$.
\end{theorem}

\begin{proof}
By Lemma~\ref{lem:stable-zero}, any iterate $y$ in the ball $\{y:\norm{y-x^\star}_2\le r\}$ (with $r$ as in \eqref{eq:radius}) produces a proximal-gradient update $x^+$ with $x^+_{I^\star}=0$. Since $y_k\to x^\star$, there exists $K$ such that $\norm{y_k-x^\star}_2\le r$ for all $k\ge K$. Applying Lemma~\ref{lem:stable-zero} at each such $k$ yields $x_{k+1,I^\star}=0$ for all $k\ge K$.
\end{proof}

\subsection{An explicit identification-time bound using a standard accelerated rate}

To make Theorem~\ref{thm:finite-id-qual} quantitative, we combine it with a standard accelerated convergence rate for strongly convex composite minimization. We state the rate as an assumption/lemma; this is ``standard accelerated algorithms'' content.

\begin{lemma}[Standard accelerated rate (strongly convex composite)]\label{lem:apg-rate}
Assume $F=f+g$ is $\mu$-strongly convex and $\nabla f$ is $L$-Lipschitz. Consider an accelerated proximal-gradient scheme \eqref{eq:apg} with step $t=1/L$ and a strongly convex momentum choice (e.g.\ constant $\beta=\frac{\sqrt{L}-\sqrt{\mu}}{\sqrt{L}+\sqrt{\mu}}$). Then there exist constants $C>0$ and $q\in(0,1)$ such that
\begin{equation}\label{eq:rate-gap}
F(x_k)-F(x^\star) \le C q^k \qquad \text{for all }k\ge 0,
\end{equation}
where one may take $q=1-\sqrt{\mu/L}$ (up to standard constant-factor variants).
\end{lemma}

\begin{remark}
Lemma~\ref{lem:apg-rate} is a standard result for accelerated proximal-gradient methods on $\mu$-strongly convex composite objectives. It is proved, for example, via estimate sequences or by analyzing a suitable Lyapunov/potential function. The precise constant $C$ is not important for the identification argument; only geometric decay is used.
\end{remark}

\begin{theorem}[Identification time bound]\label{thm:finite-id-quant}
Assume Assumption~\ref{ass:margin} holds with margin $\gamma>0$, and suppose the accelerated method satisfies Lemma~\ref{lem:apg-rate} with parameters $(C,q)$ and step $t=1/L$. Let $\mu$ be the strong convexity parameter of $F$. Define
\[
r := \frac{t\gamma}{1+tL} = \frac{\gamma}{2L}.
\]
Then for every $k\ge 1$,
\[
\norm{y_k-x^\star}_2 \le \sqrt{\frac{2C}{\mu}}\; q^{\frac{k-1}{2}} \left( (1+\beta)\sqrt{q} + \beta \right),
\]
and therefore it suffices that
\begin{equation}\label{eq:K-bound}
k \;\ge\; 1 + \frac{2\log\!\left(\displaystyle \sqrt{\frac{2C}{\mu}}\, \frac{ (1+\beta)\sqrt{q} + \beta }{r} \right)}{\log(1/q)}
\end{equation}
to guarantee $x_{k+1,i}=0$ for all $i\in I^\star$. In particular, the identification time is $O\!\left(\sqrt{\frac{L}{\mu}}\log\frac{1}{\gamma}\right)$ up to problem-dependent constants.
\end{theorem}

\begin{proof}
From $\mu$-strong convexity of $F$ we have, for all $x$,
\[
F(x)-F(x^\star) \ge \frac{\mu}{2}\norm{x-x^\star}_2^2,
\]
hence \eqref{eq:rate-gap} implies
\[
\norm{x_k-x^\star}_2 \le \sqrt{\frac{2C}{\mu}}\, q^{k/2}.
\]
Now bound $y_k$ using the definition $y_k=x_k+\beta(x_k-x_{k-1})$:
\begin{align*}
\norm{y_k-x^\star}_2 &= \norm{(1+\beta)(x_k-x^\star) - \beta(x_{k-1}-x^\star)}_2 \\
&\le (1+\beta)\norm{x_k-x^\star}_2 + \beta\norm{x_{k-1}-x^\star}_2 \\
&\le \sqrt{\frac{2C}{\mu}}\left( (1+\beta)q^{k/2} + \beta q^{(k-1)/2}\right) \\
&= \sqrt{\frac{2C}{\mu}}\; q^{(k-1)/2} \left( (1+\beta)\sqrt{q} + \beta\right).
\end{align*}
If the right-hand side is at most $r$, then Lemma~\ref{lem:stable-zero} applies and gives $x_{k+1,I^\star}=0$. Solving for $k$ yields \eqref{eq:K-bound}. The stated big-$O$ form follows from the standard choice $q\approx 1-\sqrt{\mu/L}$, for which $\log(1/q) \asymp \sqrt{\mu/L}$.
\end{proof}

\section{Specialization to $\ell_1$-regularized PageRank}

For the PageRank objective \eqref{eq:f-def}--\eqref{eq:g-def} we have $L\le 1$ and $\mu=\alpha$. Taking $t=1$ (valid since $L\le 1$), the stability radius in Lemma~\ref{lem:stable-zero} becomes
\[
r = \frac{\gamma}{1+L}\ge \frac{\gamma}{2}.
\]
Thus, once an accelerated method produces extrapolated points $y_k$ within distance $\gamma/2$ of $x^\star$, the inactive coordinates $I^\star=\{i:x_i^\star=0\}$ are never activated again.

\begin{remark}[What this does and does not prove]
Theorems~\ref{thm:finite-id-qual}--\ref{thm:finite-id-quant} provide a clean way to control \emph{eventual} sparsity using a \emph{margin parameter} $\gamma$. However:
\begin{itemize}
\item The margin $\gamma$ can be arbitrarily small (or zero) in worst-case instances, so this does not by itself yield a worst-case runtime bound depending only on $(\alpha,\rho)$.
\item The results do not bound how many coordinates might become active during an initial transient phase (before the iterates enter the $\gamma$-neighborhood).
\end{itemize}
Nonetheless, under a positive margin, the accelerated method identifies the optimal zero pattern in finite time.
\end{remark}

\section*{Acknowledgment of standard ingredients}

The only nontrivial ingredients beyond basic convex analysis are: (i) the proximal map of the weighted $\ell_1$ norm (Lemma~\ref{lem:prox}), (ii) Lipschitz continuity of $\nabla f$, and (iii) a standard accelerated linear convergence rate for strongly convex composite minimization (Lemma~\ref{lem:apg-rate}). The identification argument itself (Lemma~\ref{lem:stable-zero} and Theorems~\ref{thm:finite-id-qual}--\ref{thm:finite-id-quant}) is self-contained.
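% Added worked example (hypothetical parameter values, not from the note):
% it instantiates Theorem \ref{thm:finite-id-quant} to show the scale of the constants.
\begin{remark}[Worked numerical example]
For illustration, take $L=1$, $\mu=\alpha=0.25$, $t=1$, margin $\gamma=0.05$,
and the constants $q=1-\sqrt{\mu/L}=\tfrac12$, $\beta=\tfrac13$, $C=1$.
Then $r=\gamma/2=0.025$, and \eqref{eq:K-bound} asks for
\[
k \;\ge\; 1+\frac{2}{\log 2}\log\!\left(\sqrt{8}\cdot\frac{(4/3)\sqrt{1/2}+1/3}{0.025}\right)\approx 16,
\]
so identification occurs within a few dozen iterations on such an instance.
\end{remark}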
\end{document}
</syntaxhighlight>

=== User: Wait, you say that "\item The results do not bound how many coordinates might become active during an initial transient… ===

Wait, you say that "\item The results do not bound how many coordinates might become active during an initial transient phase (before the iterates enter the $\gamma$-neighborhood)." But this is very important... Is it possible to bound this?

=== Assistant: You're right to flag that: for the local running time story in your open problem, what matters is not just that the "wr… ===

You're right to flag that: for the local running-time story in your open problem, what matters is not just that the "wrong" coordinates eventually die, but how big the support can get before identification. open-problem-fountoulakis22a

There are two different answers depending on what kind of bound you mean:
# a bound purely from the margin <math>\gamma</math>, independent of how far you are from <math>x^\star</math>;
# a bound that uses <math>\gamma</math> ''and'' a measure of how far <math>y_k</math> is from <math>x^\star</math> (distance or objective gap).

===== Not in the strong sense you would want (e.g., "<math>\le c/\rho</math> active coordinates for all <math>k</math>"). =====

Reason: the margin is a local property at the solution:
: <math>|\nabla f(x^\star)_i| \le \lambda_i - \gamma \qquad (i\in I^\star),</math>
so it only says those coordinates are safely inside the "dead zone" near <math>x^\star</math>. Far from <math>x^\star</math>, nothing stops the prox input from crossing the threshold on many coordinates. Also, the literature on identification for inertial/accelerated forward–backward methods explicitly discusses that FISTA/inertial schemes can oscillate around the active manifold before settling, so one should not expect a clean monotone "active set only grows nicely" picture without extra structure.<ref>{{cite web|title=arXiv|url=https://arxiv.org/pdf/1503.03703|publisher=arxiv.org|access-date=2025-12-19}}</ref> So: a uniform, instance-independent "small support for all transient iterates" bound does not follow from <math>\gamma>0</math> alone.

==== Even though <math>\gamma</math> is local, it does give a very clean quantitative bound ====

This is already useful because accelerated methods give you an explicit decay of <math>\|y_k-x^\star\|</math> (or of the objective gap), so you can turn it into a bound on the transient "wrong support" size that shrinks over time.

===== Consider the composite problem =====

: <math>\min_x\; F(x)=f(x)+g(x), \qquad g(x)=\sum_{i=1}^n \lambda_i |x_i|,</math>
with <math>\nabla f</math> <math>L</math>-Lipschitz (in <math>\ell_2</math>). Let <math>x^\star</math> be the minimizer and define the inactive set <math>I^\star=\{i:x_i^\star=0\}</math>. Assume the margin
: <math>\gamma:=\min_{i\in I^\star}\bigl(\lambda_i-|\nabla f(x^\star)_i|\bigr)>0.</math>
Fix a step <math>t>0</math>, and for any point <math>y</math> define the prox-gradient input
: <math>u(y):=y-t\nabla f(y), \qquad x^+(y):=\operatorname{prox}_{t g}(u(y)).</math>
Let the "spurious active set" at <math>y</math> be <math>A(y):=\{i\in I^\star:\ x^+(y)_i\neq 0\}</math>. Then
: <math>|A(y)|\ \le\ \left(\frac{(1+tL)\,\|y-x^\star\|_2}{t\gamma}\right)^2.</math>

====== For weighted soft-thresholding, <math>x^+(y)_i\neq 0</math> iff <math>|u(y)_i|>t\lambda_i</math>. ======

For <math>i\in I^\star</math>, we have <math>x^\star_i=0</math> and <math>|\nabla f(x^\star)_i|\le \lambda_i-\gamma</math>, so
: <math>|u(x^\star)_i|=|x^\star_i-t\nabla f(x^\star)_i|=t|\nabla f(x^\star)_i|\le t(\lambda_i-\gamma).</math>
Hence, if <math>i\in A(y)</math>, then <math>|u(y)_i|>t\lambda_i</math>, and therefore
: <math>|u(y)_i|-|u(x^\star)_i|>t\lambda_i - t(\lambda_i-\gamma)=t\gamma.</math>
Using <math>\bigl|\,|a|-|b|\,\bigr|\le |a-b|</math>, we get <math>|u(y)_i-u(x^\star)_i|>t\gamma</math>. Thus
: <math>\|u(y)-u(x^\star)\|_2^2\ \ge\ \sum_{i\in A(y)} |u(y)_i-u(x^\star)_i|^2\ >\ |A(y)|\,(t\gamma)^2.</math>
On the other hand, since <math>\nabla f</math> is <math>L</math>-Lipschitz,
: <math>\|u(y)-u(x^\star)\|_2 =\|(y-x^\star)-t(\nabla f(y)-\nabla f(x^\star))\|_2 \le (1+tL)\|y-x^\star\|_2.</math>
Combine the two inequalities and rearrange to obtain the claim. ∎

==== In an accelerated prox-gradient method, you compute ====

: <math>x_{k+1}=\operatorname{prox}_{t g}(y_k-t\nabla f(y_k)).</math>
Apply the lemma with <math>y=y_k</math>:
: <math>|A(y_k)|\ \le\ \left(\frac{(1+tL)\,\|y_k-x^\star\|_2}{t\gamma}\right)^2.</math>
So you can bound the number of "wrong" active coordinates at iteration <math>k</math> as soon as you have any bound on <math>\|y_k-x^\star\|_2</math> (or on an objective gap, using strong convexity). And the identification literature you're implicitly leaning on (active-set / manifold identification) does exactly this: it bounds the "active-set complexity" in terms of a margin-like parameter (often called <math>\delta_{\min}</math>) and a convergence bound <math>\varepsilon_x(k)</math>.<ref>{{cite web|title=Proceedings of Machine Learning Research|url=https://proceedings.mlr.press/v89/sun19a/sun19a.pdf|publisher=Proceedings of Machine Learning Research|access-date=2025-12-19}}</ref>

==== Not by itself. ====

* The bound is quadratic in <math>1/\gamma</math>, not linear in <math>1/\rho</math>.
* To turn it into a "local algorithm" guarantee, you would need to relate <math>\gamma</math> to <math>\rho</math> (and control <math>\|y_k-x^\star\|</math> by something that doesn't hide an <math>n</math>-dependence). In the worst case, <math>\gamma</math> can be arbitrarily small even when <math>|S^\star|</math> is <math>O(1/\rho)</math>, so the lemma allows very large transient supports.

So the best honest summary is:
* Yes, you can bound transient spurious activations using a margin parameter, but the natural bound you can prove from standard smoothness + margin is of the form <math>|A_k| \lesssim \|y_k-x^\star\|^2/\gamma^2</math>, which is not the <math>\tilde O(1/\rho)</math> sparsity you ultimately want.
* Getting a uniform <math>\tilde O(1/\rho)</math> per-iteration support bound for acceleration typically needs extra invariants/controls (active-set control, monotonicity/restart tricks, etc.), which is exactly where the "nonstandard" machinery comes in; this is consistent with why the identification papers emphasize oscillations/inertial effects.<ref>{{cite web|title=arXiv|url=https://arxiv.org/pdf/1503.03703|publisher=arxiv.org|access-date=2025-12-19}}</ref>

=== User: I want a bound on the sum of supports over the transient. ===

I want a bound on the sum of supports over the transient. You can assume that the initial solution is the zero vector. Provide self-contained LaTeX code.

=== Assistant: Below is a self-contained LaTeX document that proves a bound on the sum of iterate supports over the transient pre-iden… ===

Below is a self-contained LaTeX document that proves a bound on the sum of iterate supports over the transient (pre-identification) phase, using a margin/strict-complementarity parameter for the <math>\ell_1</math>-regularized PageRank objective (3) from your open-problem note.
open-problem-fountoulakis22a <syntaxhighlight lang="latex">\documentclass[11pt]{article} \usepackage{amsmath,amssymb,amsthm,mathtools} \usepackage[margin=1in]{geometry} \usepackage{hyperref} % -------------------- % Theorem environments % -------------------- \newtheorem{theorem}{Theorem} \newtheorem{lemma}{Lemma} \newtheorem{corollary}{Corollary} \newtheorem{assumption}{Assumption} \theoremstyle{remark} \newtheorem{remark}{Remark} % -------------------- % Macros % -------------------- \newcommand{\R}{\mathbb{R}} \newcommand{\norm}[1]{\left\lVert #1 \right\rVert} \newcommand{\ip}[2]{\left\langle #1,#2 \right\rangle} \newcommand{\prox}{\operatorname{prox}} \newcommand{\supp}{\operatorname{supp}} \newcommand{\diag}{\operatorname{diag}} \title{Bounding the Sum of Supports in the Transient of Accelerated Proximal Gradient via a Margin Parameter} \author{} \date{} \begin{document} \maketitle \begin{abstract} We study accelerated proximal-gradient methods for composite objectives of the form $F(x)=f(x)+\sum_{i=1}^n \lambda_i |x_i|$ where $f$ is smooth and strongly convex. Under a strict-complementarity \emph{margin} condition at the minimizer $x^\star$, we bound: (i) the number of \emph{spurious} active coordinates (outside $\supp(x^\star)$) at each iteration in terms of $\|y_k-x^\star\|_2$, and (ii) the \emph{sum of supports} $\sum_{k=0}^{K-1}|\supp(x_{k+1})|$ over the \emph{transient} phase before inactive coordinates are permanently identified as zero. We also specialize constants to the $\ell_1$-regularized PageRank objective (COLT'22 open problem). \end{abstract} \section{Composite model, support sets, and a margin parameter} Let \begin{equation}\label{eq:prob} \min_{x\in\R^n} F(x) := f(x) + g(x), \qquad g(x) := \sum_{i=1}^n \lambda_i |x_i|, \quad \lambda_i>0. \end{equation} Assume $f$ is differentiable, $\nabla f$ is $L$-Lipschitz in $\ell_2$, and $F$ is $\mu$-strongly convex ($\mu>0$). Then $F$ has a unique minimizer $x^\star$. 
Define the optimal support and its complement \[ S^\star := \supp(x^\star),\qquad I^\star := \{1,\dots,n\}\setminus S^\star. \] \subsection{Weighted soft-thresholding proximal map} \begin{lemma}[Prox of weighted $\ell_1$]\label{lem:prox} For any $t>0$ and $u\in\R^n$, \[ \prox_{t g}(u)_i = \operatorname{sign}(u_i)\,\max\{|u_i|-t\lambda_i,\,0\}. \] In particular, $\prox_{t g}(u)_i\neq 0$ if and only if $|u_i|>t\lambda_i$. \end{lemma} \begin{proof} The proximal subproblem is separable over coordinates: $\min_x \frac{1}{2t}\|x-u\|_2^2 + \sum_i \lambda_i|x_i|$, giving weighted soft-thresholding. \end{proof} \subsection{Margin / strict complementarity} First-order optimality gives $0\in\nabla f(x^\star)+\partial g(x^\star)$. In particular, for any $i\in I^\star$ (so $x_i^\star=0$), one has $|\nabla f(x^\star)_i|\le \lambda_i$. \begin{assumption}[Margin parameter]\label{ass:margin} Define \begin{equation}\label{eq:gamma} \gamma := \min_{i\in I^\star}\bigl(\lambda_i - |\nabla f(x^\star)_i|\bigr). \end{equation} Assume $\gamma>0$. Equivalently, every inactive coordinate satisfies $|\nabla f(x^\star)_i|\le \lambda_i-\gamma$. \end{assumption} \begin{remark} Assumption~\ref{ass:margin} is a standard strict-complementarity condition for $\ell_1$ regularization. Without $\gamma>0$, finite-time identification and sharp sparsity control can fail in worst case. \end{remark} \section{Accelerated proximal gradient} Fix a step size $t\in(0,1/L]$ and a momentum parameter $\beta\in[0,1)$. Consider the inertial/accelerated proximal-gradient iteration \begin{equation}\label{eq:apg} \begin{aligned} y_k &= x_k + \beta (x_k-x_{k-1}),\\ x_{k+1} &= \prox_{t g}\!\bigl(y_k - t\nabla f(y_k)\bigr), \end{aligned} \qquad k\ge 0, \end{equation} initialized with \begin{equation}\label{eq:init} x_{-1}=x_0=0. \end{equation} (Thus $y_0=x_0=0$.) Define the \emph{spurious active set at step $k$} \begin{equation}\label{eq:Ak} A_k := \supp(x_{k+1})\cap I^\star = \{i\in I^\star:\ x_{k+1,i}\neq 0\}. 
\end{equation} \section{Key counting lemma: spurious activations cost distance to $x^\star$} Define the forward (gradient) map input \[ u(y) := y - t\nabla f(y). \] \begin{lemma}[Each spurious active coordinate forces a $t\gamma$-jump]\label{lem:tgammastep} Assume Assumption~\ref{ass:margin}. Fix any $y\in\R^n$ and set $x^+(y)=\prox_{t g}(u(y))$. Let $A(y):=\supp(x^+(y))\cap I^\star$. Then for every $i\in A(y)$, \[ |u(y)_i-u(x^\star)_i| > t\gamma, \] and consequently \begin{equation}\label{eq:counting} |A(y)| \le \frac{\|u(y)-u(x^\star)\|_2^2}{t^2\gamma^2}. \end{equation} \end{lemma} \begin{proof} Fix $i\in I^\star$. Then $x_i^\star=0$ and $u(x^\star)_i=-t\nabla f(x^\star)_i$, so $|u(x^\star)_i|=t|\nabla f(x^\star)_i|\le t(\lambda_i-\gamma)$. If $i\in A(y)$ then $x^+(y)_i\neq 0$, hence by Lemma~\ref{lem:prox} we have $|u(y)_i|>t\lambda_i$. Therefore \[ |u(y)_i-u(x^\star)_i|\ge |u(y)_i|-|u(x^\star)_i|>t\lambda_i-t(\lambda_i-\gamma)=t\gamma. \] Summing over $i\in A(y)$ gives $\|u(y)-u(x^\star)\|_2^2>\sum_{i\in A(y)}(t\gamma)^2=|A(y)|t^2\gamma^2$, which implies \eqref{eq:counting}. \end{proof} \begin{lemma}[Lipschitz transfer to $y$-distance]\label{lem:uLip} If $\nabla f$ is $L$-Lipschitz, then for all $y$, \[ \|u(y)-u(x^\star)\|_2 \le (1+tL)\|y-x^\star\|_2. \] Hence, \begin{equation}\label{eq:Ak_vs_y} |A(y)| \le \left(\frac{1+tL}{t\gamma}\right)^2 \|y-x^\star\|_2^2. \end{equation} \end{lemma} \begin{proof} $u(y)-u(x^\star)=(y-x^\star)-t(\nabla f(y)-\nabla f(x^\star))$. Take norms and use $\|\nabla f(y)-\nabla f(x^\star)\|_2\le L\|y-x^\star\|_2$. Combine with Lemma~\ref{lem:tgammastep}. \end{proof} Applying Lemma~\ref{lem:uLip} to $y=y_k$ gives the per-iteration spurious-support bound \begin{equation}\label{eq:Ak_bound_general} |A_k| \le \left(\frac{1+tL}{t\gamma}\right)^2 \|y_k-x^\star\|_2^2. 
\end{equation} \section{Bounding $\|y_k-x^\star\|$ using a standard accelerated linear rate} To obtain an explicit \emph{sum} bound, we use a standard accelerated linear convergence statement. \begin{assumption}[Standard accelerated linear convergence]\label{ass:linear_rate} There exist constants $q\in(0,1)$ and $\Delta_0>0$ such that \begin{equation}\label{eq:linear_rate} F(x_k)-F(x^\star) \le \Delta_0\, q^k\qquad \text{for all }k\ge 0. \end{equation} With initialization $x_0=0$, we have $\Delta_0=F(0)-F(x^\star)$. \end{assumption} \begin{remark} For strongly convex composite objectives, a \emph{standard} accelerated proximal-gradient method with step $t=1/L$ and a constant momentum choice (e.g.\ Nesterov's $\beta=\frac{\sqrt{L}-\sqrt{\mu}}{\sqrt{L}+\sqrt{\mu}}$) achieves \eqref{eq:linear_rate} with $q\approx 1-\sqrt{\mu/L}$ (up to standard constant-factor variants). The identification/sparsity argument below only needs \eqref{eq:linear_rate} and strong convexity. \end{remark} \begin{lemma}[From function gap to $y_k$-distance]\label{lem:yk_distance} Assume $F$ is $\mu$-strongly convex and Assumption~\ref{ass:linear_rate} holds. Then for all $k\ge 1$, \begin{equation}\label{eq:yk_dist_bound} \|y_k-x^\star\|_2^2 \le \frac{4\Delta_0}{\mu}\Bigl( (1+\beta)^2 q^k + \beta^2 q^{k-1}\Bigr) \le \underbrace{\frac{4\Delta_0}{\mu}\Bigl((1+\beta)^2 q + \beta^2\Bigr)}_{=:M}\, q^{k-1}. \end{equation} Also, $\|y_0-x^\star\|_2^2=\|x_0-x^\star\|_2^2 \le \frac{2\Delta_0}{\mu}$. \end{lemma} \begin{proof} Strong convexity gives $F(x)-F(x^\star)\ge \frac{\mu}{2}\|x-x^\star\|_2^2$, hence $\|x_k-x^\star\|_2^2 \le \frac{2}{\mu}(F(x_k)-F(x^\star)) \le \frac{2\Delta_0}{\mu}q^k$. For $k\ge 1$, write $y_k-x^\star=(1+\beta)(x_k-x^\star)-\beta(x_{k-1}-x^\star)$ and use $\|a-b\|_2^2 \le 2\|a\|_2^2 + 2\|b\|_2^2$ to get \[ \|y_k-x^\star\|_2^2 \le 2(1+\beta)^2\|x_k-x^\star\|_2^2 + 2\beta^2\|x_{k-1}-x^\star\|_2^2. \] Substitute the bounds on $\|x_k-x^\star\|_2^2$ and $\|x_{k-1}-x^\star\|_2^2$. 
The second inequality in \eqref{eq:yk_dist_bound} uses $q^k\le q^{k-1}$ for $k\ge 1$. For $k=0$, $y_0=x_0$ and the strong convexity bound gives the claim. \end{proof} \section{Transient identification and the sum of supports} The margin implies that once $y_k$ is sufficiently close to $x^\star$, \emph{no} inactive coordinate can be activated. \begin{lemma}[Stability radius for inactivity]\label{lem:stability} Assume Assumption~\ref{ass:margin}. Define \begin{equation}\label{eq:r_def} r := \frac{t\gamma}{1+tL}. \end{equation} If $\|y-x^\star\|_2\le r$, then $A(y)=\emptyset$ (i.e.\ $\supp(\prox_{tg}(u(y)))\subseteq S^\star$). In particular, if $\|y_k-x^\star\|_2\le r$ then $A_k=\emptyset$. \end{lemma} \begin{proof} If $\|y-x^\star\|_2\le r$, then by Lemma~\ref{lem:uLip}, $\|u(y)-u(x^\star)\|_2\le (1+tL)\|y-x^\star\|_2\le t\gamma$. Hence no coordinate can satisfy the strict inequality $|u(y)_i-u(x^\star)_i|>t\gamma$ that Lemma~\ref{lem:tgammastep} requires for membership in $A(y)$; therefore $A(y)=\emptyset$. \end{proof} Define the \emph{(margin-based) identification time} \begin{equation}\label{eq:Kdef} K_\gamma := \min\{k\ge 0:\ \|y_k-x^\star\|_2\le r\}. \end{equation} Then by Lemma~\ref{lem:stability}, $A_k=\emptyset$ for all $k\ge K_\gamma$. We call iterations $k=0,1,\dots,K_\gamma-1$ the \emph{transient phase}. \begin{lemma}[Explicit bound on $K_\gamma$]\label{lem:Kbound} Assume Assumption~\ref{ass:linear_rate} and let $M$ be as in Lemma~\ref{lem:yk_distance}. Then \[ K_\gamma \le 1 + \left\lceil \frac{\log(M/r^2)}{\log(1/q)} \right\rceil. \] \end{lemma} \begin{proof} For $k\ge 1$, Lemma~\ref{lem:yk_distance} gives $\|y_k-x^\star\|_2^2\le M q^{k-1}$. It suffices that $M q^{k-1}\le r^2$, i.e.\ $k-1\ge \log(M/r^2)/\log(1/q)$.
\end{proof} \subsection{Main theorem: a bound on $\sum_{k=0}^{K_\gamma-1}|\supp(x_{k+1})|$} \begin{theorem}[Sum of supports over the transient]\label{thm:sum_support_transient} Assume Assumptions~\ref{ass:margin} and \ref{ass:linear_rate}, and run \eqref{eq:apg}--\eqref{eq:init}. Let $K_\gamma$ be defined by \eqref{eq:Kdef}. Then \begin{equation}\label{eq:sum_support_main} \sum_{k=0}^{K_\gamma-1} |\supp(x_{k+1})| \;\le\; |S^\star|\,K_\gamma \;+\; \left(\frac{1+tL}{t\gamma}\right)^2 \left( \frac{2\Delta_0}{\mu} +\frac{4\Delta_0}{\mu}\cdot \frac{(1+\beta)^2 q + \beta^2}{1-q} \right). \end{equation} Moreover, $K_\gamma$ satisfies Lemma~\ref{lem:Kbound}. In particular, if one uses the standard choices $t=1/L$ and $q\approx 1-\sqrt{\mu/L}$, then the \emph{spurious part} of the sum in \eqref{eq:sum_support_main} is \[ O\!\left(\frac{\Delta_0\,L^{5/2}}{\gamma^2\,\mu^{3/2}}\right), \] and $K_\gamma = O\!\left(\sqrt{\tfrac{L}{\mu}}\log \tfrac{1}{\gamma}\right)$ (up to constants hidden in $M$). \end{theorem} \begin{proof} For each $k$, $\supp(x_{k+1}) = (\supp(x_{k+1})\cap S^\star)\cup A_k$, hence $|\supp(x_{k+1})|\le |S^\star| + |A_k|$. Summing over $k=0,\dots,K_\gamma-1$ yields \[ \sum_{k=0}^{K_\gamma-1} |\supp(x_{k+1})| \le |S^\star|K_\gamma + \sum_{k=0}^{K_\gamma-1}|A_k| \le |S^\star|K_\gamma + \sum_{k=0}^{\infty}|A_k|. \] Now use \eqref{eq:Ak_bound_general} with Lemma~\ref{lem:yk_distance}: \[ |A_0|\le \left(\frac{1+tL}{t\gamma}\right)^2 \|y_0-x^\star\|_2^2 \le \left(\frac{1+tL}{t\gamma}\right)^2 \frac{2\Delta_0}{\mu}, \] and for $k\ge 1$, \[ |A_k| \le \left(\frac{1+tL}{t\gamma}\right)^2 \|y_k-x^\star\|_2^2 \le \left(\frac{1+tL}{t\gamma}\right)^2 \frac{4\Delta_0}{\mu}\Bigl( (1+\beta)^2 q^k + \beta^2 q^{k-1}\Bigr). \] Sum the geometric series: $\sum_{k=1}^\infty q^k = \frac{q}{1-q}$ and $\sum_{k=1}^\infty q^{k-1}=\frac{1}{1-q}$. This gives \eqref{eq:sum_support_main}. 
The asymptotic statements follow by inserting $t=1/L$ so $(1+tL)/(t\gamma)=2L/\gamma$ and $1/(1-q)\asymp \sqrt{L/\mu}$ when $q\approx 1-\sqrt{\mu/L}$. \end{proof} \begin{remark}[Interpretation] Theorem~\ref{thm:sum_support_transient} bounds the \emph{total amount of sparsity work} accumulated before strict complementarity (margin) forces all inactive coordinates to stay zero. The bound splits cleanly into: (i) a ``true-support'' term $|S^\star|K_\gamma$, and (ii) a purely margin-controlled ``spurious'' term scaling like $\gamma^{-2}$ times a condition-number factor. \end{remark} \section{Specialization: $\ell_1$-regularized PageRank} We record the standard constants for the $\ell_1$-regularized PageRank objective (COLT'22 open problem): \[ F(x)= \frac12 x^\top Qx - \alpha x^\top D^{-1/2}s \;+\; \rho\alpha \|D^{1/2}x\|_1, \] with $Q=\alpha I + \frac{1-\alpha}{2}L_{\rm sym}$ and $L_{\rm sym}$ the normalized Laplacian. For undirected graphs, $\lambda(L_{\rm sym})\subseteq[0,2]$, hence $\lambda(Q)\subseteq[\alpha,1]$. Therefore $\nabla f$ is $L$-Lipschitz with $L\le 1$, and $F$ is $\mu$-strongly convex with $\mu=\alpha$. Also $\lambda_i = \rho\alpha\sqrt{d_i}$ in \eqref{eq:prob}. \begin{corollary}[PageRank constants; explicit $\Delta_0$ bound from $x_0=0$]\label{cor:ppr} Assume the seed is a single node $v$ so $s=e_v$. With initialization $x_0=0$ we have $F(0)=0$ and \[ \Delta_0 = F(0)-F(x^\star) = -F(x^\star) \le \frac{\alpha}{2d_v}\le \frac{\alpha}{2}. \] If we use the standard step $t=1$ (since $L\le 1$) then $r=\gamma/2$ and $(1+tL)/(t\gamma)\le 2/\gamma$. Consequently, the spurious part of \eqref{eq:sum_support_main} is \[ O\!\left(\frac{1}{\gamma^2\sqrt{\alpha}}\right) \quad\text{(up to standard constant factors in the accelerated rate and the momentum choice).} \] Moreover, $K_\gamma = O\!\left(\frac{1}{\sqrt{\alpha}}\log\frac{1}{\gamma}\right)$.
\end{corollary} \begin{proof} Since $g(x)\ge 0$, we have $F(x)\ge f(x)$ and thus $F(x^\star)\ge \min_x f(x)$. For $f(x)=\frac12 x^\top Qx - b^\top x$ with $b=\alpha D^{-1/2}s$, the unconstrained minimizer is $x_f^\star=Q^{-1}b$ and $\min f = -\frac12 b^\top Q^{-1}b$. Because $Q\succeq \alpha I$, we have $Q^{-1}\preceq \frac{1}{\alpha}I$ and hence $b^\top Q^{-1}b \le \frac{1}{\alpha}\|b\|_2^2 = \frac{1}{\alpha}\alpha^2\|D^{-1/2}s\|_2^2 =\alpha\|D^{-1/2}s\|_2^2$. For $s=e_v$, $\|D^{-1/2}s\|_2^2 = 1/d_v$, giving $\min f \ge -\alpha/(2d_v)$, hence $-F(x^\star)\le -\min f \le \alpha/(2d_v)$. The remaining statements follow by inserting $L\le 1$, $\mu=\alpha$, $t=1$ into Theorem~\ref{thm:sum_support_transient} and Lemma~\ref{lem:Kbound}. \end{proof} \begin{remark}[What this bound does \emph{not} give] The bound is explicit and dimension-free, but it depends on the margin $\gamma$. To recover a purely $(\alpha,\rho)$-based locality bound (e.g.\ $\widetilde O((\sqrt{\alpha}\rho)^{-1})$) one would need an additional argument lower-bounding $\gamma$ in terms of $\rho$ for the instance class of interest. \end{remark} \end{document} </syntaxhighlight> === User: That's excellent!! === That's excellent!! Can you now bound the total running of the algorithm? You can use the fact that |S^*| <= 1/ρ. Don't update the whole latex, just give me the new material to append. === Assistant: Append the following material it converts the support-sum bound into a total running-time / total coordinate-work bound… === Append the following material (it converts the support-sum bound into a total running-time / total coordinate-work bound, using ∣S⋆∣≤1/ρ|S^\star|\le 1/\rho∣S⋆∣≤1/ρ as in the local-PPR setting). open-problem-fountoulakis22a <syntaxhighlight lang="latex">% ============================================================ \section{Total running time via the sum of supports} We now convert the support-sum bounds into a total work bound for an $\varepsilon$-accurate solution. 
Throughout, we measure work in a standard \emph{support-sensitive} model: the cost of iteration $k$ is proportional to the number of (stored/updated) nonzeros in the vectors that appear in the update. This is the same model implicitly used when one argues that a method is ``local'' because it only touches $O(1/\rho)$ coordinates per iteration. \subsection{A simple support-sensitive work model} \begin{assumption}[Support-sensitive iteration cost]\label{ass:work_model} There exist absolute constants $c_1,c_2>0$ such that iteration $k$ of \eqref{eq:apg} can be implemented with work \[ \mathrm{work}(k) \;\le\; c_1\,|\supp(y_k)| \;+\; c_2\,|\supp(x_{k+1})|. \] (Any $O(1)$ overhead per iteration can be absorbed into the constants.) \end{assumption} \begin{lemma}[Support of the extrapolated point]\label{lem:supp_y} For all $k\ge 1$, \[ \supp(y_k)\subseteq \supp(x_k)\cup \supp(x_{k-1}), \qquad\text{hence}\qquad |\supp(y_k)| \le |\supp(x_k)|+|\supp(x_{k-1})|. \] \end{lemma} \begin{proof} Coordinatewise, $y_{k,i}=(1+\beta)x_{k,i}-\beta x_{k-1,i}$. Thus if $x_{k,i}=x_{k-1,i}=0$ then $y_{k,i}=0$. \end{proof} \subsection{Total work up to a fixed iteration budget} Define the cumulative work up to $N$ iterations as \[ \mathrm{Work}(N) := \sum_{k=0}^{N-1}\mathrm{work}(k). \] Let \begin{equation}\label{eq:Cspur_def} C_{\rm spur}:=\left(\frac{1+tL}{t\gamma}\right)^2 \left( \frac{2\Delta_0}{\mu} +\frac{4\Delta_0}{\mu}\cdot \frac{(1+\beta)^2 q + \beta^2}{1-q} \right), \end{equation} which is exactly the constant appearing in Theorem~\ref{thm:sum_support_transient}. \begin{corollary}[Work bound up to $N$ iterations]\label{cor:workN} Assume Assumptions~\ref{ass:margin},~\ref{ass:linear_rate}, and~\ref{ass:work_model}. 
Then for every integer $N\ge 1$, \begin{align} \sum_{k=0}^{N-1} |\supp(x_{k+1})| &\le N|S^\star| + C_{\rm spur},\label{eq:supp_sum_xN}\\ \sum_{k=0}^{N-1} |\supp(y_k)| &\le 2N|S^\star| + 2C_{\rm spur},\label{eq:supp_sum_yN} \end{align} and therefore \begin{equation}\label{eq:workN_bound} \mathrm{Work}(N) \;\le\; O\!\left(N|S^\star| + C_{\rm spur}\right), \end{equation} where the hidden constant depends only on $(c_1,c_2)$. If, moreover, $|S^\star|\le 1/\rho$, then \begin{equation}\label{eq:workN_rho} \mathrm{Work}(N) \;\le\; O\!\left(\frac{N}{\rho} + C_{\rm spur}\right). \end{equation} \end{corollary} \begin{proof} For each $k$ we have $\supp(x_{k+1})\subseteq S^\star \cup A_k$, hence $|\supp(x_{k+1})|\le |S^\star|+|A_k|$. Summing over $k=0,\dots,N-1$ yields $\sum_{k=0}^{N-1} |\supp(x_{k+1})|\le N|S^\star| + \sum_{k=0}^{N-1}|A_k|$. The proof of Theorem~\ref{thm:sum_support_transient} bounds the entire geometric series $\sum_{k=0}^{\infty}|A_k|\le C_{\rm spur}$, which implies \eqref{eq:supp_sum_xN}. Next, Lemma~\ref{lem:supp_y} gives for $k\ge 1$: $|\supp(y_k)|\le |\supp(x_k)|+|\supp(x_{k-1})|$. Summing this for $k=1,\dots,N-1$ and using $y_0=x_0$ yields \eqref{eq:supp_sum_yN}. Finally, combine \eqref{eq:supp_sum_xN}--\eqref{eq:supp_sum_yN} with Assumption~\ref{ass:work_model}. \end{proof} \subsection{Total work to reach $\varepsilon$ accuracy} Assumption~\ref{ass:linear_rate} implies that $F(x_N)-F(x^\star)\le \varepsilon$ once \begin{equation}\label{eq:Neps_def} N \;\ge\; N_\varepsilon := \left\lceil \frac{\log(\Delta_0/\varepsilon)}{\log(1/q)} \right\rceil. \end{equation} \begin{corollary}[Total running time to $\varepsilon$ accuracy]\label{cor:workeps} Assume $|S^\star|\le 1/\rho$ and the hypotheses of Corollary~\ref{cor:workN}. 
Then the total support-sensitive work to compute an iterate $x_N$ with $F(x_N)-F(x^\star)\le \varepsilon$ is \begin{equation}\label{eq:workeps_general} \mathrm{Work}(N_\varepsilon) \;\le\; O\!\left( \frac{1}{\rho}\cdot\frac{\log(\Delta_0/\varepsilon)}{\log(1/q)} \;+\; C_{\rm spur} \right). \end{equation} With the standard accelerated choice $q\approx 1-\sqrt{\mu/L}$ and step $t=1/L$, this becomes \begin{equation}\label{eq:workeps_simplified} \mathrm{Work}(N_\varepsilon) = O\!\left( \frac{1}{\rho}\sqrt{\frac{L}{\mu}}\,\log\frac{\Delta_0}{\varepsilon} \;+\; \frac{\Delta_0\,L^{5/2}}{\gamma^2\,\mu^{3/2}} \right). \end{equation} In the $\ell_1$-regularized PageRank setting (undirected graph), $L\le 1$ and $\mu=\alpha$. If additionally $\Delta_0\le \alpha/2$ (single-seed case) and we take $t=1$, then \begin{equation}\label{eq:workeps_ppr} \mathrm{Work}(N_\varepsilon) = O\!\left( \frac{1}{\rho\sqrt{\alpha}}\log\frac{1}{\varepsilon} \;+\; \frac{1}{\gamma^2\sqrt{\alpha}} \right), \end{equation} up to the standard constant-factor variants in the accelerated rate. \end{corollary} % ============================================================ </syntaxhighlight>

=== User: Excellent! ===

Excellent! Can you show me a case where a combination of algorithm parameters and a particular bad graph example can result in γ < sqrt{ρ}? Don't produce LaTeX yet. Show me your answer here.

=== Assistant: Yes — you can make the margin γ arbitrarily small (even 0) for an ℓ1-regularized PageRank instance by choo… ===

Yes: you can make the margin $\gamma$ arbitrarily small (even $0$) for an $\ell_1$-regularized PageRank instance by choosing $(\alpha,\rho)$ and a "bad" graph so that you sit right at (or extremely near) a support-change breakpoint of the regularization path.
This makes $|\nabla f(x^\star)_i|$ almost equal to the threshold $\lambda_i=\rho\alpha\sqrt{d_i}$ for some inactive coordinate, hence $\gamma=\lambda_i-|\nabla f(x^\star)_i|$ becomes tiny. (This is exactly what strict complementarity rules out.) Below is a fully explicit, connected-graph example where you can compute $\gamma$ in closed form and force $\gamma<\sqrt{\rho}$ by a wide margin.

==== Take a star graph with: ====

* one center node $c$ connected to $m$ leaf nodes $\ell_1,\dots,\ell_m$,
* seed $s=e_c$,
* teleportation $\alpha\in(0,1)$,
* regularization parameter $\rho>0$.

Degrees:

* $d_c=m$,
* $d_\ell=1$ for each leaf.

For the $\ell_1$-regularized PPR objective (your equation (3)), the weighted $\ell_1$ penalty is

* $\lambda_c=\rho\alpha\sqrt{m}$ at the center,
* $\lambda_\ell=\rho\alpha$ at each leaf. open-problem-fountoulakis22a

Because of symmetry, in a certain parameter regime the optimizer has the form:

* $x_c^\star>0$,
* $x_\ell^\star=0$ for every leaf.

===== In this regime (all leaves are zero), the center coordinate satisfies =====

$(Qx^\star-\alpha D^{-1/2}s)_c+\lambda_c=0,$ which gives $x_c^\star=\frac{2\alpha(1-\rho m)}{(1+\alpha)\sqrt{m}}.$ This is positive as long as $\rho<1/m$.

===== A leaf $\ell$ stays at zero iff its KKT inequality holds: =====

$|(Qx^\star-\alpha D^{-1/2}s)_\ell|\le\lambda_\ell=\rho\alpha.$ For the star, this inequality reduces to a very clean threshold: $\rho\;\ge\;\rho_0:=\frac{1-\alpha}{2m}.$
So for $\rho\in\left[\frac{1-\alpha}{2m},\,\frac{1}{m}\right)$ the unique optimum has support $S^\star=\{c\}$ (so $|S^\star|=1\le 1/\rho$ automatically).

===== All inactive coordinates are leaves and are symmetric, so the margin is attained on any leaf: =====

$\gamma=\lambda_\ell-|(Qx^\star-\alpha D^{-1/2}s)_\ell|.$ A direct substitution yields the closed form:
$\gamma\;=\;\frac{\alpha}{1+\alpha}\Big(2\rho-\frac{1-\alpha}{m}\Big)\;=\;\frac{2\alpha}{1+\alpha}\,(\rho-\rho_0).$

Two key takeaways:

* At the breakpoint $\rho=\rho_0$: $\gamma=0$, so $\gamma<\sqrt{\rho}$ trivially.
* If you choose $\rho=\rho_0+\delta$ with $\delta>0$ tiny, then $\gamma=\frac{2\alpha}{1+\alpha}\delta$, so you can make $\gamma$ as small as you want, hence certainly $\gamma<\sqrt{\rho}$.

==== Pick typical PageRank teleportation, say: ====

* $\alpha=0.15$,
* star size $m=100$.

Then $\rho_0=\frac{1-\alpha}{2m}=\frac{0.85}{200}=0.00425$. Take $\rho=0.004251$ (so $\delta=10^{-6}$). Then
$\gamma=\frac{2\alpha}{1+\alpha}\delta=\frac{0.3}{1.15}\cdot 10^{-6}\approx 2.61\times 10^{-7},$
but
$\sqrt{\rho}\approx\sqrt{0.004251}\approx 6.52\times 10^{-2}.$
So $\gamma\approx 2.6\cdot 10^{-7}\ll 6.5\cdot 10^{-2}\approx\sqrt{\rho}$. This is a clean "bad" case: connected graph, $|S^\star|=1\le 1/\rho$, yet $\gamma$ is orders of magnitude smaller than $\sqrt{\rho}$ because $\rho$ is chosen near a support-change breakpoint.
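The star computation above is easy to sanity-check numerically. The following stdlib-only sketch is mine (the helper name `star_margin` and the tolerances are not from the note): it plugs the claimed $x_c^\star$ into the star's KKT system, using the closed-form entries $Q_{cc}=\frac{1+\alpha}{2}$, $Q_{\ell c}=-\frac{1-\alpha}{2\sqrt{m}}$, and $b_c=\alpha/\sqrt{m}$, verifies the active optimality condition at the center, and compares the leaf slack with the closed form $\frac{2\alpha}{1+\alpha}(\rho-\rho_0)$.

```python
from math import sqrt

def star_margin(alpha, m, rho):
    """KKT slack (margin gamma) for the l1-regularized PPR objective on a star.

    Center c has degree m; each of the m leaves has degree 1; seed s = e_c.
    Uses the star's closed-form entries Q_cc = (1+alpha)/2 and
    Q_{leaf,c} = -(1-alpha)/(2*sqrt(m)), with seed term b_c = alpha/sqrt(m).
    """
    x_c = 2 * alpha * (1 - rho * m) / ((1 + alpha) * sqrt(m))  # claimed optimum
    assert x_c > 0                                             # requires rho < 1/m
    grad_c = (1 + alpha) / 2 * x_c - alpha / sqrt(m)           # (Qx - b)_c
    assert abs(grad_c + rho * alpha * sqrt(m)) < 1e-12         # active KKT at center
    grad_leaf = -(1 - alpha) / (2 * sqrt(m)) * x_c             # (Qx - b)_leaf
    return rho * alpha - abs(grad_leaf)                        # leaf slack = gamma

alpha, m = 0.15, 100
rho0 = (1 - alpha) / (2 * m)      # breakpoint where a leaf's KKT inequality gets tight
gamma = star_margin(alpha, m, rho0 + 1e-6)
print(gamma, 2 * alpha / (1 + alpha) * 1e-6, sqrt(rho0 + 1e-6))
```

Running it reproduces the numbers above: the computed slack agrees with the closed form $\frac{2\alpha}{1+\alpha}\delta\approx 2.6\times 10^{-7}$, several orders of magnitude below $\sqrt{\rho}\approx 6.5\times 10^{-2}$.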
===== Step size $t$, momentum $\beta$, etc. do not change $\gamma$: $\gamma$ is a property of the problem instance (graph + seed + $\alpha,\rho$). What the algorithm parameters can do is affect how long you linger in the transient, but they do not alter the KKT slack at the true minimizer. =====

If you want, I can also give a path-graph construction where $\rho_0$ is small even with moderate $m$, but the star example is the simplest explicit one where you can compute $\gamma(\rho)$ in closed form.

=== User: Sure, give me a path graph construction too. ===

Sure, give me a path graph construction too.

=== Assistant: Here's a path graph version of the same "bad margin" phenomenon. ===

Here's a path graph version of the same "bad margin" phenomenon. It's actually even cleaner than you might expect: you can make $\gamma$ arbitrarily small by tuning $\rho$ to sit extremely close to the point where the next node on the path is about to enter the support. I'll stick to the $\ell_1$-regularized PPR objective (your eq. (3)) and the usual $Q=\alpha I+\frac{1-\alpha}{2}L_{\rm sym}$ setup. open-problem-fountoulakis22a

==== Let $P_{m+1}$ be the path graph on nodes $1,2,\dots,m+1$ with edges $(i,i+1)$. ====

Take the seed $s=e_1$ at endpoint node $1$. Degrees:

* $d_1=1$,
* $d_i=2$ for $2\le i\le m$,
* $d_{m+1}=1$.

Regularizer weights in (3) are $\lambda_i=\rho\alpha\sqrt{d_i}$, so

* $\lambda_1=\rho\alpha$,
* $\lambda_2=\rho\alpha\sqrt{2}$,
* and similarly $\rho\alpha\sqrt{2}$ at the other interior nodes.

Also, for an undirected graph you can write
$Q\;=\;\frac{1+\alpha}{2}I\;-\;\frac{1-\alpha}{2}D^{-1/2}AD^{-1/2},$
so the only off-diagonals are on edges:
$Q_{ij}=-\frac{1-\alpha}{2}\cdot\frac{1}{\sqrt{d_i d_j}}\quad\text{if }(i,j)\in E.$
In particular, between nodes $1$ and $2$,
$Q_{21}=-\frac{1-\alpha}{2}\cdot\frac{1}{\sqrt{1\cdot 2}}=-\frac{1-\alpha}{2\sqrt{2}}.$

==== Consider the candidate solution ====

$x^\star=(x_1^\star,0,0,\dots,0),\quad x_1^\star>0.$

===== The smooth gradient is $\nabla f(x)=Qx-\alpha D^{-1/2}s$, and since $d_1=1$ we have $D^{-1/2}s=e_1$. =====

So the KKT condition at coordinate 1 (active, positive) is:
$(Qx^\star)_1-\alpha+\lambda_1=0.$
But $(Qx^\star)_1=Q_{11}x_1^\star=\frac{1+\alpha}{2}x_1^\star$. Hence
$\frac{1+\alpha}{2}x_1^\star-\alpha+\rho\alpha=0\quad\Longrightarrow\quad x_1^\star=\frac{2\alpha(1-\rho)}{1+\alpha},$
which is positive as long as $\rho<1$.

===== Node 2 is inactive ($x_2^\star=0$) iff its KKT inequality holds: =====

$|\nabla f(x^\star)_2|\le\lambda_2.$
Now $\nabla f(x^\star)_2=(Qx^\star)_2$ (no seed term at node 2), and since only $x_1^\star$ is nonzero,
$(Qx^\star)_2=Q_{21}x_1^\star=-\frac{1-\alpha}{2\sqrt{2}}x_1^\star.$
So
$|\nabla f(x^\star)_2|=\frac{1-\alpha}{2\sqrt{2}}x_1^\star=\frac{1-\alpha}{2\sqrt{2}}\cdot\frac{2\alpha(1-\rho)}{1+\alpha}=\frac{\alpha(1-\alpha)(1-\rho)}{(1+\alpha)\sqrt{2}}.$
The threshold is $\lambda_2=\rho\alpha\sqrt{2}$. The inequality becomes:
$\frac{\alpha(1-\alpha)(1-\rho)}{(1+\alpha)\sqrt{2}}\le\rho\alpha\sqrt{2}\quad\Longleftrightarrow\quad\frac{(1-\alpha)(1-\rho)}{1+\alpha}\le 2\rho.$
Solving yields the breakpoint
$\rho_0=\frac{1-\alpha}{3+\alpha},$
and for any $\rho\in[\rho_0,1)$, the solution with support $\{1\}$ satisfies all KKT conditions. (Every node $i\ge 3$ has $(Qx^\star)_i=0$ when only $x_1^\star\neq 0$, so their inequalities are trivially satisfied.) So for all path lengths $m$, you get:
$S^\star=\{1\},\quad |S^\star|=1\le 1/\rho,\quad\text{whenever }\rho\in[\rho_0,1).$

==== The margin is ====

$\gamma=\min_{i\in I^\star}\big(\lambda_i-|\nabla f(x^\star)_i|\big).$
Here the tightest inactive coordinate is node 2 (the only one with a nonzero gradient), so
$\gamma=\lambda_2-|\nabla f(x^\star)_2|=\rho\alpha\sqrt{2}-\frac{\alpha(1-\alpha)(1-\rho)}{(1+\alpha)\sqrt{2}}.$
This simplifies neatly to a linear function of $\rho-\rho_0$:
$\gamma=\frac{\alpha(3+\alpha)}{(1+\alpha)\sqrt{2}}\,(\rho-\rho_0).$

So:

* At $\rho=\rho_0$, $\gamma=0$.
* For $\rho=\rho_0+\delta$ with tiny $\delta>0$, you get $\gamma=C\delta$ with $C=\frac{\alpha(3+\alpha)}{(1+\alpha)\sqrt{2}}$.

Since $\sqrt{\rho}$ is $\Theta(1)$ here (because $\rho_0$ is a constant depending only on $\alpha$), taking $\delta$ extremely small makes $\gamma$ much smaller than $\sqrt{\rho}$.

===== Take $\alpha=0.15$. Then =====

$\rho_0=\frac{0.85}{3.15}\approx 0.26984,\qquad\sqrt{\rho_0}\approx 0.519.$
Now choose $\rho=\rho_0+10^{-6}$.
Then
$\gamma\approx\frac{0.15\cdot 3.15}{1.15\cdot 1.414}\cdot 10^{-6}\approx 2.9\times 10^{-7}\ll 0.519\approx\sqrt{\rho}.$
So this path graph gives you a very explicit, closed-form "bad" case where $\gamma<\sqrt{\rho}$ by an arbitrarily large margin, while the true support is tiny ($|S^\star|=1\le 1/\rho$).

===== Algorithm parameters ($t$, $\beta$, restarts, etc.) don't change $\gamma$: $\gamma$ is a property of the instance (graph + $\alpha,\rho,s$) at the true optimizer. What algorithm parameters affect is how much transient "support thrashing" you might see before you enter the $\gamma$-neighborhood. =====

If you want, I can also give a path-prefix support construction (where $S^\star=\{1,2,\dots,k\}$) and show the breakpoint for node $k+1$, so you can make $\gamma$ tiny while $|S^\star|\approx k\approx 1/\rho$.

=== User: Convert my last two prompts and your answer into latex so I can append it to my current latex. ===

Convert my last two prompts and your answer into LaTeX so I can append it to my current LaTeX.

=== Assistant: Here is appendable LaTeX that turns the last two prompts + my two answers (star + path constructions, and the "γ… ===

Here is appendable LaTeX that turns the last two prompts and my two answers (star + path constructions, and the "$\gamma<\sqrt{\rho}$" point) into a clean section you can paste into your current note (it refers to the $\ell_1$-regularized PageRank objective (3) in the COLT'22 open-problem note open-problem-fountoulakis22a ).
<syntaxhighlight lang="latex">% ============================================================ \section{Bad instances where the margin $\gamma$ can be much smaller than $\sqrt{\rho}$} We now record two explicit graph families showing that the strict-complementarity margin \[ \gamma := \min_{i\in I^\star}\bigl(\lambda_i-|\nabla f(x^\star)_i|\bigr), \qquad \lambda_i=\rho\alpha\sqrt{d_i}, \] can be made \emph{arbitrarily small} (even $\gamma=0$) by choosing $\rho$ near a breakpoint of the regularization path where an inactive KKT inequality becomes tight. In particular, these instances satisfy $\gamma<\sqrt{\rho}$ for suitable choices of $\rho$. \begin{remark}[Algorithmic parameters vs.\ $\gamma$] The margin $\gamma$ is a property of the \emph{instance} (graph, seed $s$, and PageRank parameters $\alpha,\rho$), since it is defined from the KKT slack at the \emph{true minimizer} $x^\star$. Step size and momentum parameters of an accelerated method do \emph{not} change $\gamma$; they only affect the transient behavior before the iterates enter a $\gamma$-neighborhood of $x^\star$. \end{remark} \subsection{Star graph (seed at the center)} Let $G$ be a star on $m+1$ nodes with center node $c$ of degree $d_c=m$ and leaves $\ell$ of degree $1$. Let the seed be $s=e_c$. Consider candidates of the form $x=x_c e_c$. \begin{lemma}[Star graph breakpoint: $\gamma$ can be $0$]\label{lem:star_gamma_small} Fix $\alpha\in(0,1]$ and $m\ge 1$ and define \[ \rho_0 := \frac{1-\alpha}{2m}. \] For any $\rho\in[\rho_0,\,1/m)$, the unique minimizer has support $S^\star=\{c\}$, with \[ x_c^\star = \frac{2\alpha(1-\rho m)}{(1+\alpha)\sqrt{m}}. \] Moreover, the KKT slack on any leaf $\ell$ equals \[ \lambda_\ell - |\nabla f(x^\star)_\ell| = \frac{2\alpha}{1+\alpha}\,(\rho-\rho_0), \] and thus at the breakpoint $\rho=\rho_0$ one has $\gamma=0<\sqrt{\rho_0}$. 
\end{lemma} \begin{proof} For undirected graphs, \[ Q=\frac{1+\alpha}{2}I-\frac{1-\alpha}{2}D^{-1/2}AD^{-1/2}, \qquad \nabla f(x)=Qx-\alpha D^{-1/2}s. \] On the star, $Q_{cc}=\frac{1+\alpha}{2}$ and for each leaf $\ell$, $Q_{\ell c}=Q_{c\ell}=-\frac{1-\alpha}{2}\frac{1}{\sqrt{m}}$. Assume $x^\star=x_c^\star e_c$ with $x_c^\star>0$. The active KKT condition at $c$ is \[ 0 = \nabla f(x^\star)_c + \lambda_c = \frac{1+\alpha}{2}x_c^\star - \frac{\alpha}{\sqrt{m}} + \rho\alpha\sqrt{m}, \] which yields the stated formula for $x_c^\star$ (and requires $\rho<1/m$ for positivity). For a leaf $\ell$, the inactive KKT inequality is \[ |\nabla f(x^\star)_\ell| = |(Qx^\star)_\ell| = \frac{1-\alpha}{2}\frac{x_c^\star}{\sqrt{m}} \le \lambda_\ell=\rho\alpha, \] which reduces to $\rho\ge\rho_0$ after substituting $x_c^\star$. Strong convexity of $F$ implies the KKT conditions certify the unique minimizer and its support. Finally, the slack at a leaf is \[ \lambda_\ell-|\nabla f(x^\star)_\ell| = \rho\alpha - \frac{1-\alpha}{2}\frac{x_c^\star}{\sqrt{m}} = \frac{2\alpha}{1+\alpha}(\rho-\rho_0), \] and since all inactive coordinates are leaves, this gives $\gamma$. \end{proof} \begin{remark}[Getting $\gamma<\sqrt{\rho}$ with $\gamma>0$ on the star] Take $\rho=\rho_0+\delta$ with $\delta>0$. Then $\gamma=\frac{2\alpha}{1+\alpha}\delta$ while $\rho=\rho_0+\delta>0$. Choosing $\delta$ sufficiently small gives $\gamma<\sqrt{\rho}$ (and $\gamma$ can be made arbitrarily small). \end{remark} \subsection{Path graph (seed at an endpoint)} Let $G=P_{m+1}$ be the path on nodes $1,2,\dots,m+1$ with edges $(i,i+1)$. Let $s=e_1$ (seed at endpoint $1$). Here $d_1=d_{m+1}=1$ and $d_i=2$ for $2\le i\le m$. Consider candidates of the form $x=x_1 e_1$. \begin{lemma}[Path graph breakpoint: $\gamma$ can be $0$]\label{lem:path_gamma_small} Fix $\alpha\in(0,1]$ and $m\ge 1$ and define \[ \rho_0 := \frac{1-\alpha}{3+\alpha}. 
\] For any $\rho\in[\rho_0,\,1)$, the unique minimizer has support $S^\star=\{1\}$, with \[ x_1^\star = \frac{2\alpha(1-\rho)}{1+\alpha}. \] Moreover, the KKT slack at node $2$ equals \[ \lambda_2-|\nabla f(x^\star)_2| = \frac{\alpha(3+\alpha)}{(1+\alpha)\sqrt{2}}\,(\rho-\rho_0). \] In particular, at the breakpoint $\rho=\rho_0$ one has $\gamma=0<\sqrt{\rho_0}$. \end{lemma} \begin{proof} As above, $Q=\frac{1+\alpha}{2}I-\frac{1-\alpha}{2}D^{-1/2}AD^{-1/2}$. Since $(1,2)\in E$ and $(D^{-1/2}AD^{-1/2})_{21}=1/\sqrt{1\cdot 2}=1/\sqrt{2}$, we have $Q_{11}=\frac{1+\alpha}{2}$ and $Q_{21}=-\frac{1-\alpha}{2\sqrt{2}}$. Assume $x^\star=x_1^\star e_1$ with $x_1^\star>0$. The active KKT condition at node $1$ is \[ 0=\nabla f(x^\star)_1+\lambda_1 = \frac{1+\alpha}{2}x_1^\star - \alpha + \rho\alpha, \] which yields $x_1^\star=\frac{2\alpha(1-\rho)}{1+\alpha}$ (positive for $\rho<1$). The inactive KKT inequality at node $2$ is \[ |\nabla f(x^\star)_2| = |(Qx^\star)_2| = \frac{1-\alpha}{2\sqrt{2}}\,x_1^\star \le \lambda_2=\rho\alpha\sqrt{2}, \] which reduces to $\rho\ge\rho_0$ after substituting $x_1^\star$. For nodes $i\ge 3$, $(Qx^\star)_i=0$ because $x^\star$ is supported only on node $1$ and $Q$ is sparse along edges, hence their KKT inequalities hold trivially. Strong convexity implies uniqueness. Finally, \[ \lambda_2-|\nabla f(x^\star)_2| = \rho\alpha\sqrt{2} - \frac{1-\alpha}{2\sqrt{2}}x_1^\star = \frac{\alpha(3+\alpha)}{(1+\alpha)\sqrt{2}}(\rho-\rho_0), \] and at $\rho=\rho_0$ this slack is $0$, so $\gamma=0$. \end{proof} \begin{remark}[Getting $\gamma<\sqrt{\rho}$ with $\gamma>0$ on the path] Take $\rho=\rho_0+\delta$ with $\delta>0$. Then the slack at node $2$ is $O(\delta)$, hence $\gamma\le O(\delta)$, while $\sqrt{\rho}\approx \sqrt{\rho_0}>0$. Choosing $\delta$ sufficiently small yields $\gamma<\sqrt{\rho}$ (and $\gamma$ can be made arbitrarily small). 
\end{remark}
% ============================================================
</syntaxhighlight>
=== User: Excellent! ===
Excellent! Can you show me a case where a particular bad graph example can result in $\gamma = o(\sqrt{\rho})$? Don't produce latex yet. Show me your answer here.
=== Assistant: Yes. ===
Yes. You can get $\gamma = o(\sqrt{\rho})$ by taking a family of star graphs where $\rho\to 0$ (via the graph size) and choosing $\rho$ extremely close to the support-change breakpoint, so that the KKT inequality is almost tight on an inactive node. I’ll spell it out cleanly.
==== Interpreting your statement in standard asymptotic terms: ====
: This is stronger than just “$\gamma<\sqrt{\rho}$” at a fixed instance.
==== Take the star $S_{m+1}$ (center $c$, leaves $\ell_1,\dots,\ell_m$), seed $s=e_c$, and fix any constant $\alpha\in(0,1)$. ====
For this instance (we already derived this earlier), there is a breakpoint $\rho_0(m)=\frac{1-\alpha}{2m}$ such that for any $\rho\in[\rho_0(m),\,1/m)$ the optimizer has support $S^\star=\{c\}$, and the margin satisfies
\[
\gamma \;=\; \frac{2\alpha}{1+\alpha}\,\bigl(\rho-\rho_0(m)\bigr).
\]
So $\gamma$ is exactly linear in how far $\rho$ sits above the breakpoint.
==== Let’s define a sequence $\rho_m$ by choosing a tiny offset above the breakpoint, for example: ====
\[
\rho_m := \rho_0(m) + \rho_0(m)^2.
\]
Then:
# $\rho_m\to 0$ as $m\to\infty$, since $\rho_0(m)=\Theta(1/m)$.
# The margin is $\gamma_m = \frac{2\alpha}{1+\alpha}\,(\rho_m-\rho_0(m)) = \frac{2\alpha}{1+\alpha}\,\rho_0(m)^2 = \Theta\!\left(\frac{1}{m^2}\right)$.
# Meanwhile, $\sqrt{\rho_m} = \Theta\!\left(\frac{1}{\sqrt{m}}\right)$.
# Therefore $\frac{\gamma_m}{\sqrt{\rho_m}} = \Theta\!\left(\frac{1/m^2}{1/\sqrt{m}}\right) = \Theta\!\left(\frac{1}{m^{3/2}}\right) \longrightarrow 0$.
So indeed $\boxed{\;\gamma_m = o(\sqrt{\rho_m})\;}$ for this explicit star-graph family.
==== Since $\rho_m \approx \rho_0(m)$ for large $m$, and $\rho_0(m)=\Theta(1/m)$, the choice $\rho_m=\rho_0+\rho_0^2$ gives ====
\[
\gamma_m = \Theta(\rho_0(m)^2) = \Theta(\rho_m^2),
\]
and $\rho^2 = o(\sqrt{\rho})$ as $\rho\to 0$ because $\rho^2/\sqrt{\rho} = \rho^{3/2}\to 0$. So this construction actually gives the very clean scaling:
\[
\boxed{\;\gamma = \Theta(\rho^2)\quad\Rightarrow\quad \gamma = o(\sqrt{\rho}).\;}
\]
===== This shows you cannot hope for any worst-case lower bound like $\gamma \gtrsim \sqrt{\rho}$ (or even $\gamma \gtrsim \rho$) that depends only on $(\alpha,\rho)$: by tuning $\rho$ close to a breakpoint on a suitable graph family, $\gamma$ can be polynomially smaller than $\sqrt{\rho}$, even when the true support is tiny (here $|S^\star|=1\le 1/\rho$). =====
If you want, I can also give a variant where $\gamma$ is super-polynomially small relative to $\sqrt{\rho}$ (e.g., choose $\rho-\rho_0$ exponentially small in $m$), but the $\gamma=\Theta(\rho^2)$ example already proves $\gamma=o(\sqrt{\rho})$ cleanly.
=== User: There is something that I don't understand. ===
There is something that I don't understand. Earlier you proved that
"\begin{equation}\label{eq:workeps_ppr} \mathrm{Work}(N_\varepsilon) = O\!\left( \frac{1}{\rho\sqrt{\alpha}}\log\frac{1}{\varepsilon} \;+\; \frac{1}{\gamma^2\sqrt{\alpha}} \right), \end{equation}"
So, based on what you said, for the star graph, if $\rho = \rho_0$ then $\gamma=0$.
Does that mean that the algorithm needs infinite work? Is there something missing in your proof?
=== Assistant: No: γ=0 does not mean the algorithm needs infinite work for any fixed accuracy ε>0. ===
No: $\gamma=0$ does ''not'' mean the algorithm needs infinite work for any fixed accuracy $\varepsilon>0$. What it means is: the particular “local work” bound you’re quoting was proved only under the strict margin assumption $\gamma>0$, and if you plug in $\gamma=0$ you are applying the theorem outside its validity range. The “$+\infty$” you get is the bound becoming vacuous, not the algorithm becoming non-terminating. Here’s exactly what is going on.
==== In our derivation, the key counting step was: ====
* if an inactive coordinate $i\in I^\star$ becomes nonzero at an iterate,
* then the prox input must cross the threshold by at least $t\gamma$,
* which leads to a bound like
\[
|A_k| \;\le\; \frac{\|u(y_k)-u(x^\star)\|_2^2}{t^2\gamma^2} \;\le\; \left(\frac{1+tL}{t\gamma}\right)^2\|y_k-x^\star\|_2^2.
\]
That inequality divides by $\gamma^2$. If $\gamma=0$, the statement “each spurious active coordinate forces a jump of at least $t\gamma$” becomes “forces a jump of at least $0$,” which gives no information. So the proof technique cannot upper bound the number of spurious activations, and the resulting constant $1/\gamma^2$ blows up. So what fails at $\gamma=0$ is finite identification / stability of the optimal inactive set, not convergence of APG itself.
==== $\gamma=0$ means there exists an inactive coordinate $i\in I^\star$ such that the KKT inequality is tight: ====
\[
|\nabla f(x^\star)_i| = \lambda_i.
\]
Equivalently, at the optimum the prox input is exactly at the soft-threshold boundary:
\[
|u(x^\star)_i| = t\lambda_i, \qquad u(x):=x-t\nabla f(x).
\]
When you are exactly on that boundary, arbitrarily small perturbations of $y$ around $x^\star$ can flip that coordinate between
* returning $0$ (inactive), and
* returning a tiny nonzero value (active),
because soft-thresholding switches at equality. Accelerated methods generate extrapolated points $y_k$ that can oscillate around $x^\star$, so there is no reason (in the worst case) that such “tight” coordinates must become permanently zero after a finite time. In other words:
* $\gamma>0$ ⇒ there is a buffer zone in which inactive coordinates are guaranteed to stay inactive, hence finite identification, hence the spurious-work term is a constant (independent of $\varepsilon$).
* $\gamma=0$ ⇒ there is no buffer zone, so spurious coordinates can keep flickering for as long as you run; the spurious work is no longer a constant independent of $\varepsilon$.
That’s exactly why the margin matters.
==== For any fixed $\varepsilon>0$: no. ====
Accelerated proximal gradient still has the iteration complexity $N_\varepsilon = O(\sqrt{1/\alpha}\log(1/\varepsilon))$ for strongly convex composite objectives, regardless of $\gamma$. This is the standard statement you quoted in the open-problem note. open-problem-fountoulakis22a What you lose at $\gamma=0$ is a guarantee that each iteration can be implemented with local cost $\tilde O(1/\rho)$ (or that the cumulative support outside $S^\star$ is bounded by a constant).
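The switch at the boundary is easy to see in a two-line sketch (the value $t\lambda=0.3$ below is illustrative, not one fixed by the note): when $\gamma=0$ the optimal prox input sits exactly on the switch, so arbitrarily small perturbations flip the coordinate between inactive and active.

```python
# Minimal sketch of the soft-thresholding switch at |u| = t*lambda.
def soft_threshold(u, tlam):
    if u > tlam:
        return u - tlam
    if u < -tlam:
        return u + tlam
    return 0.0

tlam = 0.3
u_star = tlam                        # gamma = 0: prox input exactly on the boundary
for eps in (1e-3, 1e-6, 1e-12):
    assert soft_threshold(u_star - eps, tlam) == 0.0   # stays inactive
    assert soft_threshold(u_star + eps, tlam) > 0.0    # activates, however small eps is
```

With $\gamma>0$ the same check would return $0$ for every perturbation smaller than the buffer $t\gamma$; at $\gamma=0$ no such buffer exists.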
In the worst case you may have to fall back to the pessimistic bound you already mention in your note:
\[
\mathrm{Work}(N_\varepsilon)\;\le\; O\!\left(n\sqrt{\frac{1}{\alpha}}\log\frac{1}{\varepsilon}\right),
\]
because without a margin you cannot prove the iterate supports stay small. open-problem-fountoulakis22a So: finite work, but potentially non-local.
==== Two different “limits” get conflated: ====
===== You run $N_\varepsilon<\infty$ iterations. =====
* Work is at most $nN_\varepsilon<\infty$.
* Nothing is infinite.
===== As $\varepsilon\to 0$, $N_\varepsilon\to\infty$ for any method. =====
* If $\gamma>0$, we proved that the spurious support contribution stays bounded even as $\varepsilon\to 0$, because identification happens in finite time.
* If $\gamma=0$, you can have no finite identification time, so spurious activations can persist for arbitrarily many iterations; the spurious contribution can grow with $N_\varepsilon$. In that sense, a bound that tries to be uniform “independent of $\varepsilon$” will necessarily break.
That is exactly what the $\frac{1}{\gamma^2\sqrt{\alpha}}$ term is signaling.
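To make “finite work at $\gamma=0$” concrete, here is a small numerical sketch (sizes and parameters are illustrative; `beta` is the usual constant momentum for a $\mu$-strongly convex, $L$-smooth problem with $\mu=\alpha$, $L\le 1$) that runs FISTA on the star instance exactly at the breakpoint $\rho=\rho_0$, where $\gamma=0$, and confirms it still reaches the closed-form optimizer after finitely many iterations, counting touched nonzeros as a crude work proxy.

```python
import numpy as np

def soft(u, tlam):
    # componentwise soft-thresholding prox of the weighted l1 term
    return np.sign(u) * np.maximum(np.abs(u) - tlam, 0.0)

# Star graph with the seed at the center, rho exactly at the breakpoint
# rho_0 = (1 - alpha) / (2m), so the margin gamma is 0.
m, alpha = 50, 0.3
rho = (1 - alpha) / (2 * m)
n = m + 1
A = np.zeros((n, n)); A[0, 1:] = A[1:, 0] = 1.0    # node 0 = center
d = A.sum(axis=1)
Dinv2 = np.diag(1.0 / np.sqrt(d))
Q = (1 + alpha) / 2 * np.eye(n) - (1 - alpha) / 2 * Dinv2 @ A @ Dinv2
b = alpha * (Dinv2 @ np.eye(n)[0])                 # seed s = e_center
lam = rho * alpha * np.sqrt(d)

t = 1.0                                            # step 1/L with L = lambda_max(Q) <= 1
beta = (1 - np.sqrt(alpha)) / (1 + np.sqrt(alpha)) # momentum for mu = alpha
x = y = np.zeros(n)
touched = 0                                        # crude work proxy: nonzeros per iteration
for _ in range(200):
    x_new = soft(y - t * (Q @ y - b), t * lam)
    y = x_new + beta * (x_new - x)
    x = x_new
    touched += np.count_nonzero(x)

# Closed-form optimizer from the star lemma: supported on the center only.
x_center = 2 * alpha * (1 - rho * m) / ((1 + alpha) * np.sqrt(m))
assert abs(x[0] - x_center) < 1e-8                 # converged: finite work even at gamma = 0
```

What the $\gamma=0$ degeneracy removes is only the guarantee that `touched` stays $\tilde O(1/\rho)$ per iteration uniformly in $\varepsilon$, not convergence itself.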
If you want, I can also show (still without LaTeX) how to state an alternative runtime bound valid even when $\gamma=0$: it will necessarily depend on $n$ (or on how many “tight” inactive coordinates there are), matching the pessimistic $O(n\sqrt{1/\alpha}\log(1/\varepsilon))$ running time in your open-problem statement. open-problem-fountoulakis22a
=== User: Can you convert this into an independent latex section that I can append to my code? ===
Can you convert this into an independent latex section that I can append to my code?
=== Assistant: Append the following section (it clarifies why plugging γ=0 into the γ>0 bound is invalid, and gives … ===
Append the following section (it clarifies why plugging $\gamma=0$ into the $\gamma>0$ bound is invalid, and gives the correct “fallback” runtime bound that always holds). open-problem-fountoulakis22a
<syntaxhighlight lang="latex">% ============================================================
\section{When the margin vanishes: why the $1/\gamma^2$ term blows up (and what it does \emph{not} mean)}
\label{sec:gamma_zero}
A potential point of confusion is the appearance of the term $1/\gamma^2$ in bounds such as
\begin{equation}\label{eq:workeps_ppr_repeat}
\mathrm{Work}(N_\varepsilon) = O\!\left( \frac{1}{\rho\sqrt{\alpha}}\log\frac{1}{\varepsilon} \;+\; \frac{1}{\gamma^2\sqrt{\alpha}} \right),
\end{equation}
and what happens on instances where $\gamma=0$ (e.g., the star/path breakpoint examples).
\subsection{The key point: \eqref{eq:workeps_ppr_repeat} is a conditional bound (it assumes $\gamma>0$)}
Recall the definition of the (strict complementarity) margin:
\[
\gamma := \min_{i\in I^\star}\left(\lambda_i - |\nabla f(x^\star)_i|\right),
\qquad
\lambda_i=\rho\alpha\sqrt{d_i}.
\]
All of the ``finite identification'' and ``$\varepsilon$-independent spurious-support'' bounds were proved \emph{under the assumption $\gamma>0$}.
The factor $1/\gamma^2$ arises by dividing by the minimum slack in the inactive KKT inequalities. If $\gamma=0$, then there exists an inactive index $i\in I^\star$ with \[ |\nabla f(x^\star)_i|=\lambda_i, \] i.e., that coordinate lies \emph{exactly} on the soft-thresholding boundary at the optimum. In this case, there is no neighborhood of $x^\star$ in which the proximal-gradient update is guaranteed to keep all inactive coordinates at zero: arbitrarily small perturbations of the extrapolated point $y$ can flip such a tight coordinate between $0$ and a tiny nonzero value. Consequently, one cannot in general guarantee \emph{finite-time identification} of the optimal inactive set, and the proofs yielding $\varepsilon$-independent spurious-support bounds do not apply. \paragraph{Conclusion.} Setting $\gamma=0$ in \eqref{eq:workeps_ppr_repeat} does \emph{not} imply the algorithm requires infinite work; it only means that \eqref{eq:workeps_ppr_repeat} becomes \emph{vacuous} when used outside its assumptions. \subsection{What remains true when $\gamma=0$: finite work for any fixed accuracy} Even when $\gamma=0$, accelerated proximal-gradient methods still enjoy the standard accelerated \emph{iteration} complexity for strongly convex composite optimization. In particular, under $\mu$-strong convexity of $F$ and $L$-smoothness of $f$, there exists $q\in(0,1)$ (typically $q\approx 1-\sqrt{\mu/L}$) such that \[ F(x_k)-F(x^\star)\le \Delta_0\,q^k, \qquad \Delta_0:=F(x_0)-F(x^\star). \] Therefore, for any target $\varepsilon>0$, an $\varepsilon$-accurate iterate is obtained after \[ N_\varepsilon = O\!\left(\frac{\log(\Delta_0/\varepsilon)}{\log(1/q)}\right) = O\!\left(\sqrt{\frac{L}{\mu}}\log\frac{\Delta_0}{\varepsilon}\right) \] iterations. However, without a positive margin, we cannot bound the supports during the transient independently of $n$. A worst-case bound is obtained by charging each iteration the full dimension. 
\begin{corollary}[Always-valid fallback work bound (no margin needed)]\label{cor:fallback_work}
Assume only that $f$ is $L$-smooth and $F=f+g$ is $\mu$-strongly convex, and run an accelerated proximal-gradient method for $N_\varepsilon$ iterations. Then for any $\varepsilon>0$,
\[
\mathrm{Work}(N_\varepsilon) \;\le\; O\!\left(n\,N_\varepsilon\right) = O\!\left(n\sqrt{\frac{L}{\mu}}\log\frac{\Delta_0}{\varepsilon}\right),
\]
independent of the margin $\gamma$. In the $\ell_1$-regularized PageRank setting (undirected graph), $L\le 1$ and $\mu=\alpha$, hence
\[
\mathrm{Work}(N_\varepsilon) = O\!\left(\frac{n}{\sqrt{\alpha}}\log\frac{\Delta_0}{\varepsilon}\right).
\]
\end{corollary}
\begin{remark}[How to interpret the role of $\gamma$]
The margin $\gamma>0$ is precisely what upgrades the fallback bound above (which scales with $n$) to a \emph{local} bound that depends on $1/\rho$ and is independent of $n$. When $\gamma=0$, locality cannot be certified by this approach: tight inactive coordinates can keep ``flickering'' and may prevent $\varepsilon$-independent bounds on the cumulative spurious supports.
\end{remark}
% ============================================================
</syntaxhighlight>
=== User: Is there any reasonable assumption that we could make about the structure of the graph around the seed node and ρ such … ===
Is there any reasonable assumption that we could make about the structure of the graph around the seed node and ρ such that γ is bigger than sqrt(ρ) or of that order?
=== Assistant: There are two very different issues mixed into “can we assume γ ≳ √ρ?”: ===
There are two very different issues mixed into “can we assume $\gamma \gtrsim \sqrt{\rho}$?”:
# A hard upper bound coming from the penalty weights in (3), and
# A nondegeneracy / “don’t sit at a breakpoint” condition (strict complementarity).
Let me separate them.
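Both issues can be previewed numerically on the star family from the earlier appendix; a small sketch ($\alpha$ is an illustrative constant) of the $\gamma=\Theta(\rho^2)=o(\sqrt{\rho})$ scaling derived above:

```python
import math

# Numeric preview of the star-family scaling gamma = Theta(rho^2) = o(sqrt(rho)).
# alpha is an illustrative constant; rho_m sits just above the breakpoint
# rho_0(m) = (1 - alpha) / (2m) derived for the star S_{m+1}.
alpha = 0.25

def margin_ratio(m):
    rho0 = (1 - alpha) / (2 * m)       # support-change breakpoint
    rho = rho0 + rho0 ** 2             # tiny offset above the breakpoint
    gamma = 2 * alpha / (1 + alpha) * (rho - rho0)   # = Theta(1/m^2)
    return gamma / math.sqrt(rho)      # should decay like m^{-3/2}

ratios = [margin_ratio(m) for m in (10, 100, 1000)]
assert ratios[0] > ratios[1] > ratios[2]         # gamma / sqrt(rho) -> 0
# each 10x increase in m shrinks the ratio by roughly 10^{3/2} ~ 31.6
assert 25 < ratios[0] / ratios[1] < 40
```

The same loop with the offset held at a constant fraction of $\rho_0$ would instead give $\gamma=\Theta(\rho)$, which is the degree-capped ceiling discussed next.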
==== For your objective (3), the $\ell_1$ weights are ====
\[
\lambda_i \;=\; \rho\alpha\sqrt{d_i}, \tag{$*$}
\]
and for every inactive coordinate $i\in I^\star$,
\[
|\nabla f(x^\star)_i|\le \lambda_i,
\qquad
\gamma = \min_{i\in I^\star}\bigl(\lambda_i-|\nabla f(x^\star)_i|\bigr). \tag{$**$}
\]
So automatically
\[
0 \le \gamma \le \min_{i\in I^\star}\lambda_i \le \rho\alpha\min_{i\in I^\star}\sqrt{d_i} \le \rho\alpha\sqrt{d_{\min}}, \tag{1}
\]
where $d_{\min}$ is the global minimum degree (or just the minimum degree among inactive nodes). This is immediate from the definition. open-problem-fountoulakis22a
===== If your graph has any inactive node with bounded degree (say degree 1 or 2), then =====
\[
\gamma \le O(\rho).
\]
And when $\rho\to 0$, $O(\rho)=o(\sqrt{\rho})$. So on typical sparse graphs with leaves/whiskers, no structural assumption “around the seed” can force $\gamma \gtrsim \sqrt{\rho}$, because the bound (1) is global and purely degree-based. This is why the star/path families are so effective as counterexamples: they contain low-degree inactive nodes, forcing $\gamma$ to be at most $O(\rho)$, and (by tuning $\rho$ near breakpoints) even smaller.
==== To even make $\gamma\sim \sqrt{\rho}$ possible when $\rho$ is small, you need $\lambda_i$ itself to be $\Omega(\sqrt{\rho})$ for the relevant inactive coordinates. Using $(*)$, that means: ====
\[
\rho\alpha\sqrt{d_i} \;\gtrsim\; \sqrt{\rho}
\quad\Longrightarrow\quad
d_i \;\gtrsim\; \frac{1}{\alpha^2\rho}. \tag{2}
\]
So a necessary (and essentially unavoidable) type of assumption is:
===== A clean version (Assumption A) is: =====
* either the graph has minimum degree $d_{\min} \gtrsim 1/(\alpha^2\rho)$,
* or (more weakly) every node that ends up inactive has degree at least $c/(\alpha^2\rho)$.
This rules out leaves/paths/stars-with-leaves-outside-support, and it matches the algebra: then
\[
\min_{i\in I^\star}\lambda_i \;\ge\; \rho\alpha\sqrt{c/(\alpha^2\rho)} \;=\; \sqrt{c\rho}.
\]
Is this “reasonable”?
* On dense graphs, yes.
* On many sparse real-world networks, no, unless you pre-process (e.g., work on a core where the minimum degree is large).
==== Even if $\lambda_i$ is large, $\gamma$ can still be tiny if some inactive coordinate is nearly tight: ====
\[
|\nabla f(x^\star)_i|\approx \lambda_i.
\]
That’s exactly what happens at (or near) breakpoints of the regularization path. So you need a second assumption:
===== A natural version (Assumption B) is a relative slack condition: =====
\[
|\nabla f(x^\star)_i| \le (1-\eta)\lambda_i \quad\text{for all } i\in I^\star,
\]
for some fixed $\eta\in(0,1)$. This is stronger than just $\gamma>0$; it forces a uniform fraction of slack. Then
\[
\gamma = \min_{i\in I^\star}\bigl(\lambda_i-|\nabla f(x^\star)_i|\bigr) \ge \eta \min_{i\in I^\star}\lambda_i. \tag{3}
\]
Combine (3) with the degree condition (2) and you get
\[
\gamma \;\gtrsim\; \eta \sqrt{\rho} \quad (\text{up to }\alpha\text{-constants}). \tag{4}
\]
Is Assumption B “reasonable”?
* Deterministically, in the worst case: no (an adversarial $\rho$ can always sit near a breakpoint; your star/path examples show this).
* Practically / statistically: often yes, because exact tightness is a codimension-1 event.
If $\rho$ is chosen from a coarse grid or with small random jitter, hitting a breakpoint “too closely” becomes unlikely. A very concrete “algorithmic” way to justify B is:
* choose $\rho$ on a multiplicative grid (e.g., $\rho=2^{-j}$), and/or
* add a tiny random perturbation to $\rho$ (or to the edge weights);
then with high probability you are not extremely close to a breakpoint, hence $\gamma$ is not extremely small.
==== If low-degree inactive nodes are allowed: ====
Then $\gamma \gtrsim \sqrt{\rho}$ is not a reasonable target in general, because $\gamma\le O(\rho)$ already from (1). The best you can hope for is $\gamma=\Theta(\rho)$ (and even that needs a nondegeneracy buffer).
===== If you assume something like: =====
* all nodes outside the seed’s local region / outside the optimal support have degree $\gtrsim 1/\rho$, and
* $\rho$ is not tuned near a breakpoint (relative slack $\eta$),
then $\gamma$ can be of order $\sqrt{\rho}$ (more precisely $\Theta(\alpha\sqrt{\rho})$ up to constants), by the simple chain (3)+(2). If you tell me which regime you care about (typical local clustering on sparse graphs, or a dense-core model, or a probabilistic “$\rho$ not adversarial” model), I can write down a crisp assumption set and show the exact inequality chain that yields $\gamma \ge c\,\alpha\sqrt{\rho}$ (or explain why it’s impossible in that regime).
=== User: Can you provide a theorem for a sufficient condition for γ to be O(sqrt(ρ)). ===
Can you provide a theorem for a sufficient condition for γ to be O(sqrt(ρ)). You will possibly need to assume a target cluster with some conductance which is also well connected internally.
=== Assistant: Yes, there’s a clean way to package this as a primal–dual (KKT certificate) theorem plus a graph-structural corollary. ===
Yes, there’s a clean way to package this as a primal–dual (KKT certificate) theorem plus a graph-structural corollary.
The main idea is: * pick a “target cluster” CCC containing the seed, * show the restricted minimizer supported on CCC is also the global minimizer (via strict dual feasibility on V∖CV\setminus CV∖C), * then the margin is at least the dual-feasibility slack, which you can lower bound by something like Θ(ρ)\Theta(\sqrt{\rho})Θ(ρ) if (i) outside degrees are large enough so λi=ραdi\lambda_i=\rho\alpha\sqrt{d_i}λi=ραdi is ≳ρ\gtrsim \sqrt{\rho}≳ρ, and (ii) boundary coupling is small (low conductance / dispersed boundary) and CCC is internally well-conditioned. I’ll write a theorem in a form that is actually checkable and then interpret it in cluster terms. ==== We’re in the ℓ1\ell_1ℓ1-regularized PPR problem (your eq. (3)) open-problem-fountoulakis22a ==== : minx∈Rn F(x)=f(x)+g(x),f(x)=12x⊤Qx−αx⊤D−1/2s,g(x)=∑i=1nλi∣xi∣,\min_{x\in\mathbb{R}^n}\;F(x)= f(x)+g(x),\quad f(x)=\tfrac12 x^\top Qx-\alpha x^\top D^{-1/2}s,\quad g(x)=\sum_{i=1}^n \lambda_i |x_i|,x∈RnminF(x)=f(x)+g(x),f(x)=21x⊤Qx−αx⊤D−1/2s,g(x)=i=1∑nλi∣xi∣, with Q=αI+1−α2L,λi=ραdi,b:=αD−1/2s.Q=\alpha I+\tfrac{1-\alpha}{2}L,\qquad \lambda_i=\rho\alpha\sqrt{d_i},\qquad b:=\alpha D^{-1/2}s.Q=αI+21−αL,λi=ραdi,b:=αD−1/2s. Let x⋆x^\starx⋆ be the unique minimizer and I⋆={i:xi⋆=0}I^\star=\{i:x_i^\star=0\}I⋆={i:xi⋆=0}. The margin is γ:=mini∈I⋆(λi−∣∇f(x⋆)i∣)=mini∈I⋆(λi−∣(Qx⋆−b)i∣).\gamma := \min_{i\in I^\star}\bigl(\lambda_i-|\nabla f(x^\star)_i|\bigr) = \min_{i\in I^\star}\bigl(\lambda_i-| (Qx^\star-b)_i|\bigr).γ:=i∈I⋆min(λi−∣∇f(x⋆)i∣)=i∈I⋆min(λi−∣(Qx⋆−b)i∣). ==== ### ==== Let C⊆VC\subseteq VC⊆V contain the seed node vvv (i.e. s=evs=e_vs=ev and v∈Cv\in Cv∈C). Define the complement Cˉ:=V∖C\bar C:=V\setminus CCˉ:=V∖C. Partition QQQ and bbb into blocks (C,Cˉ)(C,\bar C)(C,Cˉ). Assume there exists a vector x^∈Rn\hat x\in\mathbb{R}^nx^∈Rn supported on CCC (i.e. x^Cˉ=0\hat x_{\bar C}=0x^Cˉ=0) such that: # Stationarity on CCC with a known sign pattern. 
There is a sign vector $\sigma_C\in\{-1,+1\}^{|C|}$ such that
\[
Q_{CC}\hat x_C - b_C + \lambda_C\circ \sigma_C = 0.
\]
(In the PPR setting one often expects $\hat x_C\ge 0$, so $\sigma_C=\mathbf{1}$.)
# Strict dual feasibility off $C$ (a uniform slack $\eta\in(0,1)$). For all $i\in\bar C$,
\[
|(Q_{\bar C C}\hat x_C - b_{\bar C})_i| \le (1-\eta)\lambda_i.
\]
Then:
* $\hat x$ is the unique global minimizer: $\hat x=x^\star$, hence $\mathrm{supp}(x^\star)\subseteq C$.
* The margin satisfies the explicit lower bound
\[
\gamma \;\ge\; \eta\min_{i\in\bar C}\lambda_i \;=\; \eta\,\rho\alpha\min_{i\in\bar C}\sqrt{d_i}.
\]
In particular, if the outside minimum degree satisfies
\[
\min_{i\in\bar C} d_i \;\ge\; \frac{c_0}{\alpha^2\rho} \quad\text{for some }c_0>0,
\]
then
\[
\gamma \;\ge\; \eta\,\rho\alpha\sqrt{\frac{c_0}{\alpha^2\rho}} \;=\;\eta\sqrt{c_0}\,\sqrt{\rho} \;=\;\Omega(\sqrt{\rho}).
\]
Why this theorem is the right abstraction: it is the exact analogue of a standard Lasso “primal–dual witness” / strict dual feasibility condition. Once you have strict feasibility on the complement, the support is stable and you get a quantitative margin.
==== The theorem above is “certificate-based.” To connect it to conductance and “internally well connected,” you want easy-to-check conditions that imply the strict dual feasibility inequality. ====
Here is a sufficient (strong but explicit) set of structural assumptions that imply the certificate.
===== Let $C\subseteq V$ contain the seed $v$, and suppose: =====
(A) Internal connectivity / conditioning on $C$.
The principal submatrix $Q_{CC}$ satisfies
\[
\lambda_{\min}(Q_{CC}) \ge \mu_C
\]
for some $\mu_C>0$. A natural way to ensure this is to assume $C$ has a positive Dirichlet spectral gap for the normalized Laplacian (or, more informally, that $C$ is an internal expander); then $\mu_C$ is bounded below by $\alpha+\tfrac{1-\alpha}{2}\lambda_{\min}(L^D_C)$.
(B) Low external conductance and a “dispersed boundary.” Let $\mathrm{cut}(C,\bar C)$ be the number of edges crossing $(C,\bar C)$. Assume
\[
\phi(C)=\frac{\mathrm{cut}(C,\bar C)}{\mathrm{vol}(C)} \le c_{\mathrm{out}}\rho
\quad\text{and}\quad
\max_{i\in\bar C}\deg_C(i)\le b,
\]
where $\deg_C(i)$ is the number of neighbors of $i$ inside $C$, and $b$ is a modest constant. (Heuristically: not only is the cut small, it is not concentrated onto a single outside low-degree node.)
(C) Degree lower bounds.
\[
d_j \ge d_{C,\min}\ \ \forall j\in C,
\qquad
d_i \ge d_{\mathrm{out}}\ \ \forall i\in\bar C,
\qquad
d_{\mathrm{out}} \ge \frac{c_0}{\alpha^2\rho}.
\]
(D) Choose $\rho$ so that the restricted solution is nonnegative (and not at a breakpoint).
Define the candidate supported on $C$ (assuming positivity)
\[
\hat x_C := Q_{CC}^{-1}(b_C-\lambda_C),
\qquad
\hat x_{\bar C}:=0,
\]
and assume $\hat x_C\ge 0$ componentwise, and that the following “boundary domination” inequality holds for some $\eta\in(0,1)$:
\[
\underbrace{\frac{1-\alpha}{2}\sqrt{\frac{b}{d_{\mathrm{out}}d_{C,\min}}}}_{\text{(boundary coupling)}}
\cdot
\underbrace{\|\hat x_C\|_2}_{\text{(signal on }C\text{)}}
\ \le\
(1-\eta)\underbrace{\rho\alpha\sqrt{d_{\mathrm{out}}}}_{\min_{i\in\bar C}\lambda_i}.
\]
Then the strict dual feasibility condition holds, so $\hat x=x^\star$ and
\[
\gamma \ge \eta\,\rho\alpha\sqrt{d_{\mathrm{out}}} \ge \eta\sqrt{c_0}\sqrt{\rho}.
\]
Interpretation:
* Low conductance keeps the interaction across $(C,\bar C)$ limited.
* Internal connectivity keeps $Q_{CC}$ well-conditioned, preventing $\|\hat x_C\|$ from blowing up.
* High degree outside ensures the thresholds $\lambda_i=\rho\alpha\sqrt{d_i}$ are at least $\Theta(\sqrt{\rho})$ (this is necessary for $\gamma$ to be $\Theta(\sqrt{\rho})$, because $\gamma\le \min_{i\in I^\star}\lambda_i$).
* A dispersed boundary prevents a single outside vertex from having a large coupling $Q_{iC}$.
==== Caveats ====
This kind of theorem is plausible and standard (it is exactly the strict dual feasibility logic), but it does require assumptions that are stronger than just “small conductance”.
In particular, without something like “no low-degree boundary whiskers” (or at least “inactive nodes have large degree”), you cannot hope for $\gamma\gtrsim \sqrt{\rho}$, because $\gamma \le \min_{i\in I^\star}\lambda_i = \rho\alpha\min_{i\in I^\star}\sqrt{d_i}$.
* If you’re thinking in the usual sparse-graph local clustering regime (many degree-1/2 nodes), the best “natural” order you can hope for is more like $\gamma=\Theta(\rho)$ (up to constants), not $\Theta(\sqrt{\rho})$.
If you want, I can convert the theorem + corollary into a compact LaTeX section and align the notation precisely with your earlier appendices (including a short proof: it’s a one-page KKT/primal–dual witness argument).
=== User: Excellent, can you provide a latex section with this sufficient theorem and the corresponding proof? ===
Excellent, can you provide a latex section with this sufficient theorem and the corresponding proof? Don't skip steps, provide all details.
=== Assistant: Below is an appendable LaTeX section giving a sufficient cluster certificate theorem and a fully detailed proof that it… ===
Below is an appendable LaTeX section giving a sufficient cluster-certificate theorem and a fully detailed proof that it implies (i) the global optimizer is supported on the cluster and (ii) the margin satisfies $\gamma=\Omega(\sqrt{\rho})$ under a simple outside-degree condition. It is written for the $\ell_1$-regularized PageRank objective (3) in your COLT’22 open-problem note.
open-problem-fountoulakis22a <syntaxhighlight lang="latex">% ============================================================ \section{A primal--dual cluster certificate implying a $\sqrt{\rho}$-scale margin} \label{sec:cluster_certificate_margin} This section provides a \emph{sufficient} condition ensuring the strict-complementarity margin \[ \gamma := \min_{i\in I^\star}\left(\lambda_i - |\nabla f(x^\star)_i|\right) \] is bounded below by a quantity of order $\sqrt{\rho}$ (in the PageRank specialization). The result is stated and proved in a general composite form and then specialized to the $\ell_1$-regularized PageRank objective~(3). \subsection{General composite model and subgradients} Consider the composite optimization problem \begin{equation}\label{eq:comp_model_cert} \min_{x\in\mathbb{R}^n} F(x) := f(x) + g(x), \qquad g(x) := \sum_{i=1}^n \lambda_i |x_i|, \quad \lambda_i>0. \end{equation} Assume $f$ is convex and differentiable. (In our PageRank case, $f$ is quadratic and strongly convex.) We recall the coordinatewise subdifferential of the weighted $\ell_1$ term. \begin{lemma}[Subdifferential of weighted $\ell_1$]\label{lem:subdiff_l1_weighted} Let $g(x)=\sum_{i=1}^n \lambda_i |x_i|$ with $\lambda_i>0$. Then \[ \partial g(x) = \left\{z\in\mathbb{R}^n:\; z_i= \begin{cases} \lambda_i\,\mathrm{sign}(x_i), & x_i\neq 0,\\[2mm] \theta_i,\;\; \theta_i\in[-\lambda_i,\lambda_i], & x_i=0, \end{cases} \;\; \forall i \right\}. \] \end{lemma} \begin{proof} Since $g$ is separable, $g(x)=\sum_i g_i(x_i)$ with $g_i(t)=\lambda_i|t|$. The subdifferential of $|t|$ is $\partial |t|=\{\mathrm{sign}(t)\}$ for $t\neq 0$ and $\partial |0|=[-1,1]$. Thus $\partial g_i(t)=\lambda_i\partial|t|$, yielding $\partial g_i(t)=\{\lambda_i\mathrm{sign}(t)\}$ for $t\neq 0$ and $\partial g_i(0)=[-\lambda_i,\lambda_i]$. Taking the Cartesian product across coordinates gives the stated formula. \end{proof} We also recall the standard first-order optimality characterization for convex minimization. 
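% --- Added illustration (not part of the original argument): the scalar case.
% Everything below is standard; it only instantiates the two lemmas for n = 1,
% using the remark environment already provided by the surrounding preamble.
\begin{remark}[One-dimensional soft-thresholding example]
For $n=1$, $f(t)=\tfrac12(t-b)^2$ and $g(t)=\lambda|t|$, the condition
$\nabla f(\bar t)+z=0$ with $z\in\partial g(\bar t)$
(Lemma~\ref{lem:subdiff_l1_weighted} combined with
Lemma~\ref{lem:optimality_convex} below) yields the soft-thresholding formula
\[
\bar t=\mathrm{sign}(b)\max(|b|-\lambda,0).
\]
Indeed, if $|b|\le\lambda$, the choice $z=b\in[-\lambda,\lambda]$ certifies
$\bar t=0$; if $|b|>\lambda$, then $\bar t=b-\lambda\,\mathrm{sign}(b)$ has
$\mathrm{sign}(\bar t)=\mathrm{sign}(b)$ and
$(\bar t-b)+\lambda\,\mathrm{sign}(\bar t)=0$.
\end{remark}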
\begin{lemma}[First-order optimality for convex composite problems]\label{lem:optimality_convex} Assume $f$ is convex and differentiable and $g$ is convex. Then a point $\bar x$ minimizes $F=f+g$ if and only if \[ 0 \in \partial F(\bar x). \] Moreover, since $f$ is differentiable, $\partial f(\bar x)=\{\nabla f(\bar x)\}$ and hence \[ \partial F(\bar x)=\nabla f(\bar x)+\partial g(\bar x), \] so $\bar x$ is a minimizer if and only if there exists $z\in\partial g(\bar x)$ such that \begin{equation}\label{eq:KKT_general} \nabla f(\bar x)+z=0. \end{equation} \end{lemma} \begin{proof} For any proper closed convex function $h$, a point $\bar x$ minimizes $h$ if and only if $0\in\partial h(\bar x)$. Apply this to $h=F$. Next, since $f$ is convex and differentiable, $\partial f(\bar x)=\{\nabla f(\bar x)\}$. A standard subgradient sum rule gives $\partial (f+g)(\bar x)=\partial f(\bar x)+\partial g(\bar x)$, hence $\partial F(\bar x)=\nabla f(\bar x)+\partial g(\bar x)$. Therefore $0\in\partial F(\bar x)$ is equivalent to the existence of $z\in\partial g(\bar x)$ satisfying \eqref{eq:KKT_general}. \end{proof} \subsection{A cluster (support) certificate and a margin lower bound} Let $C\subseteq\{1,\dots,n\}$ be a candidate ``cluster'' and let $\bar C:=\{1,\dots,n\}\setminus C$. For vectors we use the block notation $x=(x_C,x_{\bar C})$. \begin{theorem}[Primal--dual cluster certificate $\Rightarrow$ support containment and margin]\label{thm:cluster_certificate} Consider \eqref{eq:comp_model_cert} with convex differentiable $f$ and weights $\lambda_i>0$. Assume in addition that $F=f+g$ is $\mu$-strongly convex for some $\mu>0$, so the minimizer $x^\star$ is unique. 
Fix a set $C\subseteq\{1,\dots,n\}$ and suppose there exist: \begin{itemize} \item a vector $\hat x\in\mathbb{R}^n$ with \emph{exact} support $\,\supp(\hat x)=C$ (i.e.\ $\hat x_C$ has no zero entries and $\hat x_{\bar C}=0$), and \item a parameter $\eta\in(0,1)$, \end{itemize} such that the following two conditions hold: \begin{enumerate} \item[\textbf{(i)}] \textbf{Stationarity on $C$.} For all $i\in C$, \begin{equation}\label{eq:stationarity_C} \nabla f(\hat x)_i + \lambda_i\,\mathrm{sign}(\hat x_i)=0. \end{equation} \item[\textbf{(ii)}] \textbf{Strict dual feasibility on $\bar C$.} For all $i\in \bar C$, \begin{equation}\label{eq:strict_dual_feas} |\nabla f(\hat x)_i| \le (1-\eta)\lambda_i. \end{equation} \end{enumerate} Then: \begin{enumerate} \item[(a)] $\hat x$ is the unique global minimizer: $\hat x=x^\star$. Hence $\supp(x^\star)=C$. \item[(b)] The strict-complementarity margin satisfies \begin{equation}\label{eq:gamma_lower_bound_cert} \gamma := \min_{i\in I^\star}\bigl(\lambda_i-|\nabla f(x^\star)_i|\bigr) \;\ge\; \eta\,\min_{i\in \bar C}\lambda_i. \end{equation} \end{enumerate} \end{theorem} \begin{proof} \paragraph{Step 1: Construct an explicit subgradient $z\in\partial g(\hat x)$.} Define $z\in\mathbb{R}^n$ coordinatewise by \[ z_i := \begin{cases} \lambda_i\,\mathrm{sign}(\hat x_i), & i\in C,\\[1mm] -\nabla f(\hat x)_i, & i\in\bar C. \end{cases} \] We claim $z\in\partial g(\hat x)$. Indeed, since $\supp(\hat x)=C$, we have $\hat x_i\neq 0$ for all $i\in C$ and $\hat x_i=0$ for all $i\in\bar C$. By Lemma~\ref{lem:subdiff_l1_weighted}: \begin{itemize} \item For $i\in C$ (nonzero coordinates), the subdifferential requires $z_i=\lambda_i\mathrm{sign}(\hat x_i)$, which holds by definition. \item For $i\in\bar C$ (zero coordinates), the subdifferential requires $z_i\in[-\lambda_i,\lambda_i]$. But by \eqref{eq:strict_dual_feas} and our definition, $|z_i|=|\nabla f(\hat x)_i|\le (1-\eta)\lambda_i<\lambda_i$, so indeed $z_i\in[-\lambda_i,\lambda_i]$. 
\end{itemize} Thus $z\in\partial g(\hat x)$. \paragraph{Step 2: Verify the KKT condition $\nabla f(\hat x)+z=0$.} We check this coordinatewise. \begin{itemize} \item For $i\in C$, $z_i=\lambda_i\mathrm{sign}(\hat x_i)$ and \eqref{eq:stationarity_C} gives $\nabla f(\hat x)_i+z_i=0$. \item For $i\in\bar C$, we defined $z_i=-\nabla f(\hat x)_i$, hence $\nabla f(\hat x)_i+z_i=0$ holds trivially. \end{itemize} Therefore $\nabla f(\hat x)+z=0$ with $z\in\partial g(\hat x)$. \paragraph{Step 3: Conclude $\hat x$ is a global minimizer.} By Lemma~\ref{lem:optimality_convex}, the existence of $z\in\partial g(\hat x)$ such that $\nabla f(\hat x)+z=0$ implies $0\in\partial F(\hat x)$ and hence $\hat x$ minimizes $F$. \paragraph{Step 4: Conclude uniqueness and identify the support.} By assumption, $F$ is $\mu$-strongly convex with $\mu>0$, hence the minimizer is unique. Since $\hat x$ is a minimizer, it must equal the unique minimizer: $\hat x=x^\star$. In particular, $\supp(x^\star)=\supp(\hat x)=C$ and thus $I^\star=\bar C$. \paragraph{Step 5: Lower bound the margin.} Since $I^\star=\bar C$, we can write \[ \gamma = \min_{i\in\bar C}\left(\lambda_i-|\nabla f(x^\star)_i|\right) = \min_{i\in\bar C}\left(\lambda_i-|\nabla f(\hat x)_i|\right), \] where the equality uses $x^\star=\hat x$. Using \eqref{eq:strict_dual_feas}, for each $i\in\bar C$ we have \[ \lambda_i-|\nabla f(\hat x)_i| \ge \lambda_i-(1-\eta)\lambda_i = \eta\lambda_i. \] Taking the minimum over $i\in\bar C$ yields \eqref{eq:gamma_lower_bound_cert}. \end{proof} \subsection{Specialization to $\ell_1$-regularized PageRank and a $\sqrt{\rho}$ condition} We now specialize Theorem~\ref{thm:cluster_certificate} to the $\ell_1$-regularized PageRank objective~(3): \[ F(x)= f(x)+g(x),\qquad f(x)=\frac12 x^\top Qx-\alpha x^\top D^{-1/2}s,\qquad g(x)=\rho\alpha\|D^{1/2}x\|_1. \] In this case $g(x)=\sum_{i=1}^n \lambda_i|x_i|$ with weights \begin{equation}\label{eq:lambda_ppr} \lambda_i=\rho\alpha\sqrt{d_i}. 
\end{equation} \begin{corollary}[$\gamma=\Omega(\sqrt{\rho})$ under a degree condition]\label{cor:gamma_sqrt_rho} Assume the PageRank objective~(3) and suppose the hypotheses of Theorem~\ref{thm:cluster_certificate} hold for some set $C$ and slack $\eta\in(0,1)$. Then \[ \gamma \ge \eta\,\min_{i\in\bar C}\lambda_i = \eta\,\rho\alpha\,\min_{i\in\bar C}\sqrt{d_i}. \] In particular, if the complement degrees satisfy \begin{equation}\label{eq:degree_condition_out} \min_{i\in\bar C} d_i \;\ge\; \frac{c_0}{\alpha^2\rho} \qquad\text{for some constant }c_0>0, \end{equation} then \begin{equation}\label{eq:gamma_Omega_sqrt_rho} \gamma \;\ge\; \eta\,\sqrt{c_0}\,\sqrt{\rho}. \end{equation} \end{corollary} \begin{proof} The first inequality follows immediately by substituting \eqref{eq:lambda_ppr} into \eqref{eq:gamma_lower_bound_cert}. If additionally \eqref{eq:degree_condition_out} holds, then for all $i\in\bar C$, \[ \lambda_i=\rho\alpha\sqrt{d_i}\ge \rho\alpha\sqrt{\frac{c_0}{\alpha^2\rho}}=\sqrt{c_0}\,\sqrt{\rho}. \] Taking the minimum over $i\in\bar C$ and multiplying by $\eta$ yields \eqref{eq:gamma_Omega_sqrt_rho}. \end{proof} \begin{remark}[Relation to conductance and internal connectivity] Theorem~\ref{thm:cluster_certificate} isolates the core requirement: a \emph{strict dual feasibility} inequality \eqref{eq:strict_dual_feas} on $\bar C$. In the PageRank setting, $\nabla f(\hat x)=Q\hat x-\alpha D^{-1/2}s$ and the block $Q_{\bar C C}$ only involves edges crossing the cut $(C,\bar C)$. Thus, when $C$ has small boundary (low conductance) and is internally well-conditioned (so the restricted solution on $C$ does not become too large), one expects \eqref{eq:strict_dual_feas} to hold with a nontrivial slack $\eta$. The degree condition \eqref{eq:degree_condition_out} is a simple sufficient way to ensure $\min_{i\in\bar C}\lambda_i=\Theta(\sqrt{\rho})$, which is necessary for $\gamma$ to be of order $\sqrt{\rho}$. 
\end{remark} % ============================================================ </syntaxhighlight> === User: fix the bugs "This is pdfTeX, Version 3.141592653-2.6-1.40.27 TeX Live 2025 preloaded formatpdflatex 2025.10.27 15 DE… === fix the bugs "This is pdfTeX, Version 3.141592653-2.6-1.40.27 (TeX Live 2025) (preloaded format=pdflatex 2025.10.27) 15 DEC 2025 16:06 entering extended mode \write18 enabled. %&-line parsing enabled. **00_main.tex (./00_main.tex LaTeX2e <2025-06-01> patch level 1 L3 programming layer <2025-05-26> (/usr/local/texlive/2025/texmf-dist/tex/latex/base/article.cls Document Class: article 2025/01/22 v1.4n Standard LaTeX document class (/usr/local/texlive/2025/texmf-dist/tex/latex/base/size11.clo File: size11.clo 2025/01/22 v1.4n Standard LaTeX file (size option) ) \c@part=\count275 \c@section=\count276 \c@subsection=\count277 \c@subsubsection=\count278 \c@paragraph=\count279 \c@subparagraph=\count280 \c@figure=\count281 \c@table=\count282 \abovecaptionskip=\skip49 \belowcaptionskip=\skip50 \bibindent=\dimen148 ) (/usr/local/texlive/2025/texmf-dist/tex/latex/amsmath/amsmath.sty Package: amsmath 2025/05/18 v2.17x AMS math features \@mathmargin=\skip51 For additional information on amsmath, use the ?' option. (/usr/local/texlive/2025/texmf-dist/tex/latex/amsmath/amstext.sty Package: amstext 2024/11/17 v2.01 AMS text (/usr/local/texlive/2025/texmf-dist/tex/latex/amsmath/amsgen.sty File: amsgen.sty 1999/11/30 v2.0 generic functions \@emptytoks=\toks17 \ex@=\dimen149 )) (/usr/local/texlive/2025/texmf-dist/tex/latex/amsmath/amsbsy.sty Package: amsbsy 1999/11/29 v1.2d Bold Symbols \pmbraise@=\dimen150 ) (/usr/local/texlive/2025/texmf-dist/tex/latex/amsmath/amsopn.sty Package: amsopn 2022/04/08 v2.04 operator names ) \inf@bad=\count283 LaTeX Info: Redefining \frac on input line 233. \uproot@=\count284 \leftroot@=\count285 LaTeX Info: Redefining \overline on input line 398. LaTeX Info: Redefining \colon on input line 409. 
\classnum@=\count286 \DOTSCASE@=\count287 LaTeX Info: Redefining \ldots on input line 495. LaTeX Info: Redefining \dots on input line 498. LaTeX Info: Redefining \cdots on input line 619. \Mathstrutbox@=\box53 \strutbox@=\box54 LaTeX Info: Redefining \big on input line 721. LaTeX Info: Redefining \Big on input line 722. LaTeX Info: Redefining \bigg on input line 723. LaTeX Info: Redefining \Bigg on input line 724. \big@size=\dimen151 LaTeX Font Info: Redeclaring font encoding OML on input line 742. LaTeX Font Info: Redeclaring font encoding OMS on input line 743. \macc@depth=\count288 LaTeX Info: Redefining \bmod on input line 904. LaTeX Info: Redefining \pmod on input line 909. LaTeX Info: Redefining \smash on input line 939. LaTeX Info: Redefining \relbar on input line 969. LaTeX Info: Redefining \Relbar on input line 970. \c@MaxMatrixCols=\count289 \dotsspace@=\muskip17 \c@parentequation=\count290 \dspbrk@lvl=\count291 \tag@help=\toks18 \row@=\count292 \column@=\count293 \maxfields@=\count294 \andhelp@=\toks19 \eqnshift@=\dimen152 \alignsep@=\dimen153 \tagshift@=\dimen154 \tagwidth@=\dimen155 \totwidth@=\dimen156 \lineht@=\dimen157 \@envbody=\toks20 \multlinegap=\skip52 \multlinetaggap=\skip53 \mathdisplay@stack=\toks21 LaTeX Info: Redefining \[ on input line 2949. LaTeX Info: Redefining \] on input line 2950. ) (/usr/local/texlive/2025/texmf-dist/tex/latex/amsfonts/amssymb.sty Package: amssymb 2013/01/14 v3.01 AMS font symbols (/usr/local/texlive/2025/texmf-dist/tex/latex/amsfonts/amsfonts.sty Package: amsfonts 2013/01/14 v3.01 Basic AMSFonts support \symAMSa=\mathgroup4 \symAMSb=\mathgroup5 LaTeX Font Info: Redeclaring math symbol \hbar on input line 98. LaTeX Font Info: Overwriting math alphabet \mathfrak' in version bold' (Font) U/euf/m/n --> U/euf/b/n on input line 106. 
)) (/usr/local/texlive/2025/texmf-dist/tex/latex/amscls/amsthm.sty Package: amsthm 2020/05/29 v2.20.6 \thm@style=\toks22 \thm@bodyfont=\toks23 \thm@headfont=\toks24 \thm@notefont=\toks25 \thm@headpunct=\toks26 \thm@preskip=\skip54 \thm@postskip=\skip55 \thm@headsep=\skip56 \dth@everypar=\toks27 ) (/usr/local/texlive/2025/texmf-dist/tex/latex/mathtools/mathtools.sty Package: mathtools 2024/10/04 v1.31 mathematical typesetting tools (/usr/local/texlive/2025/texmf-dist/tex/latex/graphics/keyval.sty Package: keyval 2022/05/29 v1.15 key=value parser (DPC) \KV@toks@=\toks28 ) (/usr/local/texlive/2025/texmf-dist/tex/latex/tools/calc.sty Package: calc 2025/03/01 v4.3b Infix arithmetic (KKT,FJ) \calc@Acount=\count295 \calc@Bcount=\count296 \calc@Adimen=\dimen158 \calc@Bdimen=\dimen159 \calc@Askip=\skip57 \calc@Bskip=\skip58 LaTeX Info: Redefining \setlength on input line 86. LaTeX Info: Redefining \addtolength on input line 87. \calc@Ccount=\count297 \calc@Cskip=\skip59 ) (/usr/local/texlive/2025/texmf-dist/tex/latex/mathtools/mhsetup.sty Package: mhsetup 2021/03/18 v1.4 programming setup (MH) ) \g_MT_multlinerow_int=\count298 \l_MT_multwidth_dim=\dimen160 \origjot=\skip60 \l_MT_shortvdotswithinadjustabove_dim=\dimen161 \l_MT_shortvdotswithinadjustbelow_dim=\dimen162 \l_MT_above_intertext_sep=\dimen163 \l_MT_below_intertext_sep=\dimen164 \l_MT_above_shortintertext_sep=\dimen165 \l_MT_below_shortintertext_sep=\dimen166 \xmathstrut@box=\box55 \xmathstrut@dim=\dimen167 ) (/usr/local/texlive/2025/texmf-dist/tex/latex/geometry/geometry.sty Package: geometry 2020/01/02 v5.9 Page Geometry (/usr/local/texlive/2025/texmf-dist/tex/generic/iftex/ifvtex.sty Package: ifvtex 2019/10/25 v1.7 ifvtex legacy package. Use iftex instead. 
(/usr/local/texlive/2025/texmf-dist/tex/generic/iftex/iftex.sty Package: iftex 2024/12/12 v1.0g TeX engine tests )) \Gm@cnth=\count299 \Gm@cntv=\count300 \c@Gm@tempcnt=\count301 \Gm@bindingoffset=\dimen168 \Gm@wd@mp=\dimen169 \Gm@odd@mp=\dimen170 \Gm@even@mp=\dimen171 \Gm@layoutwidth=\dimen172 \Gm@layoutheight=\dimen173 \Gm@layouthoffset=\dimen174 \Gm@layoutvoffset=\dimen175 \Gm@dimlist=\toks29 ) (/usr/local/texlive/2025/texmf-dist/tex/latex/hyperref/hyperref.sty Package: hyperref 2025-05-20 v7.01m Hypertext links for LaTeX (/usr/local/texlive/2025/texmf-dist/tex/latex/kvsetkeys/kvsetkeys.sty Package: kvsetkeys 2022-10-05 v1.19 Key value parser (HO) ) (/usr/local/texlive/2025/texmf-dist/tex/generic/kvdefinekeys/kvdefinekeys.sty Package: kvdefinekeys 2019-12-19 v1.6 Define keys (HO) ) (/usr/local/texlive/2025/texmf-dist/tex/generic/pdfescape/pdfescape.sty Package: pdfescape 2019/12/09 v1.15 Implements pdfTeX's escape features (HO) (/usr/local/texlive/2025/texmf-dist/tex/generic/ltxcmds/ltxcmds.sty Package: ltxcmds 2023-12-04 v1.26 LaTeX kernel commands for general use (HO) ) (/usr/local/texlive/2025/texmf-dist/tex/generic/pdftexcmds/pdftexcmds.sty Package: pdftexcmds 2020-06-27 v0.33 Utility functions of pdfTeX for LuaTeX (HO) (/usr/local/texlive/2025/texmf-dist/tex/generic/infwarerr/infwarerr.sty Package: infwarerr 2019/12/03 v1.5 Providing info/warning/error messages (HO) ) Package pdftexcmds Info: \pdf@primitive is available. Package pdftexcmds Info: \pdf@ifprimitive is available. Package pdftexcmds Info: \pdfdraftmode found. 
)) (/usr/local/texlive/2025/texmf-dist/tex/latex/hycolor/hycolor.sty Package: hycolor 2020-01-27 v1.10 Color options for hyperref/bookmark (HO) ) (/usr/local/texlive/2025/texmf-dist/tex/latex/hyperref/nameref.sty Package: nameref 2023-11-26 v2.56 Cross-referencing by name of section (/usr/local/texlive/2025/texmf-dist/tex/latex/refcount/refcount.sty Package: refcount 2019/12/15 v3.6 Data extraction from label references (HO) ) (/usr/local/texlive/2025/texmf-dist/tex/generic/gettitlestring/gettitlestring.sty Package: gettitlestring 2019/12/15 v1.6 Cleanup title references (HO) (/usr/local/texlive/2025/texmf-dist/tex/latex/kvoptions/kvoptions.sty Package: kvoptions 2022-06-15 v3.15 Key value format for package options (HO) )) \c@section@level=\count302 ) (/usr/local/texlive/2025/texmf-dist/tex/latex/etoolbox/etoolbox.sty Package: etoolbox 2025/02/11 v2.5l e-TeX tools for LaTeX (JAW) \etb@tempcnta=\count303 ) (/usr/local/texlive/2025/texmf-dist/tex/generic/stringenc/stringenc.sty Package: stringenc 2019/11/29 v1.12 Convert strings between diff. encodings (HO) ) \@linkdim=\dimen176 \Hy@linkcounter=\count304 \Hy@pagecounter=\count305 (/usr/local/texlive/2025/texmf-dist/tex/latex/hyperref/pd1enc.def File: pd1enc.def 2025-05-20 v7.01m Hyperref: PDFDocEncoding definition (HO) Now handling font encoding PD1 ... ... no UTF-8 mapping file for font encoding PD1 ) (/usr/local/texlive/2025/texmf-dist/tex/generic/intcalc/intcalc.sty Package: intcalc 2019/12/15 v1.3 Expandable calculations with integers (HO) ) \Hy@SavedSpaceFactor=\count306 (/usr/local/texlive/2025/texmf-dist/tex/latex/hyperref/puenc.def File: puenc.def 2025-05-20 v7.01m Hyperref: PDF Unicode definition (HO) Now handling font encoding PU ... ... no UTF-8 mapping file for font encoding PU ) Package hyperref Info: Hyper figures OFF on input line 4157. Package hyperref Info: Link nesting OFF on input line 4162. Package hyperref Info: Hyper index ON on input line 4165. 
Package hyperref Info: Plain pages OFF on input line 4172. Package hyperref Info: Backreferencing OFF on input line 4177. Package hyperref Info: Implicit mode ON; LaTeX internals redefined. Package hyperref Info: Bookmarks ON on input line 4424. \c@Hy@tempcnt=\count307 (/usr/local/texlive/2025/texmf-dist/tex/latex/url/url.sty \Urlmuskip=\muskip18 Package: url 2013/09/16 ver 3.4 Verb mode for urls, etc. ) LaTeX Info: Redefining \url on input line 4763. \XeTeXLinkMargin=\dimen177 (/usr/local/texlive/2025/texmf-dist/tex/generic/bitset/bitset.sty Package: bitset 2019/12/09 v1.3 Handle bit-vector datatype (HO) (/usr/local/texlive/2025/texmf-dist/tex/generic/bigintcalc/bigintcalc.sty Package: bigintcalc 2019/12/15 v1.5 Expandable calculations on big integers (HO) )) \Fld@menulength=\count308 \Field@Width=\dimen178 \Fld@charsize=\dimen179 Package hyperref Info: Hyper figures OFF on input line 6042. Package hyperref Info: Link nesting OFF on input line 6047. Package hyperref Info: Hyper index ON on input line 6050. Package hyperref Info: backreferencing OFF on input line 6057. Package hyperref Info: Link coloring OFF on input line 6062. Package hyperref Info: Link coloring with OCG OFF on input line 6067. Package hyperref Info: PDF/A mode OFF on input line 6072. (/usr/local/texlive/2025/texmf-dist/tex/latex/base/atbegshi-ltx.sty Package: atbegshi-ltx 2021/01/10 v1.0c Emulation of the original atbegshi package with kernel methods ) \Hy@abspage=\count309 \c@Item=\count310 \c@Hfootnote=\count311 ) Package hyperref Info: Driver (autodetected): hpdftex. 
(/usr/local/texlive/2025/texmf-dist/tex/latex/hyperref/hpdftex.def File: hpdftex.def 2025-05-20 v7.01m Hyperref driver for pdfTeX \Fld@listcount=\count312 \c@bookmark@seq@number=\count313 (/usr/local/texlive/2025/texmf-dist/tex/latex/rerunfilecheck/rerunfilecheck.sty Package: rerunfilecheck 2022-07-10 v1.10 Rerun checks for auxiliary files (HO) (/usr/local/texlive/2025/texmf-dist/tex/latex/base/atveryend-ltx.sty Package: atveryend-ltx 2020/08/19 v1.0a Emulation of the original atveryend package with kernel methods ) (/usr/local/texlive/2025/texmf-dist/tex/generic/uniquecounter/uniquecounter.sty Package: uniquecounter 2019/12/15 v1.4 Provide unlimited unique counter (HO) ) Package uniquecounter Info: New unique counter rerunfilecheck' on input line 285. ) \Hy@SectionHShift=\skip61 ) \c@theorem=\count314 \c@lemma=\count315 \c@corollary=\count316 \c@assumption=\count317 \c@remark=\count318 (/usr/local/texlive/2025/texmf-dist/tex/latex/l3backend/l3backend-pdftex.def File: l3backend-pdftex.def 2025-04-14 L3 backend support: PDF output (pdfTeX) \l__color_backend_stack_int=\count319 ) (./output.aux) \openout1 = output.aux'. LaTeX Font Info: Checking defaults for OML/cmm/m/it on input line 31. LaTeX Font Info: ... okay on input line 31. LaTeX Font Info: Checking defaults for OMS/cmsy/m/n on input line 31. LaTeX Font Info: ... okay on input line 31. LaTeX Font Info: Checking defaults for OT1/cmr/m/n on input line 31. LaTeX Font Info: ... okay on input line 31. LaTeX Font Info: Checking defaults for T1/cmr/m/n on input line 31. LaTeX Font Info: ... okay on input line 31. LaTeX Font Info: Checking defaults for TS1/cmr/m/n on input line 31. LaTeX Font Info: ... okay on input line 31. LaTeX Font Info: Checking defaults for OMX/cmex/m/n on input line 31. LaTeX Font Info: ... okay on input line 31. LaTeX Font Info: Checking defaults for U/cmr/m/n on input line 31. LaTeX Font Info: ... okay on input line 31. LaTeX Font Info: Checking defaults for PD1/pdf/m/n on input line 31. 
LaTeX Font Info: ... okay on input line 31. LaTeX Font Info: Checking defaults for PU/pdf/m/n on input line 31. LaTeX Font Info: ... okay on input line 31. (/usr/local/texlive/2025/texmf-dist/tex/latex/graphics/graphicx.sty Package: graphicx 2024/12/31 v1.2e Enhanced LaTeX Graphics (DPC,SPQR) (/usr/local/texlive/2025/texmf-dist/tex/latex/graphics/graphics.sty Package: graphics 2024/08/06 v1.4g Standard LaTeX Graphics (DPC,SPQR) (/usr/local/texlive/2025/texmf-dist/tex/latex/graphics/trig.sty Package: trig 2023/12/02 v1.11 sin cos tan (DPC) ) (/usr/local/texlive/2025/texmf-dist/tex/latex/graphics-cfg/graphics.cfg File: graphics.cfg 2016/06/04 v1.11 sample graphics configuration ) Package graphics Info: Driver file: pdftex.def on input line 106. (/usr/local/texlive/2025/texmf-dist/tex/latex/graphics-def/pdftex.def File: pdftex.def 2024/04/13 v1.2c Graphics/color driver for pdftex (/usr/local/texlive/2025/texmf-dist/tex/context/base/mkii/supp-pdf.mkii [Loading MPS to PDF converter (version 2006.09.02).] \scratchcounter=\count320 \scratchdimen=\dimen180 \scratchbox=\box56 \nofMPsegments=\count321 \nofMParguments=\count322 \everyMPshowfont=\toks30 \MPscratchCnt=\count323 \MPscratchDim=\dimen181 \MPnumerator=\count324 \makeMPintoPDFobject=\count325 \everyMPtoPDFconversion=\toks31 ))) (/usr/local/texlive/2025/texmf-dist/tex/latex/epstopdf-pkg/epstopdf-base.sty Package: epstopdf-base 2020-01-24 v2.11 Base part for package epstopdf Package epstopdf-base Info: Redefining graphics rule for .eps' on input line 485. 
(/usr/local/texlive/2025/texmf-dist/tex/latex/latexconfig/epstopdf-sys.cfg File: epstopdf-sys.cfg 2010/07/13 v1.3 Configuration of (r)epstopdf for TeX Live )) \Gin@req@height=\dimen182 \Gin@req@width=\dimen183 ) ''geometry'' driver: auto-detecting ''geometry'' detected driver: pdftex ''geometry'' verbose mode - [ preamble ] result: * driver: pdftex * paper: <default> * layout: <same size as paper> * layoutoffset:(h,v)=(0.0pt,0.0pt) * modes: * h-part:(L,W,R)=(72.26999pt, 469.75502pt, 72.26999pt) * v-part:(T,H,B)=(72.26999pt, 650.43001pt, 72.26999pt) * \paperwidth=614.295pt * \paperheight=794.96999pt * \textwidth=469.75502pt * \textheight=650.43001pt * \oddsidemargin=0.0pt * \evensidemargin=0.0pt * \topmargin=-37.0pt * \headheight=12.0pt * \headsep=25.0pt * \topskip=11.0pt * \footskip=30.0pt * \marginparwidth=59.0pt * \marginparsep=10.0pt * \columnsep=10.0pt * \skip\footins=10.0pt plus 4.0pt minus 2.0pt * \hoffset=0.0pt * \voffset=0.0pt * \mag=1000 * \@twocolumnfalse * \@twosidefalse * \@mparswitchfalse * \@reversemarginfalse * (1in=72.27pt=25.4mm, 1cm=28.453pt) Package hyperref Info: Link coloring OFF on input line 31. (./output.out) (./output.out) \@outlinefile=\write3 \openout3 = output.out'. LaTeX Font Info: Trying to load font information for U+msa on input line 33. (/usr/local/texlive/2025/texmf-dist/tex/latex/amsfonts/umsa.fd File: umsa.fd 2013/01/14 v3.01 AMS symbols A ) LaTeX Font Info: Trying to load font information for U+msb on input line 33. (/usr/local/texlive/2025/texmf-dist/tex/latex/amsfonts/umsb.fd File: umsb.fd 2013/01/14 v3.01 AMS symbols B ) [1 {/usr/local/texlive/2025/texmf-var/fonts/map/pdftex/updmap/pdftex.map}] Package hyperref Warning: Token not allowed in a PDF string (Unicode): (hyperref) removing math shift' on input line 119. Package hyperref Warning: Token not allowed in a PDF string (Unicode): (hyperref) removing superscript' on input line 119. 
Package hyperref Warning: Token not allowed in a PDF string (Unicode): (hyperref) removing \star' on input line 119. Package hyperref Warning: Token not allowed in a PDF string (Unicode): (hyperref) removing math shift' on input line 119. Package hyperref Warning: Token not allowed in a PDF string (Unicode): (hyperref) removing math shift' on input line 173. Package hyperref Warning: Token not allowed in a PDF string (Unicode): (hyperref) removing \|' on input line 173. Package hyperref Warning: Token not allowed in a PDF string (Unicode): (hyperref) removing subscript' on input line 173. Package hyperref Warning: Token not allowed in a PDF string (Unicode): (hyperref) removing superscript' on input line 173. Package hyperref Warning: Token not allowed in a PDF string (Unicode): (hyperref) removing \star' on input line 173. Package hyperref Warning: Token not allowed in a PDF string (Unicode): (hyperref) removing \|' on input line 173. Package hyperref Warning: Token not allowed in a PDF string (Unicode): (hyperref) removing math shift' on input line 173. [2] [3] Package hyperref Warning: Token not allowed in a PDF string (Unicode): (hyperref) removing math shift' on input line 257. Package hyperref Warning: Token not allowed in a PDF string (Unicode): (hyperref) removing \sum@' on input line 257. Package hyperref Warning: Token not allowed in a PDF string (Unicode): (hyperref) removing \slimits@' on input line 257. Package hyperref Warning: Token not allowed in a PDF string (Unicode): (hyperref) removing subscript' on input line 257. Package hyperref Warning: Token not allowed in a PDF string (Unicode): (hyperref) removing superscript' on input line 257. Package hyperref Warning: Token not allowed in a PDF string (Unicode): (hyperref) removing subscript' on input line 257. Package hyperref Warning: Token not allowed in a PDF string (Unicode): (hyperref) removing \gamma' on input line 257. 
Package hyperref Warning: Token not allowed in a PDF string (Unicode): (hyperref) removing \new@ifnextchar' on input line 257. Package hyperref Warning: Token not allowed in a PDF string (Unicode): (hyperref) removing subscript' on input line 257. Package hyperref Warning: Token not allowed in a PDF string (Unicode): (hyperref) removing math shift' on input line 257. Package hyperref Warning: Token not allowed in a PDF string (Unicode): (hyperref) removing math shift' on input line 319. Package hyperref Warning: Token not allowed in a PDF string (Unicode): (hyperref) removing \ell' on input line 319. Package hyperref Warning: Token not allowed in a PDF string (Unicode): (hyperref) removing subscript' on input line 319. Package hyperref Warning: Token not allowed in a PDF string (Unicode): (hyperref) removing math shift' on input line 319. [4] [5] Package hyperref Warning: Token not allowed in a PDF string (Unicode): (hyperref) removing math shift' on input line 449. Package hyperref Warning: Token not allowed in a PDF string (Unicode): (hyperref) removing \varepsilon' on input line 449. Package hyperref Warning: Token not allowed in a PDF string (Unicode): (hyperref) removing math shift' on input line 449. [6] Package hyperref Warning: Token not allowed in a PDF string (Unicode): (hyperref) removing math shift' on input line 498. Package hyperref Warning: Token not allowed in a PDF string (Unicode): (hyperref) removing \gamma' on input line 498. Package hyperref Warning: Token not allowed in a PDF string (Unicode): (hyperref) removing math shift' on input line 498. Package hyperref Warning: Token not allowed in a PDF string (Unicode): (hyperref) removing math shift' on input line 498. Package hyperref Warning: Token not allowed in a PDF string (Unicode): (hyperref) removing \@ifnextchar' on input line 498. Package hyperref Warning: Token not allowed in a PDF string (Unicode): (hyperref) removing \rho' on input line 498. 
Package hyperref Warning: Token not allowed in a PDF string (Unicode): (hyperref) removing math shift' on input line 498. [7] [8] Package hyperref Warning: Token not allowed in a PDF string (Unicode): (hyperref) removing math shift' on input line 643. Package hyperref Warning: Token not allowed in a PDF string (Unicode): (hyperref) removing \gamma' on input line 643. Package hyperref Warning: Token not allowed in a PDF string (Unicode): (hyperref) removing superscript' on input line 643. Package hyperref Warning: Token not allowed in a PDF string (Unicode): (hyperref) removing math shift' on input line 643. Package hyperref Warning: Token not allowed in a PDF string (Unicode): (hyperref) removing \hbox' on input line 658. Package hyperref Warning: Token not allowed in a PDF string (Unicode): (hyperref) removing \mathsurround' on input line 658. Package hyperref Warning: Token not allowed in a PDF string (Unicode): (hyperref) removing \z@' on input line 658. Package hyperref Warning: Token not allowed in a PDF string (Unicode): (hyperref) removing \unskip' on input line 658. Package hyperref Warning: Token not allowed in a PDF string (Unicode): (hyperref) removing \@@italiccorr' on input line 658. Package hyperref Warning: Token not allowed in a PDF string (Unicode): (hyperref) removing math shift' on input line 658. Package hyperref Warning: Token not allowed in a PDF string (Unicode): (hyperref) removing \gamma' on input line 658. Package hyperref Warning: Token not allowed in a PDF string (Unicode): (hyperref) removing math shift' on input line 658. [9] Package hyperref Warning: Token not allowed in a PDF string (Unicode): (hyperref) removing math shift' on input line 685. Package hyperref Warning: Token not allowed in a PDF string (Unicode): (hyperref) removing \gamma' on input line 685. Package hyperref Warning: Token not allowed in a PDF string (Unicode): (hyperref) removing math shift' on input line 685. 
Package hyperref Warning: Token not allowed in a PDF string (Unicode): (hyperref) removing math shift' on input line 734. Package hyperref Warning: Token not allowed in a PDF string (Unicode): (hyperref) removing \@ifnextchar' on input line 734. Package hyperref Warning: Token not allowed in a PDF string (Unicode): (hyperref) removing \rho' on input line 734. Package hyperref Warning: Token not allowed in a PDF string (Unicode): (hyperref) removing math shift' on input line 734. [10] [11] ! LaTeX Error: Something's wrong--perhaps a missing \item. See the LaTeX manual or LaTeX Companion for explanation. Type H <return> for immediate help. ... l.863 \item F or $i\in C$ (nonzero coordinates), the subdifferential requires... Try typing <return> to proceed. If that doesn't work, type X <return> to quit. [12{/usr/local/texlive/2025/texmf-dist/fonts/enc/dvips/cm-super/cm-super-ts1.enc}] Package hyperref Warning: Token not allowed in a PDF string (Unicode): (hyperref) removing math shift' on input line 905. Package hyperref Warning: Token not allowed in a PDF string (Unicode): (hyperref) removing \ell' on input line 905. Package hyperref Warning: Token not allowed in a PDF string (Unicode): (hyperref) removing subscript' on input line 905. Package hyperref Warning: Token not allowed in a PDF string (Unicode): (hyperref) removing math shift' on input line 905. Package hyperref Warning: Token not allowed in a PDF string (Unicode): (hyperref) removing math shift' on input line 905. Package hyperref Warning: Token not allowed in a PDF string (Unicode): (hyperref) removing \@ifnextchar' on input line 905. Package hyperref Warning: Token not allowed in a PDF string (Unicode): (hyperref) removing \rho' on input line 905. Package hyperref Warning: Token not allowed in a PDF string (Unicode): (hyperref) removing math shift' on input line 905. 
[13] (./output.aux) *********''' LaTeX2e <2025-06-01> patch level 1 L3 programming layer <2025-05-26> '''********* Package rerunfilecheck Info: File output.out' has not changed. (rerunfilecheck) Checksum: 126B49B4E6E9C0A9E988DE6F81B556FB;7041. ) Here is how much of TeX's memory you used: 11138 strings out of 467886 170530 string characters out of 5434056 594304 words of memory out of 5000000 39458 multiletter control sequences out of 15000+600000 640243 words of font info for 90 fonts, out of 8000000 for 9000 1141 hyphenation exceptions out of 8191 75i,12n,79p,544b,667s stack positions out of 10000i,1000n,20000p,200000b,200000s </usr/local/texlive/2025/texmf-dist/fonts/type1/public/amsfonts/cm/cmbx10.pfb></usr/local/texlive/2025/texmf-dist/fonts/type1/public/amsfonts/cm/cmbx12.pfb></usr/local/texlive/2025/texmf-dist/fonts/type1/public/amsfonts/cm/cmbxti10.pfb></usr/local/texlive/2025/texmf-dist/fonts/type1/public/amsfonts/cm/cmex10.pfb></usr/local/texlive/2025/texmf-dist/fonts/type1/public/amsfonts/cm/cmmi10.pfb></usr/local/texlive/2025/texmf-dist/fonts/type1/public/amsfonts/cm/cmmi12.pfb></usr/local/texlive/2025/texmf-dist/fonts/type1/public/amsfonts/cm/cmmi6.pfb></usr/local/texlive/2025/texmf-dist/fonts/type1/public/amsfonts/cm/cmmi7.pfb></usr/local/texlive/2025/texmf-dist/fonts/type1/public/amsfonts/cm/cmmi8.pfb></usr/local/texlive/2025/texmf-dist/fonts/type1/public/amsfonts/cm/cmr10.pfb></usr/local/texlive/2025/texmf-dist/fonts/type1/public/amsfonts/cm/cmr12.pfb></usr/local/texlive/2025/texmf-dist/fonts/type1/public/amsfonts/cm/cmr17.pfb></usr/local/texlive/2025/texmf-dist/fonts/type1/public/amsfonts/cm/cmr6.pfb></usr/local/te 
xlive/2025/texmf-dist/fonts/type1/public/amsfonts/cm/cmr7.pfb></usr/local/texlive/2025/texmf-dist/fonts/type1/public/amsfonts/cm/cmr8.pfb></usr/local/texlive/2025/texmf-dist/fonts/type1/public/amsfonts/cm/cmsy10.pfb></usr/local/texlive/2025/texmf-dist/fonts/type1/public/amsfonts/cm/cmsy7.pfb></usr/local/texlive/2025/texmf-dist/fonts/type1/public/amsfonts/cm/cmsy8.pfb></usr/local/texlive/2025/texmf-dist/fonts/type1/public/amsfonts/cm/cmti10.pfb></usr/local/texlive/2025/texmf-dist/fonts/type1/public/amsfonts/symbols/msbm10.pfb></usr/local/texlive/2025/texmf-dist/fonts/type1/public/cm-super/sfrm1095.pfb> Output written on output.pdf (13 pages, 316958 bytes). PDF statistics: 422 PDF objects out of 1000 (max. 8388607) 361 compressed objects within 4 object streams 104 named destinations out of 1000 (max. 500000) 185 words of extra memory for PDF output out of 10000 (max. 10000000)" === Assistant: That \item error is coming from the \paragraph{...} commands inside your amsthm proof environment. === That \item error is coming from the <code>\paragraph{...}</code> commands inside your <code>amsthm</code> <code>proof</code> environment. amsthm’s proof is implemented as a trivlist (i.e., a list-like environment). Using sectioning commands like \paragraph{...} inside it can break the internal list state, so the next \item (from your itemize) is seen as “outside a list,” producing: : ===== Replace the \paragraph{Step ...} lines inside the proof with non-sectioning formatting like \medskip\noindent\textbf{...}. ===== Copy-paste this replacement <code>proof</code> block (it’s the same proof, just without \paragraph): <syntaxhighlight lang="latex">\begin{proof} \medskip\noindent\textbf{Step 1 (Construct a subgradient $z\in\partial g(\hat x)$).} Define $z\in\mathbb{R}^n$ coordinatewise by \[ z_i := \begin{cases} \lambda_i\,\mathrm{sign}(\hat x_i), & i\in C,\\[1mm] -\nabla f(\hat x)_i, & i\in\bar C. \end{cases} \] We claim $z\in\partial g(\hat x)$. 
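% For reference, the weighted-$\ell_1$ subdifferential relied on below (this is
% the statement assumed of Lemma~\ref{lem:subdiff_l1_weighted}): for
% $g(x)=\sum_i \lambda_i|x_i|$,
%   $\partial g(x)_i = \{\lambda_i\,\mathrm{sign}(x_i)\}$  if $x_i\neq 0$,
%   $\partial g(x)_i = [-\lambda_i,\lambda_i]$             if $x_i = 0$.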
Indeed, since $\supp(\hat x)=C$, we have $\hat x_i\neq 0$ for all $i\in C$ and $\hat x_i=0$ for all $i\in\bar C$. By Lemma~\ref{lem:subdiff_l1_weighted}: \begin{itemize} \item For $i\in C$ (nonzero coordinates), the subdifferential requires $z_i=\lambda_i\,\mathrm{sign}(\hat x_i)$, which holds by definition. \item For $i\in\bar C$ (zero coordinates), the subdifferential requires $z_i\in[-\lambda_i,\lambda_i]$. But by \eqref{eq:strict_dual_feas} and our definition, $|z_i|=|\nabla f(\hat x)_i|\le (1-\eta)\lambda_i<\lambda_i$, so indeed $z_i\in[-\lambda_i,\lambda_i]$. \end{itemize} Thus $z\in\partial g(\hat x)$. \medskip\noindent\textbf{Step 2 (Verify the KKT condition $\nabla f(\hat x)+z=0$).} We check this coordinatewise. \begin{itemize} \item For $i\in C$, $z_i=\lambda_i\,\mathrm{sign}(\hat x_i)$ and \eqref{eq:stationarity_C} gives $\nabla f(\hat x)_i+z_i=0$. \item For $i\in\bar C$, we defined $z_i=-\nabla f(\hat x)_i$, hence $\nabla f(\hat x)_i+z_i=0$ holds trivially. \end{itemize} Therefore $\nabla f(\hat x)+z=0$ with $z\in\partial g(\hat x)$. \medskip\noindent\textbf{Step 3 (Conclude $\hat x$ is a global minimizer).} By Lemma~\ref{lem:optimality_convex}, the existence of $z\in\partial g(\hat x)$ such that $\nabla f(\hat x)+z=0$ implies $0\in\partial F(\hat x)$ and hence $\hat x$ minimizes $F$. \medskip\noindent\textbf{Step 4 (Uniqueness and identification of the support).} By assumption, $F$ is $\mu$-strongly convex with $\mu>0$, hence the minimizer is unique. Since $\hat x$ is a minimizer, it must equal the unique minimizer: $\hat x=x^\star$. In particular, $\supp(x^\star)=\supp(\hat x)=C$ and thus $I^\star=\bar C$. \medskip\noindent\textbf{Step 5 (Lower bound the margin).} Since $I^\star=\bar C$, we can write \[ \gamma = \min_{i\in\bar C}\left(\lambda_i-|\nabla f(x^\star)_i|\right) = \min_{i\in\bar C}\left(\lambda_i-|\nabla f(\hat x)_i|\right), \] where the equality uses $x^\star=\hat x$. 
Using \eqref{eq:strict_dual_feas}, for each $i\in\bar C$ we have \[ \lambda_i-|\nabla f(\hat x)_i| \ge \lambda_i-(1-\eta)\lambda_i = \eta\lambda_i. \] Taking the minimum over $i\in\bar C$ yields \eqref{eq:gamma_lower_bound_cert}. \end{proof} </syntaxhighlight> ===== Those “Token not allowed in a PDF string” warnings come from math in section titles. They’re harmless, but if you want to remove them, use \texorpdfstring in titles, e.g. ===== <syntaxhighlight lang="latex">\section{A primal--dual cluster certificate implying a \texorpdfstring{$\sqrt{\rho}$}{sqrt(rho)}-scale margin} </syntaxhighlight> That will keep the math in the printed title while giving hyperref a plain-text bookmark. === User: I don't like your proof. === I don't like your proof. I want a proof that parameterizes the margin γ as a function of conductance, internal connectivity and outside degree. Then if all these parameters are well-behaved, it is guaranteed that γ won't be small, i.e., of the order of sqrt(ρ). === Assistant: Append the following replacement section: it gives an explicit parameterized lower bound on the margin γ in terms… === Append the following replacement section (it gives an explicit parameterized lower bound on the margin $\gamma$ in terms of conductance, internal connectivity, and outside degree, and then a corollary showing when this becomes $\Omega(\sqrt{\rho})$). open-problem-fountoulakis22a <syntaxhighlight lang="latex">% ============================================================ \section{A conductance--connectivity--degree lower bound on the margin} \label{sec:gamma_conductance_connectivity_degree} We specialize to the $\ell_1$-regularized PageRank objective~(3): \begin{equation}\label{eq:ppr_obj_gamma_param} \min_{x\in\mathbb{R}^n} F(x) := \underbrace{\frac12 x^\top Qx-\alpha x^\top D^{-1/2}s}_{=:f(x)} \;+\; \underbrace{\rho\alpha\|D^{1/2}x\|_1}_{=:g(x)}.
\end{equation} Here $G=(V,E)$ is undirected and unweighted, $A$ is the adjacency matrix, $D$ is the diagonal degree matrix, $s=e_v$ is a seed at node $v$, and \begin{equation}\label{eq:Q_def_gamma_param} Q=\alpha I+\frac{1-\alpha}{2}L = \frac{1+\alpha}{2}I-\frac{1-\alpha}{2}\,D^{-1/2}AD^{-1/2}. \end{equation} The weighted $\ell_1$ penalty has coordinate weights \begin{equation}\label{eq:lambda_def_gamma_param} g(x)=\sum_{i=1}^n \lambda_i |x_i|, \qquad \lambda_i:=\rho\alpha\sqrt{d_i}. \end{equation} \paragraph{Cluster notation.} Fix a set (candidate cluster) $C\subseteq V$ with $v\in C$, and let $\bar C:=V\setminus C$. Define the volume and cut: \[ \mathrm{vol}(C):=\sum_{i\in C} d_i, \qquad \mathrm{cut}(C,\bar C):=|\{(i,j)\in E:\; i\in C,\ j\in\bar C\}|. \] We use the (one-sided) conductance \[ \phi(C):=\frac{\mathrm{cut}(C,\bar C)}{\mathrm{vol}(C)}. \] Let \[ d_{\mathrm{in}}:=\min_{j\in C} d_j, \qquad d_{\mathrm{out}}:=\min_{i\in\bar C} d_i, \qquad d_v:=d_{v}. \] \paragraph{Internal connectivity parameter.} Let \begin{equation}\label{eq:muC_def_gamma_param} \mu_C:=\lambda_{\min}(Q_{CC}), \end{equation} the smallest eigenvalue of the principal submatrix $Q_{CC}$. Since $Q\succeq \alpha I$, we always have $\mu_C\ge \alpha$, and larger $\mu_C$ corresponds to stronger internal connectivity/conditioning on the cluster. \subsection{A pointwise bound on the outside gradient in terms of $\phi(C)$ and $\mu_C$} We will construct a candidate vector supported on $C$ by solving the KKT stationarity equations \emph{restricted to $C$} under the natural sign choice $x_C>0$ (appropriate for PageRank-type solutions). Define the restricted candidate $\hat x$ by \begin{equation}\label{eq:xhat_def_gamma_param} \hat x_{\bar C}:=0, \qquad \hat x_C:=Q_{CC}^{-1}\bigl(b_C-\lambda_C\bigr), \qquad b:=\alpha D^{-1/2}s. \end{equation} Since $s=e_v$ and $v\in C$, the vector $b_C$ has exactly one nonzero entry: \begin{equation}\label{eq:bC_norm_gamma_param} \|b_C\|_2=\frac{\alpha}{\sqrt{d_v}}. 
\end{equation} \begin{lemma}[Stationarity on $C$]\label{lem:restricted_stationarity_gamma_param} The vector $\hat x$ satisfies the exact stationarity equation on $C$: \begin{equation}\label{eq:restricted_stationarity_gamma_param} \nabla f(\hat x)_C+\lambda_C = 0. \end{equation} \end{lemma} \begin{proof} Since $\nabla f(x)=Qx-b$, and $\hat x_{\bar C}=0$, we have \[ \nabla f(\hat x)_C = (Q_{CC}\hat x_C - b_C). \] By the definition \eqref{eq:xhat_def_gamma_param}, $Q_{CC}\hat x_C=b_C-\lambda_C$, hence $\nabla f(\hat x)_C = -\lambda_C$, which is \eqref{eq:restricted_stationarity_gamma_param}. \end{proof} Next we bound the gradient on $\bar C$ in terms of conductance, internal connectivity, and outside degree. \begin{lemma}[Bounding $\|\hat x_C\|_2$ by internal connectivity]\label{lem:xC_norm_gamma_param} Let $\mu_C=\lambda_{\min}(Q_{CC})$. Then \begin{equation}\label{eq:xC_norm_gamma_param} \|\hat x_C\|_2 \;\le\; \frac{1}{\mu_C}\left(\frac{\alpha}{\sqrt{d_v}}+\rho\alpha\sqrt{\mathrm{vol}(C)}\right). \end{equation} \end{lemma} \begin{proof} Since $Q_{CC}$ is symmetric positive definite and $\lambda_{\min}(Q_{CC})=\mu_C$, \[ \|Q_{CC}^{-1}\|_2 = \frac{1}{\mu_C}. \] Using \eqref{eq:xhat_def_gamma_param} and the triangle inequality, \[ \|\hat x_C\|_2 = \left\|Q_{CC}^{-1}(b_C-\lambda_C)\right\|_2 \le \|Q_{CC}^{-1}\|_2\bigl(\|b_C\|_2+\|\lambda_C\|_2\bigr). \] By \eqref{eq:bC_norm_gamma_param}, $\|b_C\|_2=\alpha/\sqrt{d_v}$. Also, by \eqref{eq:lambda_def_gamma_param}, \[ \|\lambda_C\|_2^2=\sum_{i\in C} (\rho\alpha\sqrt{d_i})^2 =(\rho\alpha)^2\sum_{i\in C} d_i =(\rho\alpha)^2\,\mathrm{vol}(C), \] so $\|\lambda_C\|_2=\rho\alpha\sqrt{\mathrm{vol}(C)}$. Substitute these bounds to obtain \eqref{eq:xC_norm_gamma_param}. \end{proof} \begin{lemma}[Outside gradient bound from conductance and degrees]\label{lem:outside_grad_bound_gamma_param} Assume $v\in C$ so that $b_{\bar C}=0$. 
Then for each $i\in\bar C$, \begin{equation}\label{eq:outside_grad_pointwise_gamma_param} |\nabla f(\hat x)_i| \le \frac{1-\alpha}{2}\,\|\hat x_C\|_2\, \sqrt{\frac{\mathrm{cut}(C,\bar C)}{d_i\,d_{\mathrm{in}}}}. \end{equation} Consequently, using $d_i\ge d_{\mathrm{out}}$ and $\mathrm{cut}(C,\bar C)=\phi(C)\mathrm{vol}(C)$, \begin{equation}\label{eq:outside_grad_uniform_gamma_param} \max_{i\in\bar C}|\nabla f(\hat x)_i| \le \frac{1-\alpha}{2}\,\|\hat x_C\|_2\, \sqrt{\frac{\phi(C)\mathrm{vol}(C)}{d_{\mathrm{out}}\,d_{\mathrm{in}}}}. \end{equation} \end{lemma} \begin{proof} Fix $i\in\bar C$. Since $\hat x_{\bar C}=0$ and $b_{\bar C}=0$, \[ \nabla f(\hat x)_i = (Q\hat x - b)_i = (Q_{\bar C C}\hat x_C)_i = \sum_{j\in C}Q_{ij}\hat x_j. \] From \eqref{eq:Q_def_gamma_param}, for $i\neq j$ we have \[ Q_{ij} = -\frac{1-\alpha}{2}(D^{-1/2}AD^{-1/2})_{ij} = \begin{cases} -\dfrac{1-\alpha}{2}\dfrac{1}{\sqrt{d_i d_j}}, & (i,j)\in E,\\[2mm] 0, & (i,j)\notin E. \end{cases} \] Thus \[ |\nabla f(\hat x)_i| = \left|\sum_{j\in C\cap N(i)} -\frac{1-\alpha}{2}\frac{\hat x_j}{\sqrt{d_i d_j}}\right| \le \frac{1-\alpha}{2}\cdot \frac{1}{\sqrt{d_i}} \sum_{j\in C\cap N(i)}\frac{|\hat x_j|}{\sqrt{d_j}}. \] Apply Cauchy--Schwarz to the sum: \[ \sum_{j\in C\cap N(i)}\frac{|\hat x_j|}{\sqrt{d_j}} \le \left(\sum_{j\in C\cap N(i)} \hat x_j^2\right)^{1/2} \left(\sum_{j\in C\cap N(i)} \frac{1}{d_j}\right)^{1/2} \le \|\hat x_C\|_2\left(\sum_{j\in C\cap N(i)} \frac{1}{d_j}\right)^{1/2}. \] Since $d_j\ge d_{\mathrm{in}}$ for all $j\in C$, we have \[ \sum_{j\in C\cap N(i)} \frac{1}{d_j} \le \frac{|C\cap N(i)|}{d_{\mathrm{in}}}. \] Finally, $|C\cap N(i)|$ is the number of cut edges incident to $i$, hence $|C\cap N(i)|\le \mathrm{cut}(C,\bar C)$, which gives \eqref{eq:outside_grad_pointwise_gamma_param}. The uniform bound \eqref{eq:outside_grad_uniform_gamma_param} follows by using $d_i\ge d_{\mathrm{out}}$ and $\mathrm{cut}(C,\bar C)=\phi(C)\mathrm{vol}(C)$. 
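% A slightly sharper variant under the same argument: each neighbor of $i$
% inside $C$ contributes one cut edge incident to $i$, so in fact
% $|C\cap N(i)|\le\min\{d_i,\mathrm{cut}(C,\bar C)\}$; the proof above only
% uses the cruder bound $|C\cap N(i)|\le\mathrm{cut}(C,\bar C)$.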
\end{proof} \subsection{A parameterized margin lower bound and a $\sqrt{\rho}$ corollary} Define the cluster-dependent quantity \begin{equation}\label{eq:DeltaC_def_gamma_param} \Delta(C) := \frac{1-\alpha}{2\mu_C}\, \sqrt{\frac{\phi(C)\mathrm{vol}(C)}{d_{\mathrm{out}}\,d_{\mathrm{in}}}}\, \left(\frac{\alpha}{\sqrt{d_v}}+\rho\alpha\sqrt{\mathrm{vol}(C)}\right). \end{equation} \begin{theorem}[Margin bound parameterized by conductance, internal connectivity, and outside degree] \label{thm:gamma_parameterized_by_cluster} Assume $v\in C$ and that the restricted candidate \eqref{eq:xhat_def_gamma_param} satisfies \begin{equation}\label{eq:xhat_positive_gamma_param} \hat x_C>0\quad\text{(componentwise)}. \end{equation} Then: \begin{enumerate} \item[(i)] If $\Delta(C)<\min_{i\in\bar C}\lambda_i=\rho\alpha\sqrt{d_{\mathrm{out}}}$, then $\hat x$ is the \emph{unique} global minimizer of \eqref{eq:ppr_obj_gamma_param}. In particular, $x^\star=\hat x$ and $x^\star_{\bar C}=0$. \item[(ii)] Whenever $x^\star=\hat x$, the strict-complementarity margin satisfies the explicit lower bound \begin{equation}\label{eq:gamma_param_bound_gamma_param} \gamma = \min_{i\in \bar C}\Bigl(\lambda_i-|\nabla f(x^\star)_i|\Bigr) \;\ge\; \rho\alpha\sqrt{d_{\mathrm{out}}}-\Delta(C). \end{equation} Equivalently, if $\Delta(C)\le (1-\eta)\rho\alpha\sqrt{d_{\mathrm{out}}}$ for some $\eta\in(0,1)$, then \begin{equation}\label{eq:gamma_eta_bound_gamma_param} \gamma \ge \eta\,\rho\alpha\sqrt{d_{\mathrm{out}}}. \end{equation} \end{enumerate} \end{theorem} \begin{proof} We prove (i) and (ii) in order. \medskip\noindent\textbf{Step 1: KKT conditions for \eqref{eq:ppr_obj_gamma_param}.} Since $f$ is differentiable and $g(x)=\sum_i \lambda_i|x_i|$, a point $x$ is optimal if and only if there exists $z\in\partial g(x)$ such that \begin{equation}\label{eq:KKT_gamma_param} \nabla f(x)+z=0. 
\end{equation} Moreover, $F$ is strongly convex because $f$ is $\alpha$-strongly convex and $g$ is convex, so the minimizer is unique. \medskip\noindent\textbf{Step 2: Build a valid $z\in\partial g(\hat x)$ and verify \eqref{eq:KKT_gamma_param}.} Because of \eqref{eq:xhat_positive_gamma_param}, $\mathrm{sign}(\hat x_i)=+1$ for all $i\in C$. Define $z\in\mathbb{R}^n$ by \[ z_i := \begin{cases} \lambda_i, & i\in C,\\[1mm] -\nabla f(\hat x)_i, & i\in\bar C. \end{cases} \] We claim $z\in\partial g(\hat x)$ provided $|\nabla f(\hat x)_i|\le \lambda_i$ for all $i\in\bar C$. Indeed: \begin{itemize} \item If $i\in C$, then $\hat x_i>0$ and the weighted-$\ell_1$ subgradient requires $z_i=\lambda_i$, which holds. \item If $i\in\bar C$, then $\hat x_i=0$ and the subgradient condition is $z_i\in[-\lambda_i,\lambda_i]$. This holds if $|z_i|=|\nabla f(\hat x)_i|\le \lambda_i$. \end{itemize} Next, Lemma~\ref{lem:restricted_stationarity_gamma_param} gives $\nabla f(\hat x)_C=-\lambda_C$, so $\nabla f(\hat x)_C+z_C=0$. By definition, $\nabla f(\hat x)_{\bar C}+z_{\bar C}=0$. Thus \eqref{eq:KKT_gamma_param} holds. \medskip\noindent\textbf{Step 3: Enforce the outside inequalities via $\Delta(C)$.} By Lemma~\ref{lem:outside_grad_bound_gamma_param} and Lemma~\ref{lem:xC_norm_gamma_param}, \[ \max_{i\in\bar C}|\nabla f(\hat x)_i| \le \Delta(C). \] Therefore, if $\Delta(C)<\rho\alpha\sqrt{d_{\mathrm{out}}}\le \lambda_i$ for all $i\in\bar C$, then $|\nabla f(\hat x)_i|<\lambda_i$ on $\bar C$, so $z\in\partial g(\hat x)$ and the KKT condition holds. Hence $\hat x$ is a minimizer, and by strong convexity it is the unique minimizer: $\hat x=x^\star$. This proves (i). \medskip\noindent\textbf{Step 4: Lower bound the margin.} Assume $x^\star=\hat x$. 
Since $x^\star_{\bar C}=0$, the inactive set contains $\bar C$ and \[ \gamma = \min_{i\in \bar C}\bigl(\lambda_i-|\nabla f(x^\star)_i|\bigr) \ge \min_{i\in\bar C}\lambda_i - \max_{i\in\bar C}|\nabla f(\hat x)_i| = \rho\alpha\sqrt{d_{\mathrm{out}}} - \max_{i\in\bar C}|\nabla f(\hat x)_i|. \] Using $\max_{i\in\bar C}|\nabla f(\hat x)_i|\le \Delta(C)$ gives \eqref{eq:gamma_param_bound_gamma_param}. The $\eta$-form \eqref{eq:gamma_eta_bound_gamma_param} is immediate. \end{proof} \begin{corollary}[A concrete $\gamma=\Omega(\sqrt{\rho})$ regime]\label{cor:gamma_Omega_sqrt_rho_regime} Assume the hypotheses of Theorem~\ref{thm:gamma_parameterized_by_cluster}. Suppose, in addition, that there are constants $V_0,d_0,\mu_0,c_0>0$ such that \[ \mathrm{vol}(C)\le \frac{V_0}{\rho},\qquad d_{\mathrm{in}}\ge d_0,\qquad d_v\ge d_0,\qquad \mu_C\ge \mu_0,\qquad d_{\mathrm{out}}\ge \frac{c_0}{\alpha^2\rho}. \] If furthermore the conductance satisfies \begin{equation}\label{eq:phi_rho_condition_gamma_param} \phi(C)\le c_1\rho \quad\text{for a sufficiently small constant $c_1$ (depending only on $V_0,d_0,\mu_0,c_0,\alpha$),} \end{equation} then there exists a constant $\eta\in(0,1)$ (depending only on the same parameters) such that \[ \gamma \ge \eta\,\sqrt{\rho}. \] \end{corollary} \begin{proof} From $d_{\mathrm{out}}\ge c_0/(\alpha^2\rho)$ we get \[ \rho\alpha\sqrt{d_{\mathrm{out}}}\ge \rho\alpha\sqrt{\frac{c_0}{\alpha^2\rho}}=\sqrt{c_0}\,\sqrt{\rho}. \] Next, using the assumed bounds in \eqref{eq:DeltaC_def_gamma_param}, \[ \Delta(C) \le \frac{1-\alpha}{2\mu_0}\, \sqrt{\frac{\phi(C)\cdot (V_0/\rho)}{(c_0/(\alpha^2\rho))\cdot d_0}}\, \left(\frac{\alpha}{\sqrt{d_0}}+\rho\alpha\sqrt{V_0/\rho}\right) = \frac{1-\alpha}{2\mu_0}\, \alpha\,\sqrt{\frac{V_0}{c_0 d_0}}\, \sqrt{\phi(C)}\, \left(\frac{\alpha}{\sqrt{d_0}}+\alpha\sqrt{V_0}\sqrt{\rho}\right). 
\] Under \eqref{eq:phi_rho_condition_gamma_param}, $\sqrt{\phi(C)}\le \sqrt{c_1}\sqrt{\rho}$, hence $\Delta(C)\le K\sqrt{\rho}$ for a constant $K$ depending only on $(V_0,d_0,\mu_0,c_0,\alpha)$ and $c_1$. Choosing $c_1$ sufficiently small ensures $\Delta(C)\le (1-\eta)\sqrt{c_0}\sqrt{\rho}$ for some $\eta\in(0,1)$. Applying \eqref{eq:gamma_param_bound_gamma_param} then yields $\gamma\ge \eta\sqrt{c_0}\sqrt{\rho}$. Absorbing $\sqrt{c_0}$ into $\eta$ gives the stated $\gamma\ge \eta\sqrt{\rho}$. \end{proof} \begin{remark}[Interpretation] The quantity $\Delta(C)$ in \eqref{eq:DeltaC_def_gamma_param} is explicit and shows how the margin depends on: (i) the \emph{outside degree} $d_{\mathrm{out}}$ through the threshold $\lambda_i=\rho\alpha\sqrt{d_i}$, (ii) the \emph{boundary size} through $\phi(C)$, and (iii) the \emph{internal conditioning} $\mu_C=\lambda_{\min}(Q_{CC})$. When $\phi(C)=O(\rho)$, $d_{\mathrm{out}}=\Theta(1/(\alpha^2\rho))$, and $\mu_C$ is bounded below, the bound forces $\gamma=\Omega(\sqrt{\rho})$. \end{remark} % ============================================================ </syntaxhighlight> === User: Not bad, but there is something missing. === Not bad, but there is something missing. How do I know that "\hat x_C>0\quad\text{(componentwise)}." This needs a proof. Does your choice of ρ > φ(C), and other parameter choices that you made, guarantee that ""\hat x_C>0\quad\text{(componentwise)}."" ? If not, can you modify the proof? === Assistant: You’re absolutely right to flag this. === You’re absolutely right to flag this. In the parameterized margin section I gave, the condition $\hat x_C>0$ (componentwise) is not implied by conditions like $\rho \gtrsim \phi(C)$, internal connectivity, or outside degree. In fact, it can fail even in the “best” clusters: * Take $C$ to be a clique (conductance $\phi(C)=0$, extremely well connected internally). * If $\rho$ is large enough, the $\ell_1$ penalty can still shrink most coordinates in $C$ to zero (sometimes leaving only the seed active). So: no, the “$\rho > \phi(C)$”-type assumptions are aimed at controlling leakage / enforcing outside dual feasibility; they do not guarantee that every node in $C$ stays active. ===== You can (and should) modify the proof so that it does not require $\hat x_C>0$. The clean fix is: ===== # Define $\hat x$ as the restricted minimizer on $C$ (with $x_{\bar C}=0$), not by forcing the sign pattern. # Prove $\hat x_C\ge 0$ automatically (nonnegativity is true for PageRank-type $Q$ with nonpositive off-diagonals). # Use the restricted KKT subgradient $z_C\in\partial g_C(\hat x_C)$ instead of hard-coding $z_C=\lambda_C$. # Then your conductance/connectivity/degree analysis bounds the outside slack and yields a parameterized lower bound on the outside margin $\gamma_{\bar C}:=\min_{i\in\bar C}\bigl(\lambda_i-|\nabla f(x^\star)_i|\bigr)$. If you additionally want the global $\gamma$ (min over all inactive coordinates, including possible zeros inside $C$), you must also control any zeros inside $C$; that is a separate “beta-min / interior slack” assumption and cannot be forced just from $\phi(C)$ etc. Below is drop-in LaTeX that fixes the proof properly. ==== Paste the following into your document, and then replace the definition \eqref{eq:xhat_def_gamma_param} and Theorem~\ref{thm:gamma_parameterized_by_cluster} (and its proof) with the revised versions below. ==== <syntaxhighlight lang="latex">% ------------------------------------------------------------ % Fix: define \hat x as the restricted minimizer (no sign guess), % prove \hat x_C \ge 0, and use its KKT multiplier z_C.
% ------------------------------------------------------------ \subsubsection*{Restricted minimizer on a candidate cluster} Define the restricted optimization problem on $C$ by fixing $x_{\bar C}=0$: \begin{equation}\label{eq:restricted_problem_C} \min_{x\in\mathbb{R}^n:\; x_{\bar C}=0}\; F(x) = \frac12 x^\top Qx-\alpha x^\top D^{-1/2}s \;+\; \sum_{i\in C}\lambda_i |x_i|, \qquad \lambda_i=\rho\alpha\sqrt{d_i}. \end{equation} \begin{definition}[Restricted minimizer]\label{def:xhat_restricted} Let $\hat x$ denote the unique minimizer of \eqref{eq:restricted_problem_C}. Equivalently, $\hat x_{\bar C}=0$ and $\hat x_C$ is the unique minimizer of \[ \min_{x_C\in\mathbb{R}^{|C|}}\; \frac12 x_C^\top Q_{CC}x_C - b_C^\top x_C + \sum_{i\in C}\lambda_i|x_i|, \qquad b:=\alpha D^{-1/2}s. \] \end{definition} \begin{lemma}[Absolute-value improvement and nonnegativity]\label{lem:abs_improvement_nonneg} Assume $v\in C$ so that $b_C\ge 0$ and $b_{\bar C}=0$. Then: \begin{enumerate} \item For every $x$ with $x_{\bar C}=0$, the vector $|x|$ (componentwise absolute value) satisfies \[ F(|x|)\le F(x). \] \item Consequently, the restricted minimizer satisfies $\hat x_C\ge 0$ componentwise. \end{enumerate} \end{lemma} \begin{proof} Fix any $x$ with $x_{\bar C}=0$ and set $y:=|x|$ (so $y_{\bar C}=0$). We compare each term in $F$. \medskip\noindent\textbf{(i) Quadratic term.} Since $Q=\frac{1+\alpha}{2}I-\frac{1-\alpha}{2}D^{-1/2}AD^{-1/2}$, we have $Q_{ij}\le 0$ for $i\neq j$. Write \[ x^\top Qx = \sum_i Q_{ii}x_i^2 + 2\sum_{i<j}Q_{ij}x_i x_j. \] For each $i<j$, we have $x_i x_j \le |x_i||x_j|=y_i y_j$, and since $Q_{ij}\le 0$ this implies $Q_{ij}x_i x_j \ge Q_{ij}y_i y_j$. Therefore $x^\top Qx \ge y^\top Qy$. \medskip\noindent\textbf{(ii) Linear term.} Since $b\ge 0$ and $y_i=|x_i|\ge x_i$ for all $i$, we have $-b^\top y \le -b^\top x$. \medskip\noindent\textbf{(iii) $\ell_1$ term.} By definition, $\sum_i \lambda_i |y_i|=\sum_i\lambda_i|x_i|$. 
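% Summary of (i)--(iii) with $y=|x|$:
%   (i)   $x^\top Qx \ge y^\top Qy$   (off-diagonal entries of $Q$ are $\le 0$),
%   (ii)  $-b^\top x \ge -b^\top y$   (since $b\ge 0$ and $y\ge x$ componentwise),
%   (iii) the $\ell_1$ terms coincide,
% so $F(y)\le F(x)$ term by term; this is the standard sign-flip argument for
% quadratics whose Hessian is an M-matrix.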
\medskip\noindent Combining (i)--(iii) yields $F(y)\le F(x)$, proving the first claim. For the second claim, let $\hat x$ be the unique restricted minimizer. Then $F(|\hat x|)\le F(\hat x)$, so $|\hat x|$ is also a minimizer. By uniqueness, $|\hat x|=\hat x$, hence $\hat x_C\ge 0$. \end{proof} \begin{lemma}[Restricted KKT on $C$]\label{lem:restricted_KKT_on_C} There exists a vector $z_C\in\mathbb{R}^{|C|}$ such that \begin{equation}\label{eq:restricted_KKT} Q_{CC}\hat x_C - b_C + z_C = 0, \end{equation} and for every $i\in C$, \begin{equation}\label{eq:zC_bound} z_{C,i}\in \lambda_i\,\partial |\,\hat x_i\,| \subseteq [-\lambda_i,\lambda_i]. \end{equation} In particular, $|z_{C,i}|\le \lambda_i$ for all $i\in C$, and thus $\|z_C\|_2 \le \|\lambda_C\|_2 = \rho\alpha\sqrt{\mathrm{vol}(C)}$. \end{lemma} \begin{proof} The restricted objective in \eqref{eq:restricted_problem_C} is strongly convex in $x_C$ because $Q_{CC}\succ 0$ and the $\ell_1$ term is convex, hence it has a unique minimizer $\hat x_C$. First-order optimality for convex composite problems gives \[ 0 \in Q_{CC}\hat x_C - b_C + \partial\!\left(\sum_{i\in C}\lambda_i|\hat x_i|\right), \] which is exactly the existence of $z_C$ satisfying \eqref{eq:restricted_KKT}--\eqref{eq:zC_bound}. The norm bound follows from $|z_{C,i}|\le\lambda_i$ and the definition of $\|\lambda_C\|_2$. \end{proof} \begin{lemma}[Bounding $\|\hat x_C\|_2$ by internal connectivity]\label{lem:xhatC_norm_gamma_param_fixed} Let $\mu_C=\lambda_{\min}(Q_{CC})$. Then \[ \|\hat x_C\|_2 \le \frac{1}{\mu_C}\left(\|b_C\|_2+\|z_C\|_2\right) \le \frac{1}{\mu_C}\left(\frac{\alpha}{\sqrt{d_v}}+\rho\alpha\sqrt{\mathrm{vol}(C)}\right). \] \end{lemma} \begin{proof} From \eqref{eq:restricted_KKT}, $Q_{CC}\hat x_C=b_C-z_C$, so \[ \|\hat x_C\|_2 \le \|Q_{CC}^{-1}\|_2\,\|b_C-z_C\|_2 \le \frac{1}{\mu_C}\left(\|b_C\|_2+\|z_C\|_2\right). 
\] Since $s=e_v$ and $v\in C$, we have $\|b_C\|_2=\alpha/\sqrt{d_v}$, and Lemma~\ref{lem:restricted_KKT_on_C} gives $\|z_C\|_2\le \rho\alpha\sqrt{\mathrm{vol}(C)}$. \end{proof} </syntaxhighlight> Now here is the revised theorem that replaces the positivity-dependent one. Note how it (correctly) lower bounds an outside margin $\gamma_{\bar C}$. If you still want the global $\gamma$, you’ll need an extra interior assumption (I add that as an “optional add-on” at the end). <syntaxhighlight lang="latex">\begin{theorem}[Parameterized outside-margin bound without assuming $\hat x_C>0$] \label{thm:gamma_parameterized_by_cluster_fixed} Assume $v\in C$ and let $\hat x$ be the restricted minimizer from Definition~\ref{def:xhat_restricted}. Define \[ \gamma_{\bar C}:=\min_{i\in\bar C}\Bigl(\lambda_i-|\nabla f(x^\star)_i|\Bigr), \qquad \lambda_i=\rho\alpha\sqrt{d_i}. \] Let $\Delta(C)$ be defined exactly as in \eqref{eq:DeltaC_def_gamma_param}, but with $\|\hat x_C\|_2$ bounded using Lemma~\ref{lem:xhatC_norm_gamma_param_fixed}; i.e. \[ \Delta(C) := \frac{1-\alpha}{2\mu_C}\, \sqrt{\frac{\phi(C)\mathrm{vol}(C)}{d_{\mathrm{out}}\,d_{\mathrm{in}}}}\, \left(\frac{\alpha}{\sqrt{d_v}}+\rho\alpha\sqrt{\mathrm{vol}(C)}\right). \] Then: \begin{enumerate} \item[(i)] If $\Delta(C)\le (1-\eta)\rho\alpha\sqrt{d_{\mathrm{out}}}$ for some $\eta\in(0,1)$, then $x^\star_{\bar C}=0$ (the global minimizer is supported inside $C$). \item[(ii)] Under the same condition, the \emph{outside margin} satisfies \[ \gamma_{\bar C}\ge \eta\,\rho\alpha\sqrt{d_{\mathrm{out}}}. \] \end{enumerate} \end{theorem} \begin{proof} \medskip\noindent\textbf{Step 1 (Outside-gradient bound).} Since $v\in C$, we have $b_{\bar C}=0$. Because $\hat x_{\bar C}=0$, \[ \nabla f(\hat x)_{\bar C} = (Q\hat x-b)_{\bar C} = Q_{\bar C C}\hat x_C.
\] Lemma~\ref{lem:outside_grad_bound_gamma_param} (unchanged) bounds $\max_{i\in\bar C}|\nabla f(\hat x)_i|\le \Delta(C)$, using Lemma~\ref{lem:xhatC_norm_gamma_param_fixed} to control $\|\hat x_C\|_2$. \medskip\noindent\textbf{Step 2 (Build a global KKT certificate).} From Lemma~\ref{lem:restricted_KKT_on_C} there exists $z_C\in\partial g_C(\hat x_C)$ such that $Q_{CC}\hat x_C-b_C+z_C=0$. Define a full vector $z\in\mathbb{R}^n$ by \[ z_i := \begin{cases} (z_C)_i, & i\in C,\\[1mm] -\nabla f(\hat x)_i, & i\in\bar C. \end{cases} \] We claim $z\in\partial g(\hat x)$ provided $|\nabla f(\hat x)_i|\le \lambda_i$ for all $i\in\bar C$. Indeed: \begin{itemize} \item On $C$, $z_C\in\partial g_C(\hat x_C)$ by construction. \item On $\bar C$, $\hat x_i=0$, and the subgradient condition is $z_i\in[-\lambda_i,\lambda_i]$. This holds if $|z_i|=|\nabla f(\hat x)_i|\le \lambda_i$. \end{itemize} Under the hypothesis $\Delta(C)\le (1-\eta)\rho\alpha\sqrt{d_{\mathrm{out}}}$ we have $|\nabla f(\hat x)_i|\le \Delta(C)\le (1-\eta)\rho\alpha\sqrt{d_{\mathrm{out}}}\le \lambda_i$ for all $i\in\bar C$ (since $\lambda_i\ge \rho\alpha\sqrt{d_{\mathrm{out}}}$). Thus $z\in\partial g(\hat x)$. Finally, note that for $i\in C$, \[ (\nabla f(\hat x)+z)_i = (Q_{CC}\hat x_C-b_C+z_C)_i = 0, \] and for $i\in\bar C$ we have $(\nabla f(\hat x)+z)_i= \nabla f(\hat x)_i-\nabla f(\hat x)_i=0$. Hence $\nabla f(\hat x)+z=0$ with $z\in\partial g(\hat x)$, i.e. $0\in\partial F(\hat x)$. Therefore $\hat x$ is a global minimizer. By strong convexity, it is the unique global minimizer: $\hat x=x^\star$. In particular $x^\star_{\bar C}=0$, proving (i). \medskip\noindent\textbf{Step 3 (Outside margin).} Since $x^\star=\hat x$ and $\bar C\subseteq I^\star$, for every $i\in\bar C$, \[ \lambda_i-|\nabla f(x^\star)_i| =\lambda_i-|\nabla f(\hat x)_i| \ge \lambda_i-\Delta(C) \ge \rho\alpha\sqrt{d_{\mathrm{out}}}-\Delta(C) \ge \eta\rho\alpha\sqrt{d_{\mathrm{out}}}. 
\] Taking the minimum over $i\in\bar C$ yields $\gamma_{\bar C}\ge \eta\rho\alpha\sqrt{d_{\mathrm{out}}}$, proving (ii). \end{proof} \begin{remark}[Why this does not automatically give the \emph{global} margin $\gamma$] The global margin is $\gamma=\min_{i\in I^\star}(\lambda_i-|\nabla f(x^\star)_i|)$, where $I^\star$ includes \emph{all} zero coordinates of $x^\star$, including any zeros inside $C$. The theorem above lower bounds the margin only on $\bar C$. To conclude the same lower bound for the global $\gamma$, one needs an additional assumption that $x^\star_i>0$ for all $i\in C$ (equivalently, $I^\star=\bar C$), or else a separate lower bound on the slack for the (possibly) inactive indices inside $C$. Such an ``interior beta-min / interior slack'' condition cannot be forced solely from $\phi(C)$, $\mu_C$, and $d_{\mathrm{out}}$: even an internally expanding cluster with $\phi(C)=0$ can have many coordinates truncated to $0$ if $\rho$ is chosen large. \end{remark} </syntaxhighlight> ===== No, $\rho$ vs. $\phi(C)$ does not guarantee $\hat x_C>0$. ===== * The right fix is to: define $\hat x$ as the restricted minimizer, prove $\hat x_C\ge 0$ (done above), and then let your conductance/connectivity/degree bound give a clean $\Omega(\sqrt{\rho})$ outside margin. If you tell me whether you truly need the global $\gamma$ (including possible zeros inside $C$) or whether an outside margin $\gamma_{\bar C}$ is sufficient for your downstream argument (runtime/locality), I can tighten the statement accordingly. === User: I don't understand, can you tell me how exactly to update this piece of code? === I don't understand, can you tell me how exactly to update this piece of code?
"\section{A conductance--connectivity--degree lower bound on the margin} \label{sec:gamma_conductance_connectivity_degree} We specialize to the $\ell_1$-regularized PageRank objective~(3): \begin{equation}\label{eq:ppr_obj_gamma_param} \min_{x\in\mathbb{R}^n} F(x) := \underbrace{\frac12 x^\top Qx-\alpha x^\top D^{-1/2}s}_{=:f(x)} \;+\; \underbrace{\rho\alpha\|D^{1/2}x\|_1}_{=:g(x)}. \end{equation} Here $G=(V,E)$ is undirected and unweighted, $A$ is the adjacency matrix, $D$ is the diagonal degree matrix, $s=e_v$ is a seed at node $v$, and \begin{equation}\label{eq:Q_def_gamma_param} Q=\alpha I+\frac{1-\alpha}{2}L = \frac{1+\alpha}{2}I-\frac{1-\alpha}{2}\,D^{-1/2}AD^{-1/2}. \end{equation} The weighted $\ell_1$ penalty has coordinate weights \begin{equation}\label{eq:lambda_def_gamma_param} g(x)=\sum_{i=1}^n \lambda_i |x_i|, \qquad \lambda_i:=\rho\alpha\sqrt{d_i}. \end{equation} \paragraph{Cluster notation.} Fix a set (candidate cluster) $C\subseteq V$ with $v\in C$, and let $\bar C:=V\setminus C$. Define the volume and cut: \[ \mathrm{vol}(C):=\sum_{i\in C} d_i, \qquad \mathrm{cut}(C,\bar C):=|\{(i,j)\in E:\; i\in C,\ j\in\bar C\}|. \] We use the (one-sided) conductance \[ \phi(C):=\frac{\mathrm{cut}(C,\bar C)}{\mathrm{vol}(C)}. \] Let \[ d_{\mathrm{in}}:=\min_{j\in C} d_j, \qquad d_{\mathrm{out}}:=\min_{i\in\bar C} d_i, \qquad d_v:=d_{v}. \] \paragraph{Internal connectivity parameter.} Let \begin{equation}\label{eq:muC_def_gamma_param} \mu_C:=\lambda_{\min}(Q_{CC}), \end{equation} the smallest eigenvalue of the principal submatrix $Q_{CC}$. Since $Q\succeq \alpha I$, we always have $\mu_C\ge \alpha$, and larger $\mu_C$ corresponds to stronger internal connectivity/conditioning on the cluster. 
\subsection{A pointwise bound on the outside gradient in terms of $\phi(C)$ and $\mu_C$} We will construct a candidate vector supported on $C$ by solving the KKT stationarity equations \emph{restricted to $C$} under the natural sign choice $x_C>0$ (appropriate for PageRank-type solutions). Define the restricted candidate $\hat x$ by \begin{equation}\label{eq:xhat_def_gamma_param} \hat x_{\bar C}:=0, \qquad \hat x_C:=Q_{CC}^{-1}\bigl(b_C-\lambda_C\bigr), \qquad b:=\alpha D^{-1/2}s. \end{equation} Since $s=e_v$ and $v\in C$, the vector $b_C$ has exactly one nonzero entry: \begin{equation}\label{eq:bC_norm_gamma_param} \|b_C\|_2=\frac{\alpha}{\sqrt{d_v}}. \end{equation} \begin{lemma}[Stationarity on $C$]\label{lem:restricted_stationarity_gamma_param} The vector $\hat x$ satisfies the exact stationarity equation on $C$: \begin{equation}\label{eq:restricted_stationarity_gamma_param} \nabla f(\hat x)_C+\lambda_C = 0. \end{equation} \end{lemma} \begin{proof} Since $\nabla f(x)=Qx-b$, and $\hat x_{\bar C}=0$, we have \[ \nabla f(\hat x)_C = (Q_{CC}\hat x_C - b_C). \] By the definition \eqref{eq:xhat_def_gamma_param}, $Q_{CC}\hat x_C=b_C-\lambda_C$, hence $\nabla f(\hat x)_C = -\lambda_C$, which is \eqref{eq:restricted_stationarity_gamma_param}. \end{proof} Next we bound the gradient on $\bar C$ in terms of conductance, internal connectivity, and outside degree. \begin{lemma}[Bounding $\|\hat x_C\|_2$ by internal connectivity]\label{lem:xC_norm_gamma_param} Let $\mu_C=\lambda_{\min}(Q_{CC})$. Then \begin{equation}\label{eq:xC_norm_gamma_param} \|\hat x_C\|_2 \;\le\; \frac{1}{\mu_C}\left(\frac{\alpha}{\sqrt{d_v}}+\rho\alpha\sqrt{\mathrm{vol}(C)}\right). \end{equation} \end{lemma} \begin{proof} Since $Q_{CC}$ is symmetric positive definite and $\lambda_{\min}(Q_{CC})=\mu_C$, \[ \|Q_{CC}^{-1}\|_2 = \frac{1}{\mu_C}. 
\] Using \eqref{eq:xhat_def_gamma_param} and the triangle inequality, \[ \|\hat x_C\|_2 = \left\|Q_{CC}^{-1}(b_C-\lambda_C)\right\|_2 \le \|Q_{CC}^{-1}\|_2\bigl(\|b_C\|_2+\|\lambda_C\|_2\bigr). \] By \eqref{eq:bC_norm_gamma_param}, $\|b_C\|_2=\alpha/\sqrt{d_v}$. Also, by \eqref{eq:lambda_def_gamma_param}, \[ \|\lambda_C\|_2^2=\sum_{i\in C} (\rho\alpha\sqrt{d_i})^2 =(\rho\alpha)^2\sum_{i\in C} d_i =(\rho\alpha)^2\,\mathrm{vol}(C), \] so $\|\lambda_C\|_2=\rho\alpha\sqrt{\mathrm{vol}(C)}$. Substitute these bounds to obtain \eqref{eq:xC_norm_gamma_param}. \end{proof} \begin{lemma}[Outside gradient bound from conductance and degrees]\label{lem:outside_grad_bound_gamma_param} Assume $v\in C$ so that $b_{\bar C}=0$. Then for each $i\in\bar C$, \begin{equation}\label{eq:outside_grad_pointwise_gamma_param} |\nabla f(\hat x)_i| \le \frac{1-\alpha}{2}\,\|\hat x_C\|_2\, \sqrt{\frac{\mathrm{cut}(C,\bar C)}{d_i\,d_{\mathrm{in}}}}. \end{equation} Consequently, using $d_i\ge d_{\mathrm{out}}$ and $\mathrm{cut}(C,\bar C)=\phi(C)\mathrm{vol}(C)$, \begin{equation}\label{eq:outside_grad_uniform_gamma_param} \max_{i\in\bar C}|\nabla f(\hat x)_i| \le \frac{1-\alpha}{2}\,\|\hat x_C\|_2\, \sqrt{\frac{\phi(C)\mathrm{vol}(C)}{d_{\mathrm{out}}\,d_{\mathrm{in}}}}. \end{equation} \end{lemma} \begin{proof} Fix $i\in\bar C$. Since $\hat x_{\bar C}=0$ and $b_{\bar C}=0$, \[ \nabla f(\hat x)_i = (Q\hat x - b)_i = (Q_{\bar C C}\hat x_C)_i = \sum_{j\in C}Q_{ij}\hat x_j. \] From \eqref{eq:Q_def_gamma_param}, for $i\neq j$ we have \[ Q_{ij} = -\frac{1-\alpha}{2}(D^{-1/2}AD^{-1/2})_{ij} = \begin{cases} -\dfrac{1-\alpha}{2}\dfrac{1}{\sqrt{d_i d_j}}, & (i,j)\in E,\\[2mm] 0, & (i,j)\notin E. \end{cases} \] Thus \[ |\nabla f(\hat x)_i| = \left|\sum_{j\in C\cap N(i)} -\frac{1-\alpha}{2}\frac{\hat x_j}{\sqrt{d_i d_j}}\right| \le \frac{1-\alpha}{2}\cdot \frac{1}{\sqrt{d_i}} \sum_{j\in C\cap N(i)}\frac{|\hat x_j|}{\sqrt{d_j}}. 
\] Apply Cauchy--Schwarz to the sum: \[ \sum_{j\in C\cap N(i)}\frac{|\hat x_j|}{\sqrt{d_j}} \le \left(\sum_{j\in C\cap N(i)} \hat x_j^2\right)^{1/2} \left(\sum_{j\in C\cap N(i)} \frac{1}{d_j}\right)^{1/2} \le \|\hat x_C\|_2\left(\sum_{j\in C\cap N(i)} \frac{1}{d_j}\right)^{1/2}. \] Since $d_j\ge d_{\mathrm{in}}$ for all $j\in C$, we have \[ \sum_{j\in C\cap N(i)} \frac{1}{d_j} \le \frac{|C\cap N(i)|}{d_{\mathrm{in}}}. \] Finally, $|C\cap N(i)|$ is the number of cut edges incident to $i$, hence $|C\cap N(i)|\le \mathrm{cut}(C,\bar C)$, which gives \eqref{eq:outside_grad_pointwise_gamma_param}. The uniform bound \eqref{eq:outside_grad_uniform_gamma_param} follows by using $d_i\ge d_{\mathrm{out}}$ and $\mathrm{cut}(C,\bar C)=\phi(C)\mathrm{vol}(C)$. \end{proof} \subsection{A parameterized margin lower bound and a $\sqrt{\rho}$ corollary} Define the cluster-dependent quantity \begin{equation}\label{eq:DeltaC_def_gamma_param} \Delta(C) := \frac{1-\alpha}{2\mu_C}\, \sqrt{\frac{\phi(C)\mathrm{vol}(C)}{d_{\mathrm{out}}\,d_{\mathrm{in}}}}\, \left(\frac{\alpha}{\sqrt{d_v}}+\rho\alpha\sqrt{\mathrm{vol}(C)}\right). \end{equation} \begin{theorem}[Margin bound parameterized by conductance, internal connectivity, and outside degree] \label{thm:gamma_parameterized_by_cluster} Assume $v\in C$ and that the restricted candidate \eqref{eq:xhat_def_gamma_param} satisfies \begin{equation}\label{eq:xhat_positive_gamma_param} \hat x_C>0\quad\text{(componentwise)}. \end{equation} Then: \begin{enumerate} \item[(i)] If $\Delta(C)<\min_{i\in\bar C}\lambda_i=\rho\alpha\sqrt{d_{\mathrm{out}}}$, then $\hat x$ is the \emph{unique} global minimizer of \eqref{eq:ppr_obj_gamma_param}. In particular, $x^\star=\hat x$ and $x^\star_{\bar C}=0$. 
\item[(ii)] Whenever $x^\star=\hat x$, the strict-complementarity margin satisfies the explicit lower bound \begin{equation}\label{eq:gamma_param_bound_gamma_param} \gamma = \min_{i\in \bar C}\Bigl(\lambda_i-|\nabla f(x^\star)_i|\Bigr) \;\ge\; \rho\alpha\sqrt{d_{\mathrm{out}}}-\Delta(C). \end{equation} Equivalently, if $\Delta(C)\le (1-\eta)\rho\alpha\sqrt{d_{\mathrm{out}}}$ for some $\eta\in(0,1)$, then \begin{equation}\label{eq:gamma_eta_bound_gamma_param} \gamma \ge \eta\,\rho\alpha\sqrt{d_{\mathrm{out}}}. \end{equation} \end{enumerate} \end{theorem} \begin{proof} We prove (i) and (ii) in order. \medskip\noindent\textbf{Step 1: KKT conditions for \eqref{eq:ppr_obj_gamma_param}.} Since $f$ is differentiable and $g(x)=\sum_i \lambda_i|x_i|$, a point $x$ is optimal if and only if there exists $z\in\partial g(x)$ such that \begin{equation}\label{eq:KKT_gamma_param} \nabla f(x)+z=0. \end{equation} Moreover, $F$ is strongly convex because $f$ is $\alpha$-strongly convex and $g$ is convex, so the minimizer is unique. \medskip\noindent\textbf{Step 2: Build a valid $z\in\partial g(\hat x)$ and verify \eqref{eq:KKT_gamma_param}.} Because of \eqref{eq:xhat_positive_gamma_param}, $\mathrm{sign}(\hat x_i)=+1$ for all $i\in C$. Define $z\in\mathbb{R}^n$ by \[ z_i := \begin{cases} \lambda_i, & i\in C,\\[1mm] -\nabla f(\hat x)_i, & i\in\bar C. \end{cases} \] We claim $z\in\partial g(\hat x)$ provided $|\nabla f(\hat x)_i|\le \lambda_i$ for all $i\in\bar C$. Indeed: \begin{itemize} \item If $i\in C$, then $\hat x_i>0$ and the weighted-$\ell_1$ subgradient requires $z_i=\lambda_i$, which holds. \item If $i\in\bar C$, then $\hat x_i=0$ and the subgradient condition is $z_i\in[-\lambda_i,\lambda_i]$. This holds if $|z_i|=|\nabla f(\hat x)_i|\le \lambda_i$. \end{itemize} Next, Lemma~\ref{lem:restricted_stationarity_gamma_param} gives $\nabla f(\hat x)_C=-\lambda_C$, so $\nabla f(\hat x)_C+z_C=0$. By definition, $\nabla f(\hat x)_{\bar C}+z_{\bar C}=0$. 
Thus \eqref{eq:KKT_gamma_param} holds. \medskip\noindent\textbf{Step 3: Enforce the outside inequalities via $\Delta(C)$.} By Lemma~\ref{lem:outside_grad_bound_gamma_param} and Lemma~\ref{lem:xC_norm_gamma_param}, \[ \max_{i\in\bar C}|\nabla f(\hat x)_i| \le \Delta(C). \] Therefore, if $\Delta(C)<\rho\alpha\sqrt{d_{\mathrm{out}}}\le \lambda_i$ for all $i\in\bar C$, then $|\nabla f(\hat x)_i|<\lambda_i$ on $\bar C$, so $z\in\partial g(\hat x)$ and the KKT condition holds. Hence $\hat x$ is a minimizer, and by strong convexity it is the unique minimizer: $\hat x=x^\star$. This proves (i). \medskip\noindent\textbf{Step 4: Lower bound the margin.} Assume $x^\star=\hat x$. Since $x^\star_{\bar C}=0$, the inactive set contains $\bar C$ and \[ \gamma = \min_{i\in \bar C}\bigl(\lambda_i-|\nabla f(x^\star)_i|\bigr) \ge \min_{i\in\bar C}\lambda_i - \max_{i\in\bar C}|\nabla f(\hat x)_i| = \rho\alpha\sqrt{d_{\mathrm{out}}} - \max_{i\in\bar C}|\nabla f(\hat x)_i|. \] Using $\max_{i\in\bar C}|\nabla f(\hat x)_i|\le \Delta(C)$ gives \eqref{eq:gamma_param_bound_gamma_param}. The $\eta$-form \eqref{eq:gamma_eta_bound_gamma_param} is immediate. \end{proof} \begin{corollary}[A concrete $\gamma=\Omega(\sqrt{\rho})$ regime]\label{cor:gamma_Omega_sqrt_rho_regime} Assume the hypotheses of Theorem~\ref{thm:gamma_parameterized_by_cluster}. Suppose, in addition, that there are constants $V_0,d_0,\mu_0,c_0>0$ such that \[ \mathrm{vol}(C)\le \frac{V_0}{\rho},\qquad d_{\mathrm{in}}\ge d_0,\qquad d_v\ge d_0,\qquad \mu_C\ge \mu_0,\qquad d_{\mathrm{out}}\ge \frac{c_0}{\alpha^2\rho}. \] If furthermore the conductance satisfies \begin{equation}\label{eq:phi_rho_condition_gamma_param} \phi(C)\le c_1\rho \quad\text{for a sufficiently small constant $c_1$ (depending only on $V_0,d_0,\mu_0,c_0,\alpha$),} \end{equation} then there exists a constant $\eta\in(0,1)$ (depending only on the same parameters) such that \[ \gamma \ge \eta\,\sqrt{\rho}. 
\] \end{corollary} \begin{proof} From $d_{\mathrm{out}}\ge c_0/(\alpha^2\rho)$ we get \[ \rho\alpha\sqrt{d_{\mathrm{out}}}\ge \rho\alpha\sqrt{\frac{c_0}{\alpha^2\rho}}=\sqrt{c_0}\,\sqrt{\rho}. \] Next, using the assumed bounds in \eqref{eq:DeltaC_def_gamma_param}, \[ \Delta(C) \le \frac{1-\alpha}{2\mu_0}\, \sqrt{\frac{\phi(C)\cdot (V_0/\rho)}{(c_0/(\alpha^2\rho))\cdot d_0}}\, \left(\frac{\alpha}{\sqrt{d_0}}+\rho\alpha\sqrt{V_0/\rho}\right) = \frac{1-\alpha}{2\mu_0}\, \alpha\,\sqrt{\frac{V_0}{c_0 d_0}}\, \sqrt{\phi(C)}\, \left(\frac{\alpha}{\sqrt{d_0}}+\alpha\sqrt{V_0}\sqrt{\rho}\right). \] Under \eqref{eq:phi_rho_condition_gamma_param}, $\sqrt{\phi(C)}\le \sqrt{c_1}\sqrt{\rho}$, hence $\Delta(C)\le K\sqrt{\rho}$ for a constant $K$ depending only on $(V_0,d_0,\mu_0,c_0,\alpha)$ and $c_1$. Choosing $c_1$ sufficiently small ensures $\Delta(C)\le (1-\eta)\sqrt{c_0}\sqrt{\rho}$ for some $\eta\in(0,1)$. Applying \eqref{eq:gamma_param_bound_gamma_param} then yields $\gamma\ge \eta\sqrt{c_0}\sqrt{\rho}$. Absorbing $\sqrt{c_0}$ into $\eta$ gives the stated $\gamma\ge \eta\sqrt{\rho}$. \end{proof} \begin{remark}[Interpretation] The quantity $\Delta(C)$ in \eqref{eq:DeltaC_def_gamma_param} is explicit and shows how the margin depends on: (i) the \emph{outside degree} $d_{\mathrm{out}}$ through the threshold $\lambda_i=\rho\alpha\sqrt{d_i}$, (ii) the \emph{boundary size} through $\phi(C)$, and (iii) the \emph{internal conditioning} $\mu_C=\lambda_{\min}(Q_{CC})$. When $\phi(C)=O(\rho)$, $d_{\mathrm{out}}=\Theta(1/(\alpha^2\rho))$, and $\mu_C$ is bounded below, the bound forces $\gamma=\Omega(\sqrt{\rho})$. \end{remark} % ============================================================
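The two lemma bounds in the pasted section that do not rest on the positivity assumption (the bound on ‖x̂_C‖₂ and the outside-gradient bound via Δ(C)) can be sanity-checked numerically. A minimal sketch, assuming only numpy; the toy graph (two triangles joined by one edge), the parameter values, and the cluster choice are illustrative and not from the note:

```python
import numpy as np

# Toy instance: two triangles bridged by one edge; cluster C = {0,1,2}, seed v = 0.
alpha, rho = 0.15, 0.05
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
n = 6
A = np.zeros((n, n))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0
d = A.sum(axis=1)
Dmh = np.diag(d ** -0.5)                       # D^{-1/2}
# Q = (1+alpha)/2 * I - (1-alpha)/2 * D^{-1/2} A D^{-1/2}   (eq:Q_def)
Q = 0.5 * (1 + alpha) * np.eye(n) - 0.5 * (1 - alpha) * Dmh @ A @ Dmh
lam = rho * alpha * np.sqrt(d)                 # lambda_i = rho*alpha*sqrt(d_i)
b = np.zeros(n)
b[0] = alpha / np.sqrt(d[0])                   # b = alpha * D^{-1/2} e_v

C, Cbar = [0, 1, 2], [3, 4, 5]
QCC = Q[np.ix_(C, C)]
mu_C = np.linalg.eigvalsh(QCC).min()           # internal connectivity mu_C
vol_C = d[C].sum()
cut = sum((i in C) != (j in C) for i, j in edges)
phi_C = cut / vol_C
d_in, d_out = d[C].min(), d[Cbar].min()

# Restricted candidate xhat_C = Q_CC^{-1}(b_C - lambda_C)   (eq:xhat_def)
xC = np.linalg.solve(QCC, b[C] - lam[C])
xhat = np.zeros(n)
xhat[C] = xC

# Delta(C) from eq:DeltaC_def and the norm bound it rests on.
Delta = ((1 - alpha) / (2 * mu_C)
         * np.sqrt(phi_C * vol_C / (d_out * d_in))
         * (alpha / np.sqrt(d[0]) + rho * alpha * np.sqrt(vol_C)))
norm_bound = (alpha / np.sqrt(d[0]) + rho * alpha * np.sqrt(vol_C)) / mu_C
grad_out = np.abs((Q @ xhat - b)[Cbar])        # |grad f(xhat)_i| on Cbar

print(np.linalg.norm(xC) <= norm_bound + 1e-12)  # Lemma: ||xhat_C||_2 bound
print(grad_out.max() <= Delta + 1e-12)           # Lemma: outside gradient <= Delta(C)
```

Both checks hold for the closed-form candidate regardless of whether x̂_C comes out componentwise positive on this instance, which is exactly why the conversation below separates them from the positivity assumption.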
You don’t need to guess what to change: you will (A) replace the definition of x̂, (B) delete one lemma, (C) add two lemmas, (D) replace one lemma’s proof, and (E) replace the theorem+proof (and delete the positivity assumption). Below I refer to the exact blocks in the code you pasted.
==== (A) Replace the definition of x̂ ====
It starts at:
<syntaxhighlight lang="latex">We will construct a candidate vector supported on $C$ by solving the KKT stationarity equations \emph{restricted to $C$} under the natural sign choice $x_C>0$ (appropriate for PageRank-type solutions). Define the restricted candidate $\hat x$ by \begin{equation}\label{eq:xhat_def_gamma_param} \hat x_{\bar C}:=0, \qquad \hat x_C:=Q_{CC}^{-1}\bigl(b_C-\lambda_C\bigr), \qquad b:=\alpha D^{-1/2}s. \end{equation} </syntaxhighlight>
===== Replace it with this (copy-paste exactly): =====
<syntaxhighlight lang="latex">We will construct a candidate vector supported on $C$ as the \emph{restricted minimizer} of the full objective over vectors with $x_{\bar C}=0$. This avoids imposing (and later proving) a sign pattern such as $\hat x_C>0$. Let $b:=\alpha D^{-1/2}s$. Define $\hat x$ to be the unique minimizer of \begin{equation}\label{eq:xhat_def_gamma_param} \hat x \in \arg\min_{x\in\mathbb{R}^n:\; x_{\bar C}=0}\; \left\{ \frac12 x^\top Qx-\alpha x^\top D^{-1/2}s \;+\; \sum_{i=1}^n \lambda_i |x_i| \right\}. \end{equation} Equivalently, $\hat x_{\bar C}=0$ and $\hat x_C$ uniquely minimizes \[ \min_{x_C\in\mathbb{R}^{|C|}} \left\{ \frac12 x_C^\top Q_{CC}x_C - b_C^\top x_C + \sum_{i\in C}\lambda_i|x_i| \right\}. \] </syntaxhighlight>
Keep your equation for ‖b_C‖₂ (eq:bC_norm_gamma_param) right after this (no change).
==== (B) Delete one lemma ====
Remove everything from:
<syntaxhighlight lang="latex">\begin{lemma}[Stationarity on $C$]\label{lem:restricted_stationarity_gamma_param} ...
\end{proof} </syntaxhighlight>
That lemma relied on the closed-form formula Q_{CC}^{-1}(b_C - λ_C), which is not the restricted minimizer once the ℓ₁ term can create zeros/sign changes inside C.
==== (C) Paste the following two lemmas exactly where the stationarity lemma used to be: ====
<syntaxhighlight lang="latex">\begin{lemma}[Absolute-value improvement and nonnegativity]\label{lem:abs_improvement_nonneg} Assume $v\in C$ so that $b_C\ge 0$ and $b_{\bar C}=0$. Then for every $x$ with $x_{\bar C}=0$, the componentwise absolute value $|x|$ satisfies \[ F(|x|)\le F(x). \] Consequently, the restricted minimizer $\hat x$ satisfies $\hat x_C\ge 0$ componentwise. \end{lemma} \begin{proof} Fix any $x$ with $x_{\bar C}=0$ and set $y:=|x|$ (so $y_{\bar C}=0$). Write \[ F(x)=\frac12 x^\top Qx - b^\top x + \sum_{i=1}^n \lambda_i|x_i|. \] \emph{Quadratic term.} Since $Q=\frac{1+\alpha}{2}I-\frac{1-\alpha}{2}D^{-1/2}AD^{-1/2}$, we have $Q_{ij}\le 0$ for $i\neq j$. Expand \[ x^\top Qx = \sum_i Q_{ii}x_i^2 + 2\sum_{i<j}Q_{ij}x_i x_j. \] For each $i<j$, $x_i x_j\le |x_i||x_j|=y_i y_j$, and since $Q_{ij}\le 0$ this implies $Q_{ij}x_i x_j \ge Q_{ij}y_i y_j$. Hence $x^\top Qx \ge y^\top Qy$. \emph{Linear term.} Since $b\ge 0$ (here $b=\alpha D^{-1/2}s$ with $s=e_v$) and $y_i=|x_i|\ge x_i$ for all $i$, we have $-b^\top y \le -b^\top x$. \emph{$\ell_1$ term.} $\sum_i \lambda_i|y_i|=\sum_i\lambda_i|x_i|$ by definition. Combining the three comparisons gives $F(y)\le F(x)$. Now let $\hat x$ be the unique restricted minimizer. Then $F(|\hat x|)\le F(\hat x)$, so $|\hat x|$ is also a minimizer. By uniqueness, $|\hat x|=\hat x$, hence $\hat x_C\ge 0$.
\end{proof} \begin{lemma}[Restricted KKT on $C$]\label{lem:restricted_KKT_on_C} There exists a vector $z_C\in\mathbb{R}^{|C|}$ such that \begin{equation}\label{eq:restricted_KKT_on_C} Q_{CC}\hat x_C - b_C + z_C = 0, \end{equation} and for every $i\in C$, \begin{equation}\label{eq:zC_in_subdiff} z_{C,i}\in \lambda_i\,\partial|\hat x_i| \subseteq [-\lambda_i,\lambda_i]. \end{equation} In particular, $\|z_C\|_2 \le \|\lambda_C\|_2 = \rho\alpha\sqrt{\mathrm{vol}(C)}$. \end{lemma} \begin{proof} The restricted objective over $x_C$ is strongly convex because $Q_{CC}\succ 0$ and the $\ell_1$ term is convex, so $\hat x_C$ is unique. First-order optimality yields \[ 0\in Q_{CC}\hat x_C - b_C + \partial\!\Big(\sum_{i\in C}\lambda_i|\hat x_i|\Big), \] which is exactly the existence of $z_C$ satisfying \eqref{eq:restricted_KKT_on_C}--\eqref{eq:zC_in_subdiff}. The norm bound follows from $|z_{C,i}|\le \lambda_i$ and $\|\lambda_C\|_2^2=\sum_{i\in C}(\rho\alpha\sqrt{d_i})^2=(\rho\alpha)^2\mathrm{vol}(C)$. \end{proof} </syntaxhighlight>
==== (D) Replace one lemma’s proof ====
You can keep the statement and label exactly the same; only change the proof so it no longer uses x̂_C = Q_{CC}^{-1}(b_C - λ_C). Replace the old proof with:
<syntaxhighlight lang="latex">\begin{proof} From Lemma~\ref{lem:restricted_KKT_on_C}, $Q_{CC}\hat x_C=b_C-z_C$, so \[ \|\hat x_C\|_2 \le \|Q_{CC}^{-1}\|_2\,\|b_C-z_C\|_2 \le \|Q_{CC}^{-1}\|_2\bigl(\|b_C\|_2+\|z_C\|_2\bigr). \] Since $\lambda_{\min}(Q_{CC})=\mu_C$, we have $\|Q_{CC}^{-1}\|_2=1/\mu_C$. Also, by \eqref{eq:bC_norm_gamma_param}, $\|b_C\|_2=\alpha/\sqrt{d_v}$. Finally, Lemma~\ref{lem:restricted_KKT_on_C} gives $\|z_C\|_2\le \rho\alpha\sqrt{\mathrm{vol}(C)}$. Substituting yields \eqref{eq:xC_norm_gamma_param}. \end{proof} </syntaxhighlight>
Everything else in that lemma (statement and equation eq:xC_norm_gamma_param) stays unchanged.
==== (E) Replace the theorem and proof (and delete the positivity assumption) ====
Remove these lines entirely:
<syntaxhighlight lang="latex">Assume $v\in C$ and that the restricted candidate \eqref{eq:xhat_def_gamma_param} satisfies \begin{equation}\label{eq:xhat_positive_gamma_param} \hat x_C>0\quad\text{(componentwise)}. \end{equation} </syntaxhighlight>
===== Replace everything from: =====
<syntaxhighlight lang="latex">\begin{theorem}[Margin bound parameterized by conductance, internal connectivity, and outside degree] \label{thm:gamma_parameterized_by_cluster} ... \end{proof} </syntaxhighlight>
with the following (copy-paste):
<syntaxhighlight lang="latex">\begin{theorem}[Margin bound parameterized by conductance, internal connectivity, and outside degree] \label{thm:gamma_parameterized_by_cluster} Assume $v\in C$ and let $\hat x$ be defined by \eqref{eq:xhat_def_gamma_param} (restricted minimizer with $\hat x_{\bar C}=0$). Then: \begin{enumerate} \item[(i)] If $\Delta(C)<\min_{i\in\bar C}\lambda_i=\rho\alpha\sqrt{d_{\mathrm{out}}}$, then $\hat x$ is the \emph{unique} global minimizer of \eqref{eq:ppr_obj_gamma_param}. In particular, $x^\star=\hat x$ and $x^\star_{\bar C}=0$. \item[(ii)] Whenever $x^\star=\hat x$, the \emph{outside} slack (margin on $\bar C$) satisfies the explicit lower bound \begin{equation}\label{eq:gamma_param_bound_gamma_param} \min_{i\in \bar C}\Bigl(\lambda_i-|\nabla f(x^\star)_i|\Bigr) \;\ge\; \rho\alpha\sqrt{d_{\mathrm{out}}}-\Delta(C). \end{equation} Equivalently, if $\Delta(C)\le (1-\eta)\rho\alpha\sqrt{d_{\mathrm{out}}}$ for some $\eta\in(0,1)$, then \begin{equation}\label{eq:gamma_eta_bound_gamma_param} \min_{i\in \bar C}\Bigl(\lambda_i-|\nabla f(x^\star)_i|\Bigr) \ge \eta\,\rho\alpha\sqrt{d_{\mathrm{out}}}. \end{equation} \end{enumerate} \end{theorem} \begin{proof} We prove (i) and (ii) in order.
\medskip\noindent\textbf{Step 1: KKT conditions for \eqref{eq:ppr_obj_gamma_param}.} Since $f$ is differentiable and $g(x)=\sum_i \lambda_i|x_i|$, a point $x$ is optimal if and only if there exists $z\in\partial g(x)$ such that \begin{equation}\label{eq:KKT_gamma_param} \nabla f(x)+z=0. \end{equation} Moreover, $F$ is strongly convex because $f$ is $\alpha$-strongly convex and $g$ is convex, so the minimizer is unique. \medskip\noindent\textbf{Step 2: Build a valid $z\in\partial g(\hat x)$ and verify \eqref{eq:KKT_gamma_param}.} From Lemma~\ref{lem:restricted_KKT_on_C}, there exists $z_C$ such that $Q_{CC}\hat x_C-b_C+z_C=0$ and $z_{C,i}\in[-\lambda_i,\lambda_i]$ for all $i\in C$. Define $z\in\mathbb{R}^n$ by \[ z_i := \begin{cases} (z_C)_i, & i\in C,\\[1mm] -\nabla f(\hat x)_i, & i\in\bar C. \end{cases} \] We claim $z\in\partial g(\hat x)$ provided $|\nabla f(\hat x)_i|\le \lambda_i$ for all $i\in\bar C$. Indeed: \begin{itemize} \item If $i\in C$, then $z_{C,i}\in \lambda_i\,\partial|\hat x_i|$ by Lemma~\ref{lem:restricted_KKT_on_C}, so it is a valid weighted-$\ell_1$ subgradient component on $C$. \item If $i\in\bar C$, then $\hat x_i=0$ and the subgradient condition is $z_i\in[-\lambda_i,\lambda_i]$. This holds if $|z_i|=|\nabla f(\hat x)_i|\le \lambda_i$. \end{itemize} Next we check \eqref{eq:KKT_gamma_param}. On $C$, since $\hat x_{\bar C}=0$, \[ \nabla f(\hat x)_C = (Q\hat x-b)_C = Q_{CC}\hat x_C - b_C, \] hence $\nabla f(\hat x)_C+z_C=0$ by $Q_{CC}\hat x_C-b_C+z_C=0$. On $\bar C$, by construction $\nabla f(\hat x)_{\bar C}+z_{\bar C}=0$. Therefore $\nabla f(\hat x)+z=0$. \medskip\noindent\textbf{Step 3: Enforce the outside inequalities via $\Delta(C)$.} By Lemma~\ref{lem:outside_grad_bound_gamma_param} and Lemma~\ref{lem:xC_norm_gamma_param}, \[ \max_{i\in\bar C}|\nabla f(\hat x)_i| \le \Delta(C). 
\] Therefore, if $\Delta(C)<\rho\alpha\sqrt{d_{\mathrm{out}}}\le \lambda_i$ for all $i\in\bar C$, then $|\nabla f(\hat x)_i|<\lambda_i$ on $\bar C$, so indeed $z\in\partial g(\hat x)$ and the KKT condition holds. Hence $\hat x$ is a minimizer, and by strong convexity it is the unique minimizer: $\hat x=x^\star$. This proves (i). \medskip\noindent\textbf{Step 4: Lower bound the outside margin on $\bar C$.} Assume $x^\star=\hat x$. Then for every $i\in\bar C$, \[ \lambda_i-|\nabla f(x^\star)_i| =\lambda_i-|\nabla f(\hat x)_i| \ge \lambda_i-\Delta(C) \ge \rho\alpha\sqrt{d_{\mathrm{out}}}-\Delta(C), \] where we used $\lambda_i\ge \rho\alpha\sqrt{d_{\mathrm{out}}}$. Taking the minimum over $i\in\bar C$ yields \eqref{eq:gamma_param_bound_gamma_param}, and the $\eta$-form \eqref{eq:gamma_eta_bound_gamma_param} follows immediately. \end{proof} </syntaxhighlight>
✅ After this change, you no longer need to prove x̂_C > 0; the theorem is correct with only x̂_C ≥ 0 (which we proved in Lemma lem:abs_improvement_nonneg), and it yields the parameterized slack bound on C̄ (which is what your conductance/outside-degree analysis controls).
==== What this does and does not claim ====
This updated theorem lower bounds the slack for nodes in C̄. It does not claim that every node in C is nonzero; that cannot be guaranteed just from φ(C), μ_C, and d_out (the ℓ₁ term can still zero out interior nodes).
If you want, I can show you the minimal additional “interior beta-min” condition that makes the global margin coincide with this outside margin. (And yes: this whole section is for objective (3) in your note. open-problem-fountoulakis22a )
=== User: Sorry this is a bit messed up. ===
Sorry this is a bit messed up. Can you simply give me a single latex snippet that replaces the whole "\section{A conductance--connectivity--degree lower bound on the margin}" appropriately? I don't want to be copy-pasting pieces here and there.
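The construction being assembled here (restricted minimizer, Δ(C) test, support confined to C) can also be exercised end to end numerically. A minimal sketch, assuming only numpy; the bridged triangle-plus-clique graph, the parameter values, and the plain ISTA solver are illustrative choices made so that the condition Δ(C) < ρα·sqrt(d_out) of part (i) holds:

```python
import numpy as np

def soft(u, t):
    """Componentwise soft-thresholding: prox of the weighted l1 term."""
    return np.sign(u) * np.maximum(np.abs(u) - t, 0.0)

# Illustrative instance for part (i): triangle C = {0,1,2} joined by a single
# bridge edge to a 40-clique, so d_out is large while vol(C) and the cut stay small.
alpha, rho, m = 0.15, 0.05, 40
n = 3 + m
A = np.zeros((n, n))
for i, j in [(0, 1), (1, 2), (0, 2), (2, 3)]:   # triangle + bridge edge
    A[i, j] = A[j, i] = 1.0
for i in range(3, n):                            # clique on the outside
    for j in range(i + 1, n):
        A[i, j] = A[j, i] = 1.0
d = A.sum(axis=1)
Dmh = np.diag(d ** -0.5)
Q = 0.5 * (1 + alpha) * np.eye(n) - 0.5 * (1 - alpha) * Dmh @ A @ Dmh
lam = rho * alpha * np.sqrt(d)
b = np.zeros(n)
b[0] = alpha / np.sqrt(d[0])

C, Cbar = [0, 1, 2], list(range(3, n))
QCC = Q[np.ix_(C, C)]
mu_C = np.linalg.eigvalsh(QCC).min()
vol_C, cut = d[C].sum(), 1.0
d_in, d_out = d[C].min(), d[Cbar].min()
Delta = ((1 - alpha) / (2 * mu_C)
         * np.sqrt(cut / (d_out * d_in))        # phi(C)*vol(C) = cut(C, Cbar)
         * (alpha / np.sqrt(d[0]) + rho * alpha * np.sqrt(vol_C)))
fires = Delta < rho * alpha * np.sqrt(d_out)    # hypothesis of part (i)

# Solve the FULL problem with plain ISTA; by part (i) the support of the
# global minimizer should stay inside C.
x = np.zeros(n)
step = 0.9 / np.linalg.eigvalsh(Q).max()
for _ in range(20000):
    x = soft(x - step * (Q @ x - b), step * lam)

print(fires, np.abs(x[Cbar]).max() < 1e-10)
```

The ISTA loop is only a generic solver for the check; nothing here depends on the locality properties of ISTA discussed earlier in the conversation.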
=== Assistant: Below is a single self-contained LaTeX snippet that you can paste in your file to replace the entire section starting at ===
Below is a single self-contained LaTeX snippet that you can paste in your file to replace the entire section starting at
<syntaxhighlight lang="latex">\section{A conductance--connectivity--degree lower bound on the margin} </syntaxhighlight>
and ending at its closing remark. This version does not assume x̂_C > 0 anywhere. Instead, it defines x̂ as the restricted minimizer on C, proves x̂_C ≥ 0, builds the global KKT certificate using the restricted KKT subgradient, and then derives an explicit parameterized lower bound on the outside margin on C̄ in terms of φ(C), μ_C, and d_out. It also includes a corollary giving an Ω(√ρ) regime.
<syntaxhighlight lang="latex">% ============================================================ \section{A conductance--connectivity--degree lower bound on the margin} \label{sec:gamma_conductance_connectivity_degree} We specialize to the $\ell_1$-regularized PageRank objective~(3): \begin{equation}\label{eq:ppr_obj_gamma_param} \min_{x\in\mathbb{R}^n} F(x) := \underbrace{\frac12 x^\top Qx-\alpha x^\top D^{-1/2}s}_{=:f(x)} \;+\; \underbrace{\rho\alpha\|D^{1/2}x\|_1}_{=:g(x)}. \end{equation} Here $G=(V,E)$ is undirected and unweighted, $A$ is the adjacency matrix, $D$ is the diagonal degree matrix, $s=e_v$ is a seed at node $v$, and \begin{equation}\label{eq:Q_def_gamma_param} Q=\alpha I+\frac{1-\alpha}{2}L = \frac{1+\alpha}{2}I-\frac{1-\alpha}{2}\,D^{-1/2}AD^{-1/2}. \end{equation} The weighted $\ell_1$ penalty has coordinate weights \begin{equation}\label{eq:lambda_def_gamma_param} g(x)=\sum_{i=1}^n \lambda_i |x_i|, \qquad \lambda_i:=\rho\alpha\sqrt{d_i}. \end{equation} Since $Q\succeq \alpha I$, the function $f$ is $\alpha$-strongly convex and hence $F=f+g$ has a unique minimizer $x^\star$.
\paragraph{Cluster notation.} Fix a set (candidate cluster) $C\subseteq V$ with $v\in C$, and let $\bar C:=V\setminus C$. Define the volume and cut: \[ \mathrm{vol}(C):=\sum_{i\in C} d_i, \qquad \mathrm{cut}(C,\bar C):=|\{(i,j)\in E:\; i\in C,\ j\in\bar C\}|. \] We use the (one-sided) conductance \[ \phi(C):=\frac{\mathrm{cut}(C,\bar C)}{\mathrm{vol}(C)}. \] Let \[ d_{\mathrm{in}}:=\min_{j\in C} d_j, \qquad d_{\mathrm{out}}:=\min_{i\in\bar C} d_i, \qquad d_v:=d_{v}. \] We write $N(i)$ for the neighbor set of node $i$. \paragraph{Internal connectivity parameter.} Let \begin{equation}\label{eq:muC_def_gamma_param} \mu_C:=\lambda_{\min}(Q_{CC}), \end{equation} the smallest eigenvalue of the principal submatrix $Q_{CC}$. \subsection{Restricted minimizer on $C$, its nonnegativity, and a norm bound} Define $b:=\alpha D^{-1/2}s$. Since $s=e_v$ and $v\in C$, the vector $b_C$ has exactly one nonzero entry and \begin{equation}\label{eq:bC_norm_gamma_param} \|b_C\|_2=\frac{\alpha}{\sqrt{d_v}}. \end{equation} \begin{definition}[Restricted minimizer on $C$]\label{def:xhat_restricted_gamma_param} Let $\hat x$ denote the unique minimizer of \eqref{eq:ppr_obj_gamma_param} subject to $x_{\bar C}=0$: \begin{equation}\label{eq:xhat_def_gamma_param} \hat x \in \operatorname*{argmin}_{x\in\mathbb{R}^n:\; x_{\bar C}=0}\; F(x). \end{equation} Equivalently, $\hat x_{\bar C}=0$ and $\hat x_C$ uniquely minimizes \[ \min_{x_C\in\mathbb{R}^{|C|}} \left\{ \frac12 x_C^\top Q_{CC}x_C - b_C^\top x_C + \sum_{i\in C}\lambda_i|x_i| \right\}. \] \end{definition} \begin{lemma}[Absolute-value improvement and nonnegativity]\label{lem:abs_improvement_nonneg_gamma_param} Assume $v\in C$ so that $b_C\ge 0$ and $b_{\bar C}=0$. Then for every $x$ with $x_{\bar C}=0$, the componentwise absolute value $|x|$ satisfies \[ F(|x|)\le F(x). \] Consequently, the restricted minimizer satisfies $\hat x_C\ge 0$ componentwise. 
\end{lemma} \begin{proof} Fix any $x$ with $x_{\bar C}=0$ and set $y:=|x|$ (so $y_{\bar C}=0$). Write \[ F(x)=\frac12 x^\top Qx - b^\top x + \sum_{i=1}^n \lambda_i|x_i|. \] \medskip\noindent\textbf{Quadratic term.} Since $Q=\frac{1+\alpha}{2}I-\frac{1-\alpha}{2}D^{-1/2}AD^{-1/2}$, we have $Q_{ij}\le 0$ for $i\neq j$. Expand \[ x^\top Qx = \sum_i Q_{ii}x_i^2 + 2\sum_{i<j}Q_{ij}x_i x_j. \] For each $i<j$, we have $x_i x_j\le |x_i||x_j|=y_i y_j$, and since $Q_{ij}\le 0$ this implies $Q_{ij}x_i x_j \ge Q_{ij}y_i y_j$. Hence $x^\top Qx \ge y^\top Qy$. \medskip\noindent\textbf{Linear term.} Since $b\ge 0$ and $y_i=|x_i|\ge x_i$ for all $i$, we have $-b^\top y \le -b^\top x$. \medskip\noindent\textbf{$\ell_1$ term.} $\sum_i \lambda_i|y_i|=\sum_i\lambda_i|x_i|$ by definition. \medskip\noindent Combining the three comparisons gives $F(y)\le F(x)$. Now let $\hat x$ be the unique restricted minimizer. Then $F(|\hat x|)\le F(\hat x)$, so $|\hat x|$ is also a minimizer. By uniqueness, $|\hat x|=\hat x$, hence $\hat x_C\ge 0$. \end{proof} \begin{lemma}[Restricted KKT on $C$]\label{lem:restricted_KKT_on_C_gamma_param} There exists a vector $z_C\in\mathbb{R}^{|C|}$ such that \begin{equation}\label{eq:restricted_KKT_on_C_gamma_param} Q_{CC}\hat x_C - b_C + z_C = 0, \end{equation} and for every $i\in C$, \begin{equation}\label{eq:zC_in_subdiff_gamma_param} z_{C,i}\in \lambda_i\,\partial|\hat x_i| \subseteq [-\lambda_i,\lambda_i]. \end{equation} In particular, $\|z_C\|_2 \le \|\lambda_C\|_2 = \rho\alpha\sqrt{\mathrm{vol}(C)}$. \end{lemma} \begin{proof} The restricted objective over $x_C$ is strongly convex because $Q_{CC}\succ 0$ and the $\ell_1$ term is convex, so $\hat x_C$ is unique. First-order optimality yields \[ 0\in Q_{CC}\hat x_C - b_C + \partial\!\Big(\sum_{i\in C}\lambda_i|\hat x_i|\Big), \] which is exactly the existence of $z_C$ satisfying \eqref{eq:restricted_KKT_on_C_gamma_param}--\eqref{eq:zC_in_subdiff_gamma_param}. 
The norm bound follows from $|z_{C,i}|\le \lambda_i$ and $\|\lambda_C\|_2^2=\sum_{i\in C}(\rho\alpha\sqrt{d_i})^2=(\rho\alpha)^2\mathrm{vol}(C)$. \end{proof} \begin{lemma}[Bounding $\|\hat x_C\|_2$ by internal connectivity]\label{lem:xC_norm_gamma_param} Let $\mu_C=\lambda_{\min}(Q_{CC})$. Then \begin{equation}\label{eq:xC_norm_gamma_param} \|\hat x_C\|_2 \;\le\; \frac{1}{\mu_C}\left(\frac{\alpha}{\sqrt{d_v}}+\rho\alpha\sqrt{\mathrm{vol}(C)}\right). \end{equation} \end{lemma} \begin{proof} From Lemma~\ref{lem:restricted_KKT_on_C_gamma_param}, $Q_{CC}\hat x_C=b_C-z_C$, so \[ \|\hat x_C\|_2 \le \|Q_{CC}^{-1}\|_2\,\|b_C-z_C\|_2 \le \|Q_{CC}^{-1}\|_2\bigl(\|b_C\|_2+\|z_C\|_2\bigr). \] Since $\lambda_{\min}(Q_{CC})=\mu_C$, we have $\|Q_{CC}^{-1}\|_2=1/\mu_C$. Also, $\|b_C\|_2=\alpha/\sqrt{d_v}$ by \eqref{eq:bC_norm_gamma_param}, and $\|z_C\|_2\le \rho\alpha\sqrt{\mathrm{vol}(C)}$ by Lemma~\ref{lem:restricted_KKT_on_C_gamma_param}. Substituting yields \eqref{eq:xC_norm_gamma_param}. \end{proof} \subsection{A pointwise outside-gradient bound from conductance and degrees} \begin{lemma}[Outside gradient bound from conductance and degrees]\label{lem:outside_grad_bound_gamma_param} Assume $v\in C$ so that $b_{\bar C}=0$. Then for each $i\in\bar C$, \begin{equation}\label{eq:outside_grad_pointwise_gamma_param} |\nabla f(\hat x)_i| \le \frac{1-\alpha}{2}\,\|\hat x_C\|_2\, \sqrt{\frac{\mathrm{cut}(C,\bar C)}{d_i\,d_{\mathrm{in}}}}. \end{equation} Consequently, using $d_i\ge d_{\mathrm{out}}$ and $\mathrm{cut}(C,\bar C)=\phi(C)\mathrm{vol}(C)$, \begin{equation}\label{eq:outside_grad_uniform_gamma_param} \max_{i\in\bar C}|\nabla f(\hat x)_i| \le \frac{1-\alpha}{2}\,\|\hat x_C\|_2\, \sqrt{\frac{\phi(C)\mathrm{vol}(C)}{d_{\mathrm{out}}\,d_{\mathrm{in}}}}. \end{equation} \end{lemma} \begin{proof} Fix $i\in\bar C$. Since $\hat x_{\bar C}=0$ and $b_{\bar C}=0$, \[ \nabla f(\hat x)_i = (Q\hat x - b)_i = (Q_{\bar C C}\hat x_C)_i = \sum_{j\in C}Q_{ij}\hat x_j. 
\] From \eqref{eq:Q_def_gamma_param}, for $i\neq j$ we have \[ Q_{ij} = -\frac{1-\alpha}{2}(D^{-1/2}AD^{-1/2})_{ij} = \begin{cases} -\dfrac{1-\alpha}{2}\dfrac{1}{\sqrt{d_i d_j}}, & (i,j)\in E,\\[2mm] 0, & (i,j)\notin E. \end{cases} \] Thus \[ |\nabla f(\hat x)_i| = \left|\sum_{j\in C\cap N(i)} -\frac{1-\alpha}{2}\frac{\hat x_j}{\sqrt{d_i d_j}}\right| \le \frac{1-\alpha}{2}\cdot \frac{1}{\sqrt{d_i}} \sum_{j\in C\cap N(i)}\frac{|\hat x_j|}{\sqrt{d_j}}. \] Apply Cauchy--Schwarz: \[ \sum_{j\in C\cap N(i)}\frac{|\hat x_j|}{\sqrt{d_j}} \le \left(\sum_{j\in C\cap N(i)} \hat x_j^2\right)^{1/2} \left(\sum_{j\in C\cap N(i)} \frac{1}{d_j}\right)^{1/2} \le \|\hat x_C\|_2\left(\sum_{j\in C\cap N(i)} \frac{1}{d_j}\right)^{1/2}. \] Since $d_j\ge d_{\mathrm{in}}$ for all $j\in C$, \[ \sum_{j\in C\cap N(i)} \frac{1}{d_j} \le \frac{|C\cap N(i)|}{d_{\mathrm{in}}}. \] Finally, $|C\cap N(i)|$ is the number of cut edges incident to $i$, hence $|C\cap N(i)|\le \mathrm{cut}(C,\bar C)$, which gives \eqref{eq:outside_grad_pointwise_gamma_param}. The uniform bound \eqref{eq:outside_grad_uniform_gamma_param} follows by using $d_i\ge d_{\mathrm{out}}$ and $\mathrm{cut}(C,\bar C)=\phi(C)\mathrm{vol}(C)$. \end{proof} \subsection{A parameterized outside-margin lower bound and a $\sqrt{\rho}$ regime} Define the cluster-dependent quantity \begin{equation}\label{eq:DeltaC_def_gamma_param} \Delta(C) := \frac{1-\alpha}{2\mu_C}\, \sqrt{\frac{\phi(C)\mathrm{vol}(C)}{d_{\mathrm{out}}\,d_{\mathrm{in}}}}\, \left(\frac{\alpha}{\sqrt{d_v}}+\rho\alpha\sqrt{\mathrm{vol}(C)}\right). \end{equation} \begin{theorem}[Margin bound parameterized by conductance, internal connectivity, and outside degree] \label{thm:gamma_parameterized_by_cluster} Assume $v\in C$ and let $\hat x$ be the restricted minimizer \eqref{eq:xhat_def_gamma_param}. Define the \emph{outside margin} on the complement \[ \gamma_{\bar C} := \min_{i\in \bar C}\Bigl(\lambda_i-|\nabla f(x^\star)_i|\Bigr). 
\] Then: \begin{enumerate} \item[(i)] If $\Delta(C)<\min_{i\in\bar C}\lambda_i=\rho\alpha\sqrt{d_{\mathrm{out}}}$, then $\hat x$ is the \emph{unique} global minimizer of \eqref{eq:ppr_obj_gamma_param}. In particular, $x^\star=\hat x$ and $x^\star_{\bar C}=0$. \item[(ii)] Whenever $x^\star=\hat x$, we have the explicit bound \begin{equation}\label{eq:gamma_param_bound_gamma_param} \gamma_{\bar C} \;\ge\; \rho\alpha\sqrt{d_{\mathrm{out}}}-\Delta(C). \end{equation} Equivalently, if $\Delta(C)\le (1-\eta)\rho\alpha\sqrt{d_{\mathrm{out}}}$ for some $\eta\in(0,1)$, then \begin{equation}\label{eq:gamma_eta_bound_gamma_param} \gamma_{\bar C} \ge \eta\,\rho\alpha\sqrt{d_{\mathrm{out}}}. \end{equation} \end{enumerate} \end{theorem} \begin{proof} We prove (i) and (ii) in order. \medskip\noindent\textbf{Step 1: KKT conditions.} Since $f$ is differentiable and $g(x)=\sum_i \lambda_i|x_i|$, a point $x$ is optimal if and only if there exists $z\in\partial g(x)$ such that $\nabla f(x)+z=0$. Moreover, $F$ is strongly convex (because $Q\succeq \alpha I$), hence the minimizer is unique. \medskip\noindent\textbf{Step 2: Build a global KKT certificate from the restricted minimizer.} From Lemma~\ref{lem:restricted_KKT_on_C_gamma_param}, there exists $z_C$ such that \[ Q_{CC}\hat x_C-b_C+z_C=0 \quad\text{and}\quad z_{C,i}\in[-\lambda_i,\lambda_i]\;\; \forall i\in C. \] Define $z\in\mathbb{R}^n$ by \[ z_i := \begin{cases} (z_C)_i, & i\in C,\\[1mm] -\nabla f(\hat x)_i, & i\in\bar C. \end{cases} \] We claim $z\in\partial g(\hat x)$ if $|\nabla f(\hat x)_i|\le \lambda_i$ for all $i\in\bar C$. Indeed: \begin{itemize} \item If $i\in C$, then $z_{C,i}\in \lambda_i\,\partial|\hat x_i|$ by \eqref{eq:zC_in_subdiff_gamma_param}, hence it is valid. \item If $i\in\bar C$, then $\hat x_i=0$ and $z_i\in[-\lambda_i,\lambda_i]$ holds if $|z_i|=|\nabla f(\hat x)_i|\le \lambda_i$. \end{itemize} Now check $\nabla f(\hat x)+z=0$. 
On $C$, since $\hat x_{\bar C}=0$, \[ \nabla f(\hat x)_C=(Q\hat x-b)_C = Q_{CC}\hat x_C - b_C, \] so $\nabla f(\hat x)_C+z_C=0$ by $Q_{CC}\hat x_C-b_C+z_C=0$. On $\bar C$, $\nabla f(\hat x)_{\bar C}+z_{\bar C}=0$ by definition. Thus $\nabla f(\hat x)+z=0$. \medskip\noindent\textbf{Step 3: Enforce the outside inequalities via $\Delta(C)$.} By Lemma~\ref{lem:outside_grad_bound_gamma_param} and Lemma~\ref{lem:xC_norm_gamma_param}, \[ \max_{i\in\bar C}|\nabla f(\hat x)_i| \le \Delta(C). \] Therefore, if $\Delta(C)<\rho\alpha\sqrt{d_{\mathrm{out}}}\le \lambda_i$ for all $i\in\bar C$, then $|\nabla f(\hat x)_i|<\lambda_i$ on $\bar C$ and hence $z\in\partial g(\hat x)$. Thus $0\in\partial F(\hat x)$, so $\hat x$ is a minimizer. By uniqueness, $\hat x=x^\star$, proving (i). \medskip\noindent\textbf{Step 4: Lower bound the outside margin.} Assume $x^\star=\hat x$. Then for every $i\in\bar C$, \[ \lambda_i-|\nabla f(x^\star)_i| =\lambda_i-|\nabla f(\hat x)_i| \ge \lambda_i-\Delta(C) \ge \rho\alpha\sqrt{d_{\mathrm{out}}}-\Delta(C), \] where we used $\lambda_i\ge \rho\alpha\sqrt{d_{\mathrm{out}}}$. Taking the minimum over $i\in\bar C$ yields \eqref{eq:gamma_param_bound_gamma_param}, and \eqref{eq:gamma_eta_bound_gamma_param} follows. \end{proof} \begin{corollary}[A concrete $\gamma_{\bar C}=\Omega(\sqrt{\rho})$ regime]\label{cor:gamma_Omega_sqrt_rho_regime} Assume the hypotheses of Theorem~\ref{thm:gamma_parameterized_by_cluster}. Suppose, in addition, that there are constants $V_0,d_0,\mu_0,c_0>0$ such that \[ \mathrm{vol}(C)\le \frac{V_0}{\rho},\qquad d_{\mathrm{in}}\ge d_0,\qquad d_v\ge d_0,\qquad \mu_C\ge \mu_0,\qquad d_{\mathrm{out}}\ge \frac{c_0}{\alpha^2\rho}. 
\] If furthermore the conductance satisfies \begin{equation}\label{eq:phi_rho_condition_gamma_param} \phi(C)\le c_1\rho \quad\text{for a sufficiently small constant $c_1$ (depending only on $V_0,d_0,\mu_0,c_0,\alpha$),} \end{equation} then there exists a constant $\eta\in(0,1)$ (depending only on the same parameters) such that \[ \gamma_{\bar C} \ge \eta\,\sqrt{\rho}. \] \end{corollary} \begin{proof} First, from $d_{\mathrm{out}}\ge c_0/(\alpha^2\rho)$ we get \[ \rho\alpha\sqrt{d_{\mathrm{out}}}\ge \rho\alpha\sqrt{\frac{c_0}{\alpha^2\rho}}=\sqrt{c_0}\,\sqrt{\rho}. \] Next, using the assumed bounds in \eqref{eq:DeltaC_def_gamma_param}, \begin{align*} \Delta(C) &\le \frac{1-\alpha}{2\mu_0}\, \sqrt{\frac{\phi(C)\cdot (V_0/\rho)}{(c_0/(\alpha^2\rho))\cdot d_0}}\, \left(\frac{\alpha}{\sqrt{d_0}}+\rho\alpha\sqrt{V_0/\rho}\right)\\ &= \frac{1-\alpha}{2\mu_0}\, \alpha\,\sqrt{\frac{V_0}{c_0 d_0}}\, \sqrt{\phi(C)}\, \left(\frac{\alpha}{\sqrt{d_0}}+\alpha\sqrt{V_0}\sqrt{\rho}\right). \end{align*} Under \eqref{eq:phi_rho_condition_gamma_param}, $\sqrt{\phi(C)}\le \sqrt{c_1}\sqrt{\rho}$, hence $\Delta(C)\le K\sqrt{c_1}\sqrt{\rho}$ for a constant $K$ depending only on $(V_0,d_0,\mu_0,c_0,\alpha)$. Choosing $c_1$ sufficiently small ensures $\Delta(C)\le (1-\eta)\sqrt{c_0}\sqrt{\rho}$ for some $\eta\in(0,1)$, i.e. \[ \Delta(C)\le (1-\eta)\rho\alpha\sqrt{d_{\mathrm{out}}}. \] Apply \eqref{eq:gamma_eta_bound_gamma_param} to conclude $\gamma_{\bar C}\ge \eta\rho\alpha\sqrt{d_{\mathrm{out}}} \ge \eta\sqrt{c_0}\sqrt{\rho}$, and absorb $\sqrt{c_0}$ into $\eta$. \end{proof} \begin{remark}[Outside margin vs.\ the global margin $\gamma$] Theorem~\ref{thm:gamma_parameterized_by_cluster} lower bounds the margin on $\bar C$. The \emph{global} strict-complementarity margin $\gamma=\min_{i\in I^\star}(\lambda_i-|\nabla f(x^\star)_i|)$ could be smaller if some coordinates inside $C$ are also zero at $x^\star$. 
To deduce the same lower bound for the global $\gamma$, it suffices (but is not implied by conductance alone) to additionally have $x^\star_i>0$ for all $i\in C$, i.e.\ $I^\star=\bar C$. \end{remark} % ============================================================ </syntaxhighlight> If you paste this in place of your current section, it should compile cleanly, and it removes the “mystery assumption” $\hat x_C>0$ entirely.
=== User: Can you solve "\Delta(C)\le (1-\eta)\rho\alpha\sqrt{d_{\mathrm{out}}}" as a function of φ(C)? ===
Can you solve "\Delta(C)\le (1-\eta)\rho\alpha\sqrt{d_{\mathrm{out}}}" as a function of φ(C)? In other words, how small should conductance be?
=== Assistant: Starting from the definition of Δ(C) in that section (specialized to the ℓ1-regularized PPR objective (3)), you want to solve the slack condition for φ(C). ===
Starting from the definition of $\Delta(C)$ in that section (specialized to the $\ell_1$-regularized PPR objective (3) open-problem-fountoulakis22a),
\[
\Delta(C) = \frac{1-\alpha}{2\mu_C}\,
\sqrt{\frac{\phi(C)\,\mathrm{vol}(C)}{d_{\mathrm{out}}\,d_{\mathrm{in}}}}\,
\left(\frac{\alpha}{\sqrt{d_v}}+\rho\alpha\sqrt{\mathrm{vol}(C)}\right),
\]
you want to solve the inequality
\[
\Delta(C)\le (1-\eta)\rho\alpha\sqrt{d_{\mathrm{out}}}
\qquad(\eta\in(0,1)).
\]
===== Factor out $\alpha$ inside the parentheses and cancel $\alpha>0$ on both sides: =====
\[
\frac{1-\alpha}{2\mu_C}\,
\sqrt{\frac{\phi(C)\,\mathrm{vol}(C)}{d_{\mathrm{out}}\,d_{\mathrm{in}}}}\,
\left(\frac{1}{\sqrt{d_v}}+\rho\sqrt{\mathrm{vol}(C)}\right)
\;\le\; (1-\eta)\rho\sqrt{d_{\mathrm{out}}}.
\]
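This cancellation step can be sanity-checked numerically (a Python sketch; the helper names and sampled parameter ranges are mine, not from the note): $\Delta(C)$ is exactly $\alpha$ times the $\alpha$-free left-hand side, so dividing both sides by $\alpha>0$ is lossless.

```python
import math
import random

def delta_C(alpha, mu, phi, vol, dout, din, dv, rho):
    # Delta(C) as defined in the section above.
    return ((1 - alpha) / (2 * mu)) * math.sqrt(phi * vol / (dout * din)) * (
        alpha / math.sqrt(dv) + rho * alpha * math.sqrt(vol)
    )

def lhs_alpha_free(alpha, mu, phi, vol, dout, din, dv, rho):
    # Left-hand side after factoring out and cancelling alpha.
    return ((1 - alpha) / (2 * mu)) * math.sqrt(phi * vol / (dout * din)) * (
        1 / math.sqrt(dv) + rho * math.sqrt(vol)
    )

random.seed(0)
for _ in range(1000):
    p = dict(alpha=random.uniform(0.01, 0.99), mu=random.uniform(0.1, 5.0),
             phi=random.uniform(1e-4, 1.0), vol=random.uniform(1.0, 1e4),
             dout=random.uniform(1.0, 1e3), din=random.uniform(1.0, 1e3),
             dv=random.uniform(1.0, 1e3), rho=random.uniform(1e-6, 1e-1))
    # Delta(C) = alpha * (alpha-free expression), term by term.
    assert math.isclose(delta_C(**p), p["alpha"] * lhs_alpha_free(**p),
                        rel_tol=1e-12)
print("Delta(C) == alpha * (alpha-free LHS) on 1000 random draws")
```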
Rearrange to isolate the square root:
\[
\sqrt{\frac{\phi(C)\,\mathrm{vol}(C)}{d_{\mathrm{out}}\,d_{\mathrm{in}}}}
\;\le\;
\frac{2\mu_C}{1-\alpha}\,
\frac{(1-\eta)\rho\sqrt{d_{\mathrm{out}}}}{\frac{1}{\sqrt{d_v}}+\rho\sqrt{\mathrm{vol}(C)}}.
\]
Square both sides and solve for $\phi(C)$:
\[
\boxed{\;
\phi(C)\;\le\;\phi_{\max}(C)
:=\frac{4\mu_C^2(1-\eta)^2\rho^2\,d_{\mathrm{out}}^2\,d_{\mathrm{in}}}
{(1-\alpha)^2\,\mathrm{vol}(C)\,\Bigl(\frac{1}{\sqrt{d_v}}+\rho\sqrt{\mathrm{vol}(C)}\Bigr)^2}.
\;}
\]
That is the exact “how small should conductance be?” threshold implied by your condition.
==== The denominator contains $\bigl(\frac{1}{\sqrt{d_v}}+\rho\sqrt{\mathrm{vol}(C)}\bigr)^2$. Depending on which term dominates, you get simpler scalings: ====
===== If $\rho\sqrt{\mathrm{vol}(C)}\ll 1/\sqrt{d_v}$ (equivalently $\mathrm{vol}(C)\ll 1/(\rho^2 d_v)$), then =====
\[
\Bigl(\tfrac{1}{\sqrt{d_v}}+\rho\sqrt{\mathrm{vol}(C)}\Bigr)^2\approx \frac{1}{d_v},
\]
so
\[
\phi_{\max}(C)\;\approx\;
\frac{4\mu_C^2(1-\eta)^2\rho^2\,d_{\mathrm{out}}^2\,d_{\mathrm{in}}\,d_v}{(1-\alpha)^2\,\mathrm{vol}(C)}.
\]
===== If $\rho\sqrt{\mathrm{vol}(C)}\gg 1/\sqrt{d_v}$, then =====
\[
\Bigl(\tfrac{1}{\sqrt{d_v}}+\rho\sqrt{\mathrm{vol}(C)}\Bigr)^2\approx \rho^2\,\mathrm{vol}(C),
\]
so
\[
\phi_{\max}(C)\;\approx\;
\frac{4\mu_C^2(1-\eta)^2\,d_{\mathrm{out}}^2\,d_{\mathrm{in}}}{(1-\alpha)^2\,\mathrm{vol}(C)^2}.
\]
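The exact threshold can be verified numerically (a Python sketch; the function names and sampled parameter ranges are my own): for random parameter draws, the slack condition $\Delta(C)\le (1-\eta)\rho\alpha\sqrt{d_{\mathrm{out}}}$ holds exactly when $\phi(C)\le\phi_{\max}(C)$.

```python
import math
import random

def delta_C(alpha, mu, phi, vol, dout, din, dv, rho):
    # Delta(C) as defined earlier in the section.
    return ((1 - alpha) / (2 * mu)) * math.sqrt(phi * vol / (dout * din)) * (
        alpha / math.sqrt(dv) + rho * alpha * math.sqrt(vol)
    )

def phi_max(alpha, mu, vol, dout, din, dv, rho, eta):
    # The boxed conductance threshold phi_max(C).
    s = 1 / math.sqrt(dv) + rho * math.sqrt(vol)
    return (4 * mu**2 * (1 - eta)**2 * rho**2 * dout**2 * din) / (
        (1 - alpha)**2 * vol * s**2
    )

random.seed(1)
checked = 0
while checked < 2000:
    alpha = random.uniform(0.05, 0.95); mu = random.uniform(0.1, 3.0)
    vol = random.uniform(10.0, 1e4); dout = random.uniform(1.0, 500.0)
    din = random.uniform(1.0, 500.0); dv = random.uniform(1.0, 500.0)
    rho = random.uniform(1e-5, 1e-1); eta = random.uniform(0.05, 0.95)
    phi = random.uniform(1e-6, 1.0)
    pmax = phi_max(alpha, mu, vol, dout, din, dv, rho, eta)
    if abs(phi - pmax) <= 1e-9 * pmax:
        continue  # skip draws too close to the boundary for float comparison
    slack_holds = (delta_C(alpha, mu, phi, vol, dout, din, dv, rho)
                   <= (1 - eta) * rho * alpha * math.sqrt(dout))
    assert slack_holds == (phi <= pmax)
    checked += 1
print("Delta(C) <= (1-eta)*rho*alpha*sqrt(d_out)  <=>  phi(C) <= phi_max(C)")
```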
==== In the corollary regime you were already using (typical local clustering assumptions): ====
* $\mathrm{vol}(C)\lesssim V_0/\rho$,
* $d_{\mathrm{in}}, d_v\gtrsim d_0$ (bounded below),
* $\mu_C\gtrsim \mu_0$ (internally well-conditioned),
* $d_{\mathrm{out}}\gtrsim c_0/(\alpha^2\rho)$ (outside degrees large enough so $\lambda_i=\rho\alpha\sqrt{d_i}$ is $\Theta(\sqrt{\rho})$),
then $\phi_{\max}(C)$ simplifies to a constant multiple of $\rho$:
\[
\phi_{\max}(C)\;\gtrsim\;
\underbrace{\frac{\mu_0^2(1-\eta)^2\,c_0^2\,d_0^2}{\alpha^4(1-\alpha)^2\,V_0}}_{\text{constant depending on parameters}}
\cdot\rho,
\]
up to the same mild “$\rho$ small enough so that the seed term dominates” caveat you used earlier. So in that common regime, a clean sufficient statement is:
* Conductance needs to be on the order of $\rho$, i.e.\ $\phi(C)\le c\,\rho$ for a small-enough constant $c$ depending on $\alpha,\mu_C,d_{\mathrm{out}},d_{\mathrm{in}},\mathrm{vol}(C),d_v,\eta$.
If you want, I can also rewrite $\phi_{\max}(C)$ in terms of the cut size $\mathrm{cut}(C,\bar C)$ instead of $\phi(C)$, which sometimes reads more directly as a “boundary must be at most ___ edges” condition.
=== User: Can you convert all this analysis and regimes into a single latex snippet? ===
Can you convert all this analysis and regimes into a single latex snippet?
=== Assistant: Here is a single LaTeX snippet (drop-in) that packages the full “solve for φ(C)” calculation, plus the two regimes and the φ(C)=O(ρ) headline. ===
Here is a single LaTeX snippet (drop-in) that packages the full “solve for $\phi(C)$” calculation, plus the two regimes and the $\phi(C)=O(\rho)$ headline under typical local-cluster scalings. It’s written to fit directly after your definition of $\Delta(C)$ (or anywhere in that section). open-problem-fountoulakis22a <syntaxhighlight lang="latex">% ------------------------------------------------------------ % Solving the Delta(C)-condition for conductance, with regimes. % ------------------------------------------------------------ \subsubsection*{How small must $\phi(C)$ be? Solving the $\Delta(C)$ condition} Recall the sufficient slack condition used in the outside-margin bound: \begin{equation}\label{eq:Delta_slack_condition} \Delta(C)\le (1-\eta)\rho\alpha\sqrt{d_{\mathrm{out}}}, \qquad \eta\in(0,1), \end{equation} where \[ \Delta(C) = \frac{1-\alpha}{2\mu_C}\, \sqrt{\frac{\phi(C)\mathrm{vol}(C)}{d_{\mathrm{out}}\,d_{\mathrm{in}}}}\, \left(\frac{\alpha}{\sqrt{d_v}}+\rho\alpha\sqrt{\mathrm{vol}(C)}\right). \] \begin{lemma}[Conductance threshold implied by \eqref{eq:Delta_slack_condition}] \label{lem:phi_threshold_exact} The condition \eqref{eq:Delta_slack_condition} is equivalent to \begin{equation}\label{eq:phi_max_exact} \phi(C)\;\le\;\phi_{\max}(C) :=\frac{4\mu_C^2(1-\eta)^2\rho^2\,d_{\mathrm{out}}^2\,d_{\mathrm{in}}} {(1-\alpha)^2\,\mathrm{vol}(C)\,\Bigl(\frac{1}{\sqrt{d_v}}+\rho\sqrt{\mathrm{vol}(C)}\Bigr)^2}. \end{equation} \end{lemma} \begin{proof} Start from \eqref{eq:Delta_slack_condition} and substitute the definition of $\Delta(C)$.
Cancel the positive factor $\alpha$ on both sides to get \[ \frac{1-\alpha}{2\mu_C}\, \sqrt{\frac{\phi(C)\mathrm{vol}(C)}{d_{\mathrm{out}}\,d_{\mathrm{in}}}}\, \left(\frac{1}{\sqrt{d_v}}+\rho\sqrt{\mathrm{vol}(C)}\right) \le (1-\eta)\rho\sqrt{d_{\mathrm{out}}}. \] Rearrange to isolate the square root: \[ \sqrt{\frac{\phi(C)\mathrm{vol}(C)}{d_{\mathrm{out}}\,d_{\mathrm{in}}}} \le \frac{2\mu_C}{1-\alpha}\cdot \frac{(1-\eta)\rho\sqrt{d_{\mathrm{out}}}}{\frac{1}{\sqrt{d_v}}+\rho\sqrt{\mathrm{vol}(C)}}. \] Squaring both sides and solving for $\phi(C)$ gives \eqref{eq:phi_max_exact}. \end{proof} \begin{remark}[Two useful regimes]\label{rem:phi_regimes} Let $a:=\frac{1}{\sqrt{d_v}}$ and $b:=\rho\sqrt{\mathrm{vol}(C)}$, so the denominator in \eqref{eq:phi_max_exact} is $(a+b)^2$. Two common regimes give simpler sufficient (but slightly stronger) conditions: \begin{enumerate} \item \textbf{Seed-dominated regime ($b\le a$).} If $\rho\sqrt{\mathrm{vol}(C)}\le 1/\sqrt{d_v}$, then $(a+b)^2\le (2a)^2=4/d_v$. Thus a sufficient condition for \eqref{eq:Delta_slack_condition} is \begin{equation}\label{eq:phi_seed_sufficient} \phi(C)\;\le\; \frac{\mu_C^2(1-\eta)^2\rho^2\,d_{\mathrm{out}}^2\,d_{\mathrm{in}}\,d_v} {(1-\alpha)^2\,\mathrm{vol}(C)}. \end{equation} \item \textbf{Penalty-dominated regime ($a\le b$).} If $1/\sqrt{d_v}\le \rho\sqrt{\mathrm{vol}(C)}$, then $(a+b)^2\le (2b)^2=4\rho^2\mathrm{vol}(C)$. Thus a sufficient condition for \eqref{eq:Delta_slack_condition} is \begin{equation}\label{eq:phi_penalty_sufficient} \phi(C)\;\le\; \frac{\mu_C^2(1-\eta)^2\,d_{\mathrm{out}}^2\,d_{\mathrm{in}}} {(1-\alpha)^2\,\mathrm{vol}(C)^2}. 
\end{equation} \end{enumerate} \end{remark} \begin{remark}[Why $\phi(C)=O(\rho)$ in the typical local-cluster scaling]\label{rem:phi_O_rho} A common locality regime assumes \[ \mathrm{vol}(C)\le \frac{V_0}{\rho}, \qquad d_{\mathrm{in}}\ge d_0, \qquad d_v\ge d_0, \qquad \mu_C\ge \mu_0, \qquad d_{\mathrm{out}}\ge \frac{c_0}{\alpha^2\rho}, \] for constants $V_0,d_0,\mu_0,c_0>0$. For small $\rho$, one typically has $\rho\sqrt{\mathrm{vol}(C)}\le \rho\sqrt{V_0/\rho}=\sqrt{V_0\rho}\le 1/\sqrt{d_0}$, so the seed-dominated regime in Remark~\ref{rem:phi_regimes} applies. Then \eqref{eq:phi_seed_sufficient} yields the explicit scaling \[ \phi(C)\;\le\; \frac{\mu_0^2(1-\eta)^2\,c_0^2\,d_0^2}{\alpha^4(1-\alpha)^2\,V_0}\;\rho, \] which is of the form $\phi(C)\le c\,\rho$ with a constant $c$ depending only on $(\mu_0,d_0,c_0,V_0,\alpha,\eta)$. \end{remark} % ------------------------------------------------------------ </syntaxhighlight> If you want, I can also add a one-line corollary right after this snippet that explicitly states: “If $\phi(C)\le\phi_{\max}(C)$ then $\Delta(C)\le(1-\eta)\rho\alpha\sqrt{d_{\mathrm{out}}}$ and hence $\gamma_{\bar C}\ge \eta\rho\alpha\sqrt{d_{\mathrm{out}}}$” (linking directly back to your theorem).
=== User: sure, do that ===
sure, do that
=== Assistant: Add the following corollary + proof right after Lemma~phi_threshold_exact (or anywhere after you’ve defined φ_max(C)). ===
Add the following corollary + proof right after Lemma~\ref{lem:phi_threshold_exact} (or anywhere after you’ve defined $\phi_{\max}(C)$); it explicitly closes the loop back to Theorem~\ref{thm:gamma_parameterized_by_cluster} in the $\ell_1$-regularized PPR setting.
open-problem-fountoulakis22a <syntaxhighlight lang="latex">\begin{corollary}[Conductance condition implies an outside-margin bound]\label{cor:phi_implies_margin} Fix $\eta\in(0,1)$ and define $\phi_{\max}(C)$ as in \eqref{eq:phi_max_exact}. If \[ \phi(C)\le \phi_{\max}(C), \] then $\Delta(C)\le (1-\eta)\rho\alpha\sqrt{d_{\mathrm{out}}}$. Consequently, the hypotheses of Theorem~\ref{thm:gamma_parameterized_by_cluster} hold, so: \begin{enumerate} \item[(i)] the global minimizer satisfies $x^\star_{\bar C}=0$ (equivalently, $\operatorname{supp}(x^\star)\subseteq C$), and \item[(ii)] the outside margin obeys \[ \gamma_{\bar C}:=\min_{i\in\bar C}\Bigl(\lambda_i-|\nabla f(x^\star)_i|\Bigr) \ge \eta\,\rho\alpha\sqrt{d_{\mathrm{out}}}. \] \end{enumerate} \end{corollary} \begin{proof} By Lemma~\ref{lem:phi_threshold_exact}, the inequality $\phi(C)\le \phi_{\max}(C)$ is equivalent to $\Delta(C)\le (1-\eta)\rho\alpha\sqrt{d_{\mathrm{out}}}$. Since $\eta\in(0,1)$, this also implies $\Delta(C)<\rho\alpha\sqrt{d_{\mathrm{out}}}$. Apply Theorem~\ref{thm:gamma_parameterized_by_cluster} to conclude $x^\star_{\bar C}=0$ and the stated bound on $\gamma_{\bar C}$. \end{proof} </syntaxhighlight>
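The arithmetic chain in this corollary can also be exercised end to end (a Python sketch; it verifies only the inequality chain $\phi(C)\le\phi_{\max}(C) \Rightarrow \Delta(C)\le(1-\eta)\rho\alpha\sqrt{d_{\mathrm{out}}} \Rightarrow \rho\alpha\sqrt{d_{\mathrm{out}}}-\Delta(C)\ge\eta\rho\alpha\sqrt{d_{\mathrm{out}}}$, not the optimization argument of the theorem itself; all helper names and sampled ranges are mine):

```python
import math
import random

def delta_C(alpha, mu, phi, vol, dout, din, dv, rho):
    # Delta(C) as defined earlier in the section.
    return ((1 - alpha) / (2 * mu)) * math.sqrt(phi * vol / (dout * din)) * (
        alpha / math.sqrt(dv) + rho * alpha * math.sqrt(vol)
    )

def phi_max(alpha, mu, vol, dout, din, dv, rho, eta):
    # phi_max(C) from the lemma above.
    s = 1 / math.sqrt(dv) + rho * math.sqrt(vol)
    return (4 * mu**2 * (1 - eta)**2 * rho**2 * dout**2 * din) / (
        (1 - alpha)**2 * vol * s**2
    )

random.seed(2)
for _ in range(500):
    alpha = random.uniform(0.05, 0.95); mu = random.uniform(0.1, 3.0)
    vol = random.uniform(10.0, 1e4); dout = random.uniform(1.0, 500.0)
    din = random.uniform(1.0, 500.0); dv = random.uniform(1.0, 500.0)
    rho = random.uniform(1e-5, 1e-1); eta = random.uniform(0.05, 0.95)
    # Draw a conductance strictly below the threshold (and below 1).
    phi = random.random() * min(phi_max(alpha, mu, vol, dout, din, dv, rho, eta), 1.0)
    d = delta_C(alpha, mu, phi, vol, dout, din, dv, rho)
    X = rho * alpha * math.sqrt(dout)
    # phi <= phi_max(C)  =>  Delta(C) <= (1 - eta) * X ...
    assert d <= (1 - eta) * X * (1 + 1e-9)
    # ... which gives the margin bound X - Delta(C) >= eta * X,
    # i.e. the corollary's gamma_{bar C} >= eta * rho * alpha * sqrt(d_out).
    assert X - d >= eta * X * (1 - 1e-9)
print("corollary arithmetic chain verified on 500 random draws")
```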