
Openai/693e3ce6-229c-8008-97dc-ab720cb1f95a

Revision as of 05:33, 19 December 2025 by Lukegao (talk | contribs) (by freem.ai...)


This conversation was exported from ChatGPT[1]

gpt-5-2-pro

User: Here is an open problem of mine.

Here is an open problem of mine. I want you to tell me whether it's possible to prove it using standard accelerated algorithms. Avoid solutions like that of Rubio, which utilizes an expensive subproblem.

Assistant: From your COLT’22 note, the “dream” guarantee is:

From your COLT’22 note, the “dream” guarantee is:

  • use a standard accelerated proximal-gradient method (FISTA / linear coupling) on the ℓ1–regularized PPR objective (your eq. (3)),
  • keep the per-iteration work local/sparse (≈ only $O(1/\rho)$ coordinates touched each iteration, as in the ISTA/PGD analysis),
  • and thereby upgrade the total runtime from $\tilde O((\alpha\rho)^{-1})$ to $\tilde O((\sqrt{\alpha}\,\rho)^{-1})$. open-problem-fountoulakis22a

You also point out the exact bottleneck: we already know accelerated methods reduce iteration complexity to $O(\sqrt{1/\alpha}\log(1/\varepsilon))$, but we don’t have a worst‑case upper bound on how many coordinates get activated/updated per iteration, so the only safe bound is the pessimistic $O(n\sqrt{1/\alpha}\log(1/\varepsilon))$. open-problem-fountoulakis22a
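To make the bottleneck concrete, here is a small experiment sketch (an assumed toy instance: one random graph, not a worst-case family) that runs plain ISTA and a FISTA-style accelerated variant on objective (3) and tracks the support size at every iteration:

```python
import numpy as np

rng = np.random.default_rng(0)
n, alpha, rho = 50, 0.1, 0.05

# Random undirected graph; Q = alpha*I + (1-alpha)/2 * L as in the note.
A = (rng.random((n, n)) < 0.1).astype(float)
A = np.triu(A, 1); A = A + A.T                      # symmetric adjacency
d = np.maximum(A.sum(1), 1.0)                       # degrees (avoid zeros)
Dm = np.diag(1.0 / np.sqrt(d))
L = np.eye(n) - Dm @ A @ Dm                         # normalized Laplacian
Q = alpha * np.eye(n) + (1 - alpha) / 2 * L
s = np.zeros(n); s[0] = 1.0                         # seed indicator
lam = rho * alpha * np.sqrt(d)                      # weighted l1 thresholds

grad = lambda x: Q @ x - alpha * (s / np.sqrt(d))   # gradient of f
soft = lambda u, t: np.sign(u) * np.maximum(np.abs(u) - t * lam, 0.0)

def run(accel, iters=200, t=1.0):
    # Standard strongly convex momentum with mu = alpha and L_f <= 1.
    beta = (1 - np.sqrt(alpha)) / (1 + np.sqrt(alpha)) if accel else 0.0
    x_prev = x = np.zeros(n)
    supports = []
    for _ in range(iters):
        y = x + beta * (x - x_prev)
        x_prev, x = x, soft(y - t * grad(y), t)
        supports.append(int(np.count_nonzero(x)))
    return supports

ista, fista = run(False), run(True)
print("max support ISTA :", max(ista))
print("max support FISTA:", max(fista))
```

On any single random instance both supports may look small; the open problem is precisely that no worst-case bound on the FISTA numbers is known.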

If you mean a theorem of the form

then my assessment is: very unlikely without adding extra safeguards or extra assumptions.

The ISTA/PGD sparsity proof in this setting hinges on very special monotonicity/order properties of the non‑accelerated proximal-gradient map for these PageRank/M‑matrix quadratics (this is what lets you argue you never need to “touch” most coordinates). Your note explicitly says that analysis “does not apply to the accelerated version.” open-problem-fountoulakis22a

Acceleration introduces an extrapolated point (schematically $y_k=x_k+\beta_k(x_k-x_{k-1})$), which breaks exactly the kind of order-preserving / “push”-style monotonicity that local PageRank proofs exploit. Once you lose that, it becomes extremely difficult to rule out “spurious activations” of many coordinates in a single iteration, because the prox step can turn on any coordinate whose pre-prox value crosses the threshold.
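A contrived one-dimensional illustration (the numbers $x_{k-1}=-0.5$, $x_k=0$, $\beta=0.9$ are assumed for illustration, not an actual FISTA trajectory): the plain prox-gradient step from $x_k$ keeps the coordinate at zero, while the same step taken from the extrapolated point crosses the threshold and activates it.

```python
import numpy as np

# Assumed toy configuration: f(z) = 0.5*z**2 - b*z, threshold lam, step t.
b, lam, t, beta = 0.9, 1.0, 0.5, 0.9
grad = lambda z: z - b
soft = lambda u: np.sign(u) * max(abs(u) - t * lam, 0.0)

x_prev, x_k = -0.5, 0.0                # assumed previous/current iterates
y_k = x_k + beta * (x_k - x_prev)      # extrapolated point: 0.45

plain = soft(x_k - t * grad(x_k))      # prox input 0.45 <= t*lam = 0.5: stays 0
accel = soft(y_k - t * grad(y_k))      # prox input 0.675 > 0.5: activates

print(plain, accel)                    # plain = 0.0, accel is nonzero (~0.175)
```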

There’s also a broader, well-known phenomenon in composite/proximal methods:

  • Finite-time support (manifold) identification is typically provable only under a nondegeneracy / strict complementarity margin at the solution.
  • Without such a margin (degenerate cases), the active set can be “almost active” for many coordinates, and accelerated methods can oscillate around the threshold.

This is not specific to PageRank; it’s a general limitation noted in the manifold-identification literature. Optimization Online[2]

That matters here because your open problem is explicitly worst-case (“it is not even clear if acceleration would not lead to a worse running time complexity in the worst case”). open-problem-fountoulakis22a

Worst-case families can easily include symmetric/degenerate graphs where many coordinates sit exactly on (or arbitrarily close to) the activation boundary.

The Martínez‑Rubio–Wirth–Pokutta COLT’23 paper essentially answers your question in the affirmative by changing the algorithmic object: they build an active-set accelerated method that maintains the invariants that vanilla acceleration breaks.

Concretely, their Algorithm 4 (ASPR) repeatedly:

  1. defines the current active set $S^{(t)}$ from negative gradient coordinates,
  2. runs an accelerated projected GD subroutine (APGD) on the restricted subspace $C^{(t)}=\mathrm{span}\{e_i:i\in S^{(t)}\}\cap\mathbb{R}^n_{\ge 0}$,
  3. applies a small “shrink” step $x^{(t+1)}\leftarrow \max\{0,\bar x^{(t+1)}-\delta_t \mathbf{1}\}$,
  4. and expands $S^{(t)}$ again. Proceedings of Machine Learning Research[3]
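A minimal sketch of this active-set flow (schematic only: plain projected gradient stands in for the APGD subroutine, and `delta` is an assumed fixed shrink constant rather than the paper's schedule):

```python
import numpy as np

# Schematic of steps 1-4 above, for min 0.5*x'Qx - b'x + lam'|x| over x >= 0.
# Assumptions: plain projected gradient replaces APGD; delta is a fixed constant.
def active_set_loop(Q, b, lam, outer=20, inner=50, t=0.5, delta=1e-3):
    n = len(b)
    x = np.zeros(n)
    for _ in range(outer):
        g = Q @ x - b
        S = (x > 0) | (g + lam < 0)              # step 1: active set
        for _ in range(inner):                   # step 2: restricted solve
            step = x - t * (Q @ x - b + lam)     # gradient on the orthant
            x = np.where(S, np.maximum(step, 0.0), 0.0)
        x = np.maximum(x - delta, 0.0)           # step 3: shrink
        # step 4: the active set is re-expanded at the top of the loop
    return x

# Tiny usage example (assumed instance): solution is roughly max(b - lam, 0).
Q = np.eye(3)
b = np.array([2.0, 0.1, 0.0])
lam = np.full(3, 0.5)
print(active_set_loop(Q, b, lam))
```

The point of the restricted solve is that only coordinates in $S^{(t)}$ are ever touched, which is what the worst-case sparsity argument needs.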

That APGD call is exactly the “expensive subproblem” you’re referring to. It’s not “vanilla FISTA on the original full problem”; it’s accelerated optimization on an evolving restricted space, with additional machinery to guarantee $S^{(t)}\subseteq S^\star$ and $x^{(t)}\le x^\star$ coordinatewise. They explicitly highlight that this is what makes provable sparse updates possible and that this answers your COLT’22 open question. Proceedings of Machine Learning Research[4]

So the existence of Rubio’s solution is also evidence that the missing ingredient is not a standard acceleration rate proof—it’s the active-set control that keeps iterates from “lighting up” irrelevant coordinates.

If you relax your goal slightly, there is a more “standard” line of argument you could potentially formalize:

Assume a margin at the optimum that separates active from inactive coordinates (a strict complementarity / nondegeneracy condition). In ℓ1 language this is the usual requirement that, for inactive coordinates, the optimality condition is strict (not tight).

Then the standard identification results imply that after finitely many iterations the algorithm identifies the correct manifold/support, after which the method behaves like an accelerated method on a fixed low-dimensional subspace (of size $|S^\star|$, which is $\approx O(1/\rho)$ in your application regime). The catch is:

  • the identification time bound typically depends on the margin parameter (often called something like $\delta_{\min}$), and in degenerate instances $\delta_{\min}$ can be arbitrarily small or zero. Optimization Online[5]
  • so you don’t get a clean worst-case $\tilde O((\sqrt{\alpha}\,\rho)^{-1})$ bound purely in terms of $\alpha,\rho$.

In other words, you can likely get something like

\[
\text{time} \;\lesssim\; \underbrace{\frac{1}{\rho}\,\mathrm{polylog}\!\Big(\frac{1}{\delta_{\min}}\Big)}_{\text{identification phase}} \;+\; \underbrace{\frac{1}{\rho\sqrt{\alpha}}\log\!\Big(\frac{1}{\varepsilon}\Big)}_{\text{accelerated phase on } S^\star},
\]
but the $\delta_{\min}$-dependence is the part that prevents a clean worst-case theorem.

- If you want a worst‑case guarantee over all graphs/instances of the kind stated in your open problem—i.e., a universal $\delta$ such that plain FISTA/linear coupling updates $\le \delta/\rho$ coordinates each iteration—then I don’t see a path via “standard acceleration analysis” alone, because the whole difficulty is controlling the extrapolated iterates’ active set, and generic proximal identification theory tells you you cannot control that uniformly without a nondegeneracy margin.

  • The only provable acceleration-with-sparsity result I’m aware of in this line is essentially active-set acceleration (Rubio et al.), which introduces exactly the kind of “subproblem” / restricted-space solves you want to avoid. Proceedings of Machine Learning Research[6]
The most plausible compromise (theorem-wise) is:
  1. prove a finite identification result for your PageRank instance class (needs a margin condition or a probabilistic model), then
  2. run a single accelerated method on the identified support.

This avoids repeated expensive restricted solves, but it won’t give a clean worst-case $\tilde O((\sqrt{\alpha}\,\rho)^{-1})$ unless you can lower-bound the margin in terms of $\rho$ for all graphs—which is exactly the kind of statement that seems false in the worst case (symmetries/degeneracy).

If you tell me which “standard accelerated algorithm” you have in mind (plain FISTA, strongly-convex APG with fixed $\beta$, linear coupling variant, etc.), I can be more concrete about which invariants one would need to establish (e.g., order-bounds $0\le y_k\le c\,x^\star$, or a per-iteration certificate that prevents activation outside a small frontier), and where the known theory breaks.

User: Provide a self-contained proof which uses the idea of the margin parameter.

Provide a self-contained proof which uses the idea of the margin parameter. Provide a self-contained latex document.

Assistant: Below is a self-contained LaTeX note that proves a finite active-set identification result for standard accelerated pro…

Below is a self-contained LaTeX note that proves a finite active-set identification result for standard accelerated proximal gradient (APG / FISTA-style momentum) on the $\ell_1$-regularized PageRank objective (3) from your COLT’22 open problem note open-problem-fountoulakis22a, under a strict-complementarity “margin” assumption. The proof shows that once the accelerated iterates are close enough to the optimum (with a radius proportional to the margin), no coordinates outside the optimal support can ever be activated again; it also gives an explicit bound on the identification time in terms of the margin and the accelerated convergence rate.

\documentclass[11pt]{article}

\usepackage{amsmath,amssymb,amsthm,mathtools}
\usepackage[margin=1in]{geometry}
\usepackage{hyperref}

% --------------------
% Theorem environments
% --------------------
\newtheorem{theorem}{Theorem}
\newtheorem{lemma}{Lemma}
\newtheorem{corollary}{Corollary}
\newtheorem{assumption}{Assumption}
\theoremstyle{remark}
\newtheorem{remark}{Remark}

% --------------------
% Macros
% --------------------
\newcommand{\R}{\mathbb{R}}
\newcommand{\norm}[1]{\left\lVert #1 \right\rVert}
\newcommand{\ip}[2]{\left\langle #1,#2 \right\rangle}
\newcommand{\prox}{\operatorname{prox}}
\newcommand{\supp}{\operatorname{supp}}
\newcommand{\diag}{\operatorname{diag}}

\title{Finite Active-Set Identification for Accelerated Proximal Gradient on $\ell_1$-Regularized PageRank Under a Margin Condition}
\author{}
\date{}

\begin{document}
\maketitle

\begin{abstract}
We consider the $\ell_1$-regularized PageRank objective
\[
\min_{x\in\R^n}\; F(x) \;:=\; f(x) + g(x),
\qquad
g(x) := \rho\alpha \norm{D^{1/2}x}_1,
\]
where $f$ is the smooth quadratic arising from personalized PageRank.
We prove a \emph{finite identification} statement for standard accelerated proximal-gradient methods:
under a strict-complementarity \emph{margin} condition at the unique minimizer $x^\star$,
all coordinates outside $\supp(x^\star)$ become and remain identically zero after finitely many iterations.
An explicit identification-time bound is given in terms of the margin and the accelerated convergence rate.
\end{abstract}

\section{Problem setup: $\ell_1$-regularized PageRank as a composite strongly convex problem}

Let $G=(V,E)$ be an undirected graph with $n=|V|$ nodes.
Let $A\in\R^{n\times n}$ denote the adjacency matrix and $D=\diag(d_1,\dots,d_n)$ the degree matrix.
Fix $\alpha\in(0,1]$ and a seed vector $s\in\R^n$ (e.g.\ an indicator of a seed node).

Define the (symmetric) normalized Laplacian
\[
L := I - D^{-1/2} A D^{-1/2}.
\]
It is standard that $L$ is positive semidefinite and $\lambda(L)\subseteq[0,2]$ for undirected graphs.

Following the $\ell_1$-regularized PageRank formulation, define the matrix
\[
Q := \alpha I + \frac{1-\alpha}{2}L,
\]
and the smooth function
\begin{equation}\label{eq:f-def}
f(x) := \frac12 x^\top Q x - \alpha\, x^\top D^{-1/2}s.
\end{equation}
Let the nonsmooth regularizer be
\begin{equation}\label{eq:g-def}
g(x) := \rho\alpha\, \norm{D^{1/2}x}_1
      = \sum_{i=1}^n \lambda_i |x_i|,
\qquad
\lambda_i := \rho\alpha\sqrt{d_i}.
\end{equation}
We study
\begin{equation}\label{eq:main-problem}
\min_{x\in\R^n} F(x) := f(x)+g(x).
\end{equation}

\subsection{Smoothness and strong convexity constants for $f$ and $F$}

Since $\nabla^2 f(x)=Q$ is constant, we can read smoothness/strong convexity from the spectrum of $Q$.
Because $\lambda(L)\subseteq[0,2]$, we have
\[
\lambda(Q) = \left\{\alpha + \frac{1-\alpha}{2}\lambda: \lambda\in\lambda(L)\right\}
\subseteq [\alpha,\, \alpha+(1-\alpha)] = [\alpha,1].
\]
Therefore:
\begin{itemize}
\item $\nabla f$ is $L_f$-Lipschitz with $L_f=\norm{Q}_2\le 1$.
\item $f$ is $\mu$-strongly convex with $\mu=\lambda_{\min}(Q)\ge \alpha$.
\end{itemize}
In what follows we write $L:=L_f$ and $\mu:=\alpha$ for simplicity (the symbol $L$ is no longer needed for the Laplacian); the argument works for general $L,\mu$.

Since $g$ is convex, the sum $F=f+g$ is also $\mu$-strongly convex. Hence $F$ has a unique minimizer
$x^\star$.

\section{Prox operator and optimality conditions}

\subsection{Weighted soft-thresholding proximal operator}

Because $g(x)=\sum_i \lambda_i|x_i|$ is separable, its proximal operator is coordinatewise.

\begin{lemma}[Weighted soft-thresholding]\label{lem:prox}
Let $t>0$. For any $u\in\R^n$,
\[
\prox_{t g}(u)_i \;=\; \operatorname{sign}(u_i)\,\max\{ |u_i| - t\lambda_i,\; 0\},
\qquad i=1,\dots,n.
\]
In particular, $\prox_{t g}(u)_i=0$ if and only if $|u_i|\le t\lambda_i$.
\end{lemma}

\begin{proof}
This is standard: minimize $\frac{1}{2t}\norm{x-u}_2^2 + \sum_i \lambda_i|x_i|$,
which separates over coordinates.
For each coordinate one obtains the soft-thresholding solution with threshold $t\lambda_i$.
\end{proof}
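% Added worked example with illustrative numbers (not part of the original note).
\begin{remark}[Worked example of the prox map]
Take $t=1$ and $(\lambda_1,\lambda_2)=(1,2)$. For $u=(1.5,\,-2.5)$,
\[
\prox_{tg}(u)=\bigl(\max\{1.5-1,\,0\},\ -\max\{2.5-2,\,0\}\bigr)=(0.5,\,-0.5),
\]
whereas $u=(0.9,\,-1.7)$ satisfies $|u_i|\le t\lambda_i$ in both coordinates and is mapped to $(0,0)$.
\end{remark}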

\subsection{KKT conditions and the active/inactive sets}

Let $x^\star$ denote the unique minimizer of \eqref{eq:main-problem}. The first-order optimality condition is
\[
0 \in \nabla f(x^\star) + \partial g(x^\star).
\]
Coordinatewise, since $\partial|x_i|=\{\operatorname{sign}(x_i)\}$ for $x_i\neq 0$
and $\partial|0|=[-1,1]$, we obtain:

\begin{lemma}[Coordinatewise optimality conditions]\label{lem:kkt}
For each $i=1,\dots,n$:
\begin{itemize}
\item If $x_i^\star\neq 0$, then $\nabla f(x^\star)_i + \lambda_i\,\operatorname{sign}(x_i^\star)=0$.
Equivalently, $|\nabla f(x^\star)_i|=\lambda_i$.
\item If $x_i^\star=0$, then $|\nabla f(x^\star)_i|\le \lambda_i$.
\end{itemize}
\end{lemma}

Define the \emph{active set} $S^\star := \supp(x^\star)=\{i:x_i^\star\neq 0\}$ and the \emph{inactive set}
$I^\star := \{1,\dots,n\}\setminus S^\star = \{i:x_i^\star=0\}$.

\section{A margin (strict complementarity) assumption}

The key idea is to assume that the inactive coordinates satisfy \emph{strict} inequality in the KKT condition.

\begin{assumption}[Margin / strict complementarity]\label{ass:margin}
Define the margin parameter
\begin{equation}\label{eq:margin}
\gamma \;:=\; \min_{i\in I^\star}\bigl(\lambda_i - |\nabla f(x^\star)_i|\bigr).
\end{equation}
Assume $\gamma>0$.
Equivalently, for every inactive coordinate $i\in I^\star$,
\[
|\nabla f(x^\star)_i|\le \lambda_i - \gamma.
\]
\end{assumption}

\begin{remark}
This assumption is standard in manifold-identification results for $\ell_1$-type regularizers.
It may fail (i.e.\ $\gamma=0$) in degenerate instances; without a positive margin, one cannot in general
guarantee finite-time identification.
\end{remark}

\section{Accelerated proximal-gradient method}

Fix a step size $t\in(0,1/L]$.
Consider the standard accelerated proximal-gradient iteration (Nesterov / FISTA-style momentum):
\begin{equation}\label{eq:apg}
\begin{aligned}
y_k &= x_k + \beta_k (x_k - x_{k-1}),\\
x_{k+1} &= \prox_{t g}\bigl( y_k - t \nabla f(y_k)\bigr),
\end{aligned}
\qquad k\ge 0,
\end{equation}
with initialization $x_{-1}=x_0$ (so $y_0=x_0$).
For strongly convex objectives one often uses a constant $\beta_k\equiv\beta\in[0,1)$,
e.g.\ $\beta=\frac{\sqrt{L}-\sqrt{\mu}}{\sqrt{L}+\sqrt{\mu}}$, but the identification argument below
only needs that $y_k\to x^\star$ and (for explicit bounds) a known convergence rate.

\section{Key stability lemma: the margin prevents activation of inactive coordinates}

\begin{lemma}[Inactivity is stable in a neighborhood]\label{lem:stable-zero}
Assume $f$ has $L$-Lipschitz gradient in $\ell_2$, and Assumption~\ref{ass:margin} holds.
Let $t\in(0,1/L]$ and define the radius
\begin{equation}\label{eq:radius}
r \;:=\; \frac{t\gamma}{1+tL}.
\end{equation}
Then for any $y\in\R^n$ satisfying $\norm{y-x^\star}_2 \le r$, the proximal-gradient update
\[
x^+ := \prox_{t g}\bigl(y - t\nabla f(y)\bigr)
\]
satisfies $x^+_i = 0$ for all $i\in I^\star$.
\end{lemma}

\begin{proof}
Fix an inactive coordinate $i\in I^\star$, so $x_i^\star=0$.
Let
\[
u := y - t\nabla f(y).
\]
By Lemma~\ref{lem:prox}, it suffices to show $|u_i|\le t\lambda_i$.
Write
\[
u_i = (y_i-x_i^\star) - t\bigl(\nabla f(y)_i - \nabla f(x^\star)_i\bigr) \;-\; t\nabla f(x^\star)_i.
\]
Using the triangle inequality and $x_i^\star=0$,
\begin{align*}
|u_i|
&\le |y_i-x_i^\star| + t|\nabla f(y)_i - \nabla f(x^\star)_i| + t|\nabla f(x^\star)_i| \\
&\le \norm{y-x^\star}_2 + t\norm{\nabla f(y)-\nabla f(x^\star)}_2 + t|\nabla f(x^\star)_i|.
\end{align*}
Since $\nabla f$ is $L$-Lipschitz in $\ell_2$,
$\norm{\nabla f(y)-\nabla f(x^\star)}_2 \le L\norm{y-x^\star}_2$, hence
\[
|u_i|\le (1+tL)\norm{y-x^\star}_2 + t|\nabla f(x^\star)_i|.
\]
By the margin assumption, $|\nabla f(x^\star)_i|\le \lambda_i - \gamma$, so
\[
|u_i|\le (1+tL)\norm{y-x^\star}_2 + t(\lambda_i-\gamma).
\]
If $\norm{y-x^\star}_2\le r=\frac{t\gamma}{1+tL}$, then
\[
|u_i|\le (1+tL)\cdot \frac{t\gamma}{1+tL} + t(\lambda_i-\gamma) = t\lambda_i.
\]
Therefore $|u_i|\le t\lambda_i$, and Lemma~\ref{lem:prox} gives $x^+_i=0$.
Since $i\in I^\star$ was arbitrary, $x^+_{I^\star}=0$.
\end{proof}

\section{Main result: finite identification for accelerated proximal gradient}

\subsection{Existence of a finite identification time}

\begin{theorem}[Finite active-set identification (qualitative)]\label{thm:finite-id-qual}
Assume Assumption~\ref{ass:margin} (i.e.\ $\gamma>0$).
Consider the accelerated proximal-gradient iteration \eqref{eq:apg} with any step $t\in(0,1/L]$.
If $y_k\to x^\star$ in $\ell_2$, then there exists a finite index $K$ such that for all $k\ge K$,
\[
x_{k+1,i}=0 \quad \text{for every } i\in I^\star.
\]
Equivalently, after finitely many iterations, all iterates satisfy
$\supp(x_k)\subseteq S^\star$.
\end{theorem}

\begin{proof}
By Lemma~\ref{lem:stable-zero}, any iterate $y$ in the ball
$\{y:\norm{y-x^\star}_2\le r\}$ (with $r$ as in \eqref{eq:radius}) produces a proximal-gradient
update $x^+$ with $x^+_{I^\star}=0$.
Since $y_k\to x^\star$, there exists $K$ such that $\norm{y_k-x^\star}_2\le r$ for all $k\ge K$.
Applying Lemma~\ref{lem:stable-zero} at each such $k$ yields $x_{k+1,I^\star}=0$ for all $k\ge K$.
\end{proof}

\subsection{An explicit identification-time bound using a standard accelerated rate}

To make Theorem~\ref{thm:finite-id-qual} quantitative, we combine it with a standard
accelerated convergence rate for strongly convex composite minimization.
We state the rate as an assumption/lemma; this is ``standard accelerated algorithms'' content.

\begin{lemma}[Standard accelerated rate (strongly convex composite)]\label{lem:apg-rate}
Assume $F=f+g$ is $\mu$-strongly convex and $\nabla f$ is $L$-Lipschitz.
Consider an accelerated proximal-gradient scheme \eqref{eq:apg} with step $t=1/L$ and a
strongly-convex momentum choice (e.g.\ constant $\beta=\frac{\sqrt{L}-\sqrt{\mu}}{\sqrt{L}+\sqrt{\mu}}$).
Then there exist constants $C>0$ and $q\in(0,1)$ such that
\begin{equation}\label{eq:rate-gap}
F(x_k)-F(x^\star) \le C q^k \qquad \text{for all }k\ge 0,
\end{equation}
where one may take $q=1-\sqrt{\mu/L}$ (up to standard constant-factor variants).
\end{lemma}

\begin{remark}
Lemma~\ref{lem:apg-rate} is a standard result for accelerated proximal-gradient methods on
$\mu$-strongly convex composite objectives.
It is proved, for example, via estimate sequences or by analyzing a suitable Lyapunov/potential function.
The precise constant $C$ is not important for the identification argument; only geometric decay is used.
\end{remark}

\begin{theorem}[Identification time bound]\label{thm:finite-id-quant}
Assume Assumption~\ref{ass:margin} holds with margin $\gamma>0$,
and suppose the accelerated method satisfies Lemma~\ref{lem:apg-rate} with parameters $(C,q)$
and step $t=1/L$.
Let $\mu$ be the strong convexity parameter of $F$.
Define
\[
r := \frac{t\gamma}{1+tL} = \frac{\gamma}{2L}.
\]
Then for every $k\ge 1$,
\[
\norm{y_k-x^\star}_2 \le
\sqrt{\frac{2C}{\mu}}\; q^{\frac{k-1}{2}}
\left( (1+\beta)\sqrt{q} + \beta \right),
\]
and therefore it suffices that
\begin{equation}\label{eq:K-bound}
k \;\ge\;
1 + \frac{2\log\!\left(\displaystyle
\sqrt{\frac{2C}{\mu}}\,
\frac{( (1+\beta)\sqrt{q} + \beta )}{r}
\right)}{\log(1/q)}
\end{equation}
to guarantee $x_{k+1,i}=0$ for all $i\in I^\star$.
In particular, the identification time is $O\!\left(\sqrt{\frac{L}{\mu}}\log\frac{1}{\gamma}\right)$
up to problem-dependent constants.
\end{theorem}

\begin{proof}
From $\mu$-strong convexity of $F$ we have, for all $x$,
\[
F(x)-F(x^\star) \ge \frac{\mu}{2}\norm{x-x^\star}_2^2,
\]
hence \eqref{eq:rate-gap} implies
\[
\norm{x_k-x^\star}_2 \le \sqrt{\frac{2C}{\mu}}\, q^{k/2}.
\]
Now bound $y_k$ using the definition $y_k=x_k+\beta(x_k-x_{k-1})$:
\begin{align*}
\norm{y_k-x^\star}_2
&= \norm{(1+\beta)(x_k-x^\star) - \beta(x_{k-1}-x^\star)}_2 \\
&\le (1+\beta)\norm{x_k-x^\star}_2 + \beta\norm{x_{k-1}-x^\star}_2 \\
&\le \sqrt{\frac{2C}{\mu}}\left( (1+\beta)q^{k/2} + \beta q^{(k-1)/2}\right) \\
&= \sqrt{\frac{2C}{\mu}}\; q^{(k-1)/2}
\left( (1+\beta)\sqrt{q} + \beta\right).
\end{align*}
If the right-hand side is at most $r$, then Lemma~\ref{lem:stable-zero} applies and gives
$x_{k+1,I^\star}=0$. Solving for $k$ yields \eqref{eq:K-bound}.
The stated big-$O$ form follows from the standard choice $q\approx 1-\sqrt{\mu/L}$,
for which $\log(1/q) \asymp \sqrt{\mu/L}$.
\end{proof}

\section{Specialization to $\ell_1$-regularized PageRank}

For the PageRank objective \eqref{eq:f-def}--\eqref{eq:g-def} we have $L\le 1$ and $\mu=\alpha$.
Taking $t=1$ (valid since $L\le 1$), the stability radius in Lemma~\ref{lem:stable-zero} becomes
\[
r = \frac{\gamma}{1+L}\ge \frac{\gamma}{2}.
\]
Thus, once an accelerated method produces extrapolated points $y_k$ within distance $\gamma/2$ of $x^\star$,
the inactive coordinates $I^\star=\{i:x_i^\star=0\}$ are never activated again.

\begin{remark}[What this does and does not prove]
Theorems~\ref{thm:finite-id-qual} and~\ref{thm:finite-id-quant} provide a clean way to control
\emph{eventual} sparsity using a \emph{margin parameter} $\gamma$.
However:
\begin{itemize}
\item The margin $\gamma$ can be arbitrarily small (or zero) in worst-case instances, so this does not by itself
yield a worst-case runtime bound depending only on $(\alpha,\rho)$.
\item The results do not bound how many coordinates might become active during an initial transient phase
(before the iterates enter the $\gamma$-neighborhood).
\end{itemize}
Nonetheless, under a positive margin, the accelerated method identifies the optimal zero pattern in finite time.
\end{remark}

\section*{Acknowledgment of standard ingredients}
The only nontrivial ingredients beyond basic convex analysis are:
(i) the proximal map of the weighted $\ell_1$ norm (Lemma~\ref{lem:prox}),
(ii) Lipschitz continuity of $\nabla f$, and
(iii) a standard accelerated linear convergence rate for strongly convex composite minimization (Lemma~\ref{lem:apg-rate}).
The identification argument itself (Lemma~\ref{lem:stable-zero} and Theorems~\ref{thm:finite-id-qual}--\ref{thm:finite-id-quant})
is self-contained.

\end{document}

User: Wait, you say that "\item The results do not bound how many coordinates might become active during an initial transient…

Wait, you say that "\item The results do not bound how many coordinates might become active during an initial transient phase (before the iterates enter the $\gamma$-neighborhood)."

But this is very important... Is it possible to bound this?

Assistant: You’re right to flag that: for the local running time story in your open problem, what matters is not just that the “wr…

You’re right to flag that: for the local running time story in your open problem, what matters is not just that the “wrong” coordinates eventually die, but how big the support can get before identification. open-problem-fountoulakis22a

There are two different answers depending on what kind of bound you mean:

  1. A bound purely from the margin $\gamma$, independent of how far you are from $x^\star$
  2. A bound that uses $\gamma$ and a measure of how far $y_k$ is from $x^\star$ (distance or objective gap)
Not in the strong sense you would want (e.g., “$\le c/\rho$ active coordinates for all $k$”).

Reason: the margin is a local property at the solution:

\[
|\nabla f(x^\star)_i| \le \lambda_i - \gamma \qquad (i\in I^\star),
\]
so it only says those coordinates are safely inside the “dead zone” near $x^\star$. Far from $x^\star$, nothing stops the prox input from crossing the threshold on many coordinates.

Also, the literature on identification for inertial/accelerated forward–backward methods explicitly discusses that FISTA/inertial schemes can oscillate around the active manifold before settling, so one should not expect a clean monotone “active set only grows nicely” picture without extra structure. arXiv[7]

So: a uniform, instance-independent “small support for all transient iterates” bound does not follow from $\gamma>0$ alone.

Even though $\gamma$ is local, it does give a very clean quantitative bound of the form

This is already useful because accelerated methods give you an explicit decay of $\|y_k-x^\star\|$ (or of the objective gap), so you can turn it into a bound on transient “wrong support” size that shrinks over time.

Consider the composite problem

\[
\min_x\; F(x)=f(x)+g(x),\qquad g(x)=\sum_{i=1}^n \lambda_i |x_i|,
\]
with $\nabla f$ $L$-Lipschitz (in $\ell_2$). Let $x^\star$ be the minimizer and define the inactive set $I^\star=\{i:x_i^\star=0\}$. Assume the margin
\[
\gamma:=\min_{i\in I^\star}\bigl(\lambda_i-|\nabla f(x^\star)_i|\bigr)>0.
\]
Fix a step $t>0$, and for any point $y$ define the prox-gradient input
\[
u(y):=y-t\nabla f(y),\qquad x^+(y):=\operatorname{prox}_{tg}(u(y)).
\]
Let the “spurious active set” at $y$ be
\[
A(y):=\{i\in I^\star:\ x^+(y)_i\neq 0\}.
\]
Then
\[
|A(y)|\ \le\ \left(\frac{(1+tL)\,\|y-x^\star\|_2}{t\gamma}\right)^2.
\]

For weighted soft-thresholding, $x^+(y)_i\neq 0$ iff $|u(y)_i|>t\lambda_i$.

For $i\in I^\star$, we have $x^\star_i=0$ and $|\nabla f(x^\star)_i|\le \lambda_i-\gamma$, so
\[
|u(x^\star)_i|=|x^\star_i-t\nabla f(x^\star)_i|=t|\nabla f(x^\star)_i|\le t(\lambda_i-\gamma).
\]
Hence, if $i\in A(y)$, then $|u(y)_i|>t\lambda_i$, and therefore
\[
|u(y)_i|-|u(x^\star)_i|>t\lambda_i - t(\lambda_i-\gamma)=t\gamma.
\]
Using $\bigl|\,|a|-|b|\,\bigr|\le |a-b|$, we get $|u(y)_i-u(x^\star)_i|>t\gamma$.

Thus
\[
\|u(y)-u(x^\star)\|_2^2\ \ge\ \sum_{i\in A(y)} |u(y)_i-u(x^\star)_i|^2\ >\ |A(y)|\,(t\gamma)^2.
\]
On the other hand, since $\nabla f$ is $L$-Lipschitz,
\[
\|u(y)-u(x^\star)\|_2 =\|(y-x^\star)-t(\nabla f(y)-\nabla f(x^\star))\|_2 \le (1+tL)\|y-x^\star\|_2.
\]
Combine the two inequalities and rearrange to obtain the claim. ∎
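A quick numerical sanity check of this counting bound on an assumed toy instance (a random strongly convex quadratic; `gamma` is computed from a high-accuracy ISTA solution standing in for $x^\star$):

```python
import numpy as np

# Check |A(y)| <= ((1+tL)||y-x*|| / (t*gamma))^2 on an assumed toy instance.
rng = np.random.default_rng(1)
n = 200
M = rng.standard_normal((n, n)) / np.sqrt(n)
Q = M @ M.T + 0.5 * np.eye(n)                 # grad f(x) = Q x - b
b = rng.standard_normal(n)
lam = np.full(n, 1.5)                         # l1 weights
Lip = np.linalg.norm(Q, 2)                    # Lipschitz constant of grad f
t = 1.0 / Lip

soft = lambda u: np.sign(u) * np.maximum(np.abs(u) - t * lam, 0.0)
prox_step = lambda y: soft(y - t * (Q @ y - b))

x = np.zeros(n)                               # high-accuracy ISTA solve
for _ in range(2000):
    x = prox_step(x)

inactive = x == 0
gamma = np.min(lam[inactive] - np.abs((Q @ x - b)[inactive]))  # margin

for _ in range(20):                           # random trial points y
    y = x + rng.standard_normal(n) * rng.uniform(0.01, 1.0)
    spurious = np.count_nonzero(prox_step(y)[inactive])
    bound = ((1 + t * Lip) * np.linalg.norm(y - x) / (t * gamma)) ** 2
    assert spurious <= bound                  # the lemma's inequality
print("bound held on all trials; empirical margin gamma =", float(gamma))
```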

In an accelerated prox-gradient method, you compute

\[
x_{k+1}=\operatorname{prox}_{tg}\bigl(y_k-t\nabla f(y_k)\bigr).
\]
Apply the lemma with $y=y_k$:
\[
|A(y_k)|\ \le\ \left(\frac{(1+tL)\,\|y_k-x^\star\|_2}{t\gamma}\right)^2.
\]
So you can bound the number of “wrong” active coordinates at iteration $k$ as soon as you have any bound on $\|y_k-x^\star\|_2$ (or on an objective gap, using strong convexity).

And the identification literature you’re implicitly leaning on (active-set / manifold identification) does exactly this: it bounds the “active set complexity” in terms of a margin-like parameter (often called $\delta_{\min}$) and a convergence bound $\varepsilon_x(k)$. Proceedings of Machine Learning Research[8]

Not by itself.

  • The bound is quadratic in $1/\gamma$, not linear in $1/\rho$.
  • To turn it into a “local algorithm” guarantee, you would need to relate $\gamma$ to $\rho$ (and control $\|y_k-x^\star\|$ by something that doesn’t hide an $n$-dependence). In the worst case, $\gamma$ can be arbitrarily small even when $|S^\star|$ is $O(1/\rho)$, so the lemma allows very large transient supports.

So the best honest summary is:

  • Yes, you can bound transient spurious activations using a margin parameter, but the natural bound you can prove from standard smoothness + margin is of the form $|A_k| \lesssim \|y_k-x^\star\|^2/\gamma^2$, which is not the $\tilde O(1/\rho)$ sparsity you ultimately want.
  • Getting a uniform $\tilde O(1/\rho)$ per-iteration support bound for acceleration typically needs extra invariants/controls (active-set control, monotonicity/restarter tricks, etc.), which is exactly where the “nonstandard” machinery comes in—consistent with why the identification papers emphasize oscillations/inertial effects. arXiv[9]

User: I want a bound on the sum of supports over the transient.

I want a bound on the sum of supports over the transient. You can assume that the initial solution is the zero vector. Provide self-contained latex code.

Assistant: Below is a self-contained LaTeX document that proves a bound on the sum of iterate supports over the transient pre-iden…

Below is a self-contained LaTeX document that proves a bound on the sum of iterate supports over the transient (pre-identification) phase, using a margin/strict-complementarity parameter for the $\ell_1$-regularized PageRank objective (3) from your open-problem note. open-problem-fountoulakis22a

\documentclass[11pt]{article}

\usepackage{amsmath,amssymb,amsthm,mathtools}
\usepackage[margin=1in]{geometry}
\usepackage{hyperref}

% --------------------
% Theorem environments
% --------------------
\newtheorem{theorem}{Theorem}
\newtheorem{lemma}{Lemma}
\newtheorem{corollary}{Corollary}
\newtheorem{assumption}{Assumption}
\theoremstyle{remark}
\newtheorem{remark}{Remark}

% --------------------
% Macros
% --------------------
\newcommand{\R}{\mathbb{R}}
\newcommand{\norm}[1]{\left\lVert #1 \right\rVert}
\newcommand{\ip}[2]{\left\langle #1,#2 \right\rangle}
\newcommand{\prox}{\operatorname{prox}}
\newcommand{\supp}{\operatorname{supp}}
\newcommand{\diag}{\operatorname{diag}}

\title{Bounding the Sum of Supports in the Transient of Accelerated Proximal Gradient via a Margin Parameter}
\author{}
\date{}

\begin{document}
\maketitle

\begin{abstract}
We study accelerated proximal-gradient methods for composite objectives of the form
$F(x)=f(x)+\sum_{i=1}^n \lambda_i |x_i|$ where $f$ is smooth and strongly convex.
Under a strict-complementarity \emph{margin} condition at the minimizer $x^\star$, we bound:
(i) the number of \emph{spurious} active coordinates (outside $\supp(x^\star)$) at each iteration in terms
of $\|y_k-x^\star\|_2$, and
(ii) the \emph{sum of supports} $\sum_{k=0}^{K-1}|\supp(x_{k+1})|$ over the \emph{transient} phase
before inactive coordinates are permanently identified as zero.
We also specialize constants to the $\ell_1$-regularized PageRank objective (COLT'22 open problem).
\end{abstract}

\section{Composite model, support sets, and a margin parameter}

Let
\begin{equation}\label{eq:prob}
\min_{x\in\R^n} F(x) := f(x) + g(x),
\qquad
g(x) := \sum_{i=1}^n \lambda_i |x_i|,
\quad \lambda_i>0.
\end{equation}
Assume $f$ is differentiable, $\nabla f$ is $L$-Lipschitz in $\ell_2$, and $F$ is $\mu$-strongly convex
($\mu>0$). Then $F$ has a unique minimizer $x^\star$.

Define the optimal support and its complement
\[
S^\star := \supp(x^\star),\qquad I^\star := \{1,\dots,n\}\setminus S^\star.
\]

\subsection{Weighted soft-thresholding proximal map}

\begin{lemma}[Prox of weighted $\ell_1$]\label{lem:prox}
For any $t>0$ and $u\in\R^n$,
\[
\prox_{t g}(u)_i = \operatorname{sign}(u_i)\,\max\{|u_i|-t\lambda_i,\,0\}.
\]
In particular, $\prox_{t g}(u)_i\neq 0$ if and only if $|u_i|>t\lambda_i$.
\end{lemma}

\begin{proof}
The proximal subproblem is separable over coordinates:
$\min_x \frac{1}{2t}\|x-u\|_2^2 + \sum_i \lambda_i|x_i|$, giving weighted soft-thresholding.
\end{proof}

\subsection{Margin / strict complementarity}

First-order optimality gives $0\in\nabla f(x^\star)+\partial g(x^\star)$.
In particular, for any $i\in I^\star$ (so $x_i^\star=0$), one has $|\nabla f(x^\star)_i|\le \lambda_i$.

\begin{assumption}[Margin parameter]\label{ass:margin}
Define
\begin{equation}\label{eq:gamma}
\gamma := \min_{i\in I^\star}\bigl(\lambda_i - |\nabla f(x^\star)_i|\bigr).
\end{equation}
Assume $\gamma>0$.
Equivalently, every inactive coordinate satisfies $|\nabla f(x^\star)_i|\le \lambda_i-\gamma$.
\end{assumption}

\begin{remark}
Assumption~\ref{ass:margin} is a standard strict-complementarity condition for $\ell_1$ regularization.
Without $\gamma>0$, finite-time identification and sharp sparsity control can fail in worst case.
\end{remark}

\section{Accelerated proximal gradient}

Fix a step size $t\in(0,1/L]$ and a momentum parameter $\beta\in[0,1)$.
Consider the inertial/accelerated proximal-gradient iteration
\begin{equation}\label{eq:apg}
\begin{aligned}
y_k &= x_k + \beta (x_k-x_{k-1}),\\
x_{k+1} &= \prox_{t g}\!\bigl(y_k - t\nabla f(y_k)\bigr),
\end{aligned}
\qquad k\ge 0,
\end{equation}
initialized with
\begin{equation}\label{eq:init}
x_{-1}=x_0=0.
\end{equation}
(Thus $y_0=x_0=0$.)

Define the \emph{spurious active set at step $k$}
\begin{equation}\label{eq:Ak}
A_k := \supp(x_{k+1})\cap I^\star
= \{i\in I^\star:\ x_{k+1,i}\neq 0\}.
\end{equation}

\section{Key counting lemma: spurious activations cost distance to $x^\star$}

Define the forward (gradient) map input
\[
u(y) := y - t\nabla f(y).
\]

\begin{lemma}[Each spurious active coordinate forces a $t\gamma$-jump]\label{lem:tgammastep}
Assume Assumption~\ref{ass:margin}. Fix any $y\in\R^n$ and set $x^+(y)=\prox_{t g}(u(y))$.
Let $A(y):=\supp(x^+(y))\cap I^\star$.
Then for every $i\in A(y)$,
\[
|u(y)_i-u(x^\star)_i| > t\gamma,
\]
and consequently
\begin{equation}\label{eq:counting}
|A(y)| \le \frac{\|u(y)-u(x^\star)\|_2^2}{t^2\gamma^2}.
\end{equation}
\end{lemma}

\begin{proof}
Fix $i\in I^\star$. Then $x_i^\star=0$ and $u(x^\star)_i=-t\nabla f(x^\star)_i$, so
$|u(x^\star)_i|=t|\nabla f(x^\star)_i|\le t(\lambda_i-\gamma)$.
If $i\in A(y)$ then $x^+(y)_i\neq 0$, hence by Lemma~\ref{lem:prox} we have $|u(y)_i|>t\lambda_i$.
Therefore
\[
|u(y)_i-u(x^\star)_i|\ge |u(y)_i|-|u(x^\star)_i|>t\lambda_i-t(\lambda_i-\gamma)=t\gamma.
\]
Summing over $i\in A(y)$ gives $\|u(y)-u(x^\star)\|_2^2>\sum_{i\in A(y)}(t\gamma)^2=|A(y)|t^2\gamma^2$,
which implies \eqref{eq:counting}.
\end{proof}

\begin{lemma}[Lipschitz transfer to $y$-distance]\label{lem:uLip}
If $\nabla f$ is $L$-Lipschitz, then for all $y$,
\[
\|u(y)-u(x^\star)\|_2 \le (1+tL)\|y-x^\star\|_2.
\]
Hence,
\begin{equation}\label{eq:Ak_vs_y}
|A(y)| \le \left(\frac{1+tL}{t\gamma}\right)^2 \|y-x^\star\|_2^2.
\end{equation}
\end{lemma}

\begin{proof}
$u(y)-u(x^\star)=(y-x^\star)-t(\nabla f(y)-\nabla f(x^\star))$.
Take norms and use $\|\nabla f(y)-\nabla f(x^\star)\|_2\le L\|y-x^\star\|_2$.
Combine with Lemma~\ref{lem:tgammastep}.
\end{proof}

Applying Lemma~\ref{lem:uLip} to $y=y_k$ gives the per-iteration spurious-support bound
\begin{equation}\label{eq:Ak_bound_general}
|A_k| \le \left(\frac{1+tL}{t\gamma}\right)^2 \|y_k-x^\star\|_2^2.
\end{equation}

\section{Bounding $\|y_k-x^\star\|$ using a standard accelerated linear rate}

To obtain an explicit \emph{sum} bound, we use a standard accelerated linear convergence statement.

\begin{assumption}[Standard accelerated linear convergence]\label{ass:linear_rate}
There exist constants $q\in(0,1)$ and $\Delta_0>0$ such that
\begin{equation}\label{eq:linear_rate}
F(x_k)-F(x^\star) \le \Delta_0\, q^k\qquad \text{for all }k\ge 0.
\end{equation}
With initialization $x_0=0$, we have $\Delta_0=F(0)-F(x^\star)$.
\end{assumption}

\begin{remark}
For strongly convex composite objectives, a \emph{standard} accelerated proximal-gradient method
with step $t=1/L$ and a constant momentum choice (e.g.\ Nesterov's $\beta=\frac{\sqrt{L}-\sqrt{\mu}}{\sqrt{L}+\sqrt{\mu}}$)
achieves \eqref{eq:linear_rate} with $q\approx 1-\sqrt{\mu/L}$ (up to standard constant-factor variants).
The identification/sparsity argument below only needs \eqref{eq:linear_rate} and strong convexity.
\end{remark}

\begin{lemma}[From function gap to $y_k$-distance]\label{lem:yk_distance}
Assume $F$ is $\mu$-strongly convex and Assumption~\ref{ass:linear_rate} holds.
Then for all $k\ge 1$,
\begin{equation}\label{eq:yk_dist_bound}
\|y_k-x^\star\|_2^2 \le \frac{4\Delta_0}{\mu}\Bigl( (1+\beta)^2 q^k + \beta^2 q^{k-1}\Bigr)
\le \underbrace{\frac{4\Delta_0}{\mu}\Bigl((1+\beta)^2 q + \beta^2\Bigr)}_{=:M}\, q^{k-1}.
\end{equation}
Also, $\|y_0-x^\star\|_2^2=\|x_0-x^\star\|_2^2 \le \frac{2\Delta_0}{\mu}$.
\end{lemma}

\begin{proof}
Strong convexity gives $F(x)-F(x^\star)\ge \frac{\mu}{2}\|x-x^\star\|_2^2$, hence
$\|x_k-x^\star\|_2^2 \le \frac{2}{\mu}(F(x_k)-F(x^\star)) \le \frac{2\Delta_0}{\mu}q^k$.

For $k\ge 1$, write
$y_k-x^\star=(1+\beta)(x_k-x^\star)-\beta(x_{k-1}-x^\star)$ and use
$\|a-b\|_2^2 \le 2\|a\|_2^2 + 2\|b\|_2^2$ to get
\[
\|y_k-x^\star\|_2^2 \le 2(1+\beta)^2\|x_k-x^\star\|_2^2 + 2\beta^2\|x_{k-1}-x^\star\|_2^2.
\]
Substitute the bounds on $\|x_k-x^\star\|_2^2$ and $\|x_{k-1}-x^\star\|_2^2$.
The second inequality in \eqref{eq:yk_dist_bound} uses $q^k\le q^{k-1}$ for $k\ge 1$.
For $k=0$, $y_0=x_0$ and the strong convexity bound gives the claim.
\end{proof}

\section{Transient identification and the sum of supports}

The margin implies that once $y_k$ is sufficiently close to $x^\star$, \emph{no} inactive coordinate can be activated.

\begin{lemma}[Stability radius for inactivity]\label{lem:stability}
Assume Assumption~\ref{ass:margin}. Define
\begin{equation}\label{eq:r_def}
r := \frac{t\gamma}{1+tL}.
\end{equation}
If $\|y-x^\star\|_2\le r$, then $A(y)=\emptyset$ (i.e.\ $\supp(\prox_{tg}(u(y)))\subseteq S^\star$).
In particular, if $\|y_k-x^\star\|_2\le r$ then $A_k=\emptyset$.
\end{lemma}

\begin{proof}
If $\|y-x^\star\|_2\le r$, then by Lemma~\ref{lem:uLip},
$\|u(y)-u(x^\star)\|_2\le (1+tL)\|y-x^\star\|_2\le t\gamma$.
Then \eqref{eq:counting} forces $|A(y)|\le 1$ but in fact the strict inequality in Lemma~\ref{lem:tgammastep}
shows no coordinate can satisfy $|u(y)_i-u(x^\star)_i|>t\gamma$; hence $A(y)=\emptyset$.
\end{proof}

Define the \emph{(margin-based) identification time}
\begin{equation}\label{eq:Kdef}
K_\gamma := \min\{k\ge 0:\ \|y_k-x^\star\|_2\le r\}.
\end{equation}
Then by Lemma~\ref{lem:stability}, $A_k=\emptyset$ for all $k\ge K_\gamma$.
We call iterations $k=0,1,\dots,K_\gamma-1$ the \emph{transient phase}.

\begin{lemma}[Explicit bound on $K_\gamma$]\label{lem:Kbound}
Assume Assumption~\ref{ass:linear_rate} and let $M$ be as in Lemma~\ref{lem:yk_distance}.
Then
\[
K_\gamma \le 1 + \left\lceil \frac{\log(M/r^2)}{\log(1/q)} \right\rceil.
\]
\end{lemma}

\begin{proof}
For $k\ge 1$, Lemma~\ref{lem:yk_distance} gives $\|y_k-x^\star\|_2^2\le M q^{k-1}$.
It suffices that $M q^{k-1}\le r^2$, i.e.\ $k-1\ge \log(M/r^2)/\log(1/q)$.
\end{proof}

\subsection{Main theorem: a bound on $\sum_{k=0}^{K_\gamma-1}|\supp(x_{k+1})|$}

\begin{theorem}[Sum of supports over the transient]\label{thm:sum_support_transient}
Assume Assumptions~\ref{ass:margin} and \ref{ass:linear_rate}, and run \eqref{eq:apg}--\eqref{eq:init}.
Let $K_\gamma$ be defined by \eqref{eq:Kdef}. Then
\begin{equation}\label{eq:sum_support_main}
\sum_{k=0}^{K_\gamma-1} |\supp(x_{k+1})|
\;\le\;
|S^\star|\,K_\gamma
\;+\;
\left(\frac{1+tL}{t\gamma}\right)^2
\left(
\frac{2\Delta_0}{\mu}
+\frac{4\Delta_0}{\mu}\cdot \frac{(1+\beta)^2 q + \beta^2}{1-q}
\right).
\end{equation}
Moreover, $K_\gamma$ satisfies Lemma~\ref{lem:Kbound}.

In particular, if one uses the standard choices $t=1/L$ and $q\approx 1-\sqrt{\mu/L}$,
then the \emph{spurious part} of the sum in \eqref{eq:sum_support_main} is
\[
O\!\left(\frac{\Delta_0\,L^{5/2}}{\gamma^2\,\mu^{3/2}}\right),
\]
and $K_\gamma = O\!\left(\sqrt{\tfrac{L}{\mu}}\log \tfrac{1}{\gamma}\right)$ (up to constants hidden in $M$).
\end{theorem}

\begin{proof}
For each $k$, $\supp(x_{k+1}) = (\supp(x_{k+1})\cap S^\star)\cup A_k$, hence
$|\supp(x_{k+1})|\le |S^\star| + |A_k|$.
Summing over $k=0,\dots,K_\gamma-1$ yields
\[
\sum_{k=0}^{K_\gamma-1} |\supp(x_{k+1})|
\le |S^\star|K_\gamma + \sum_{k=0}^{K_\gamma-1}|A_k|
\le |S^\star|K_\gamma + \sum_{k=0}^{\infty}|A_k|.
\]
Now use \eqref{eq:Ak_bound_general} with Lemma~\ref{lem:yk_distance}:
\[
|A_0|\le \left(\frac{1+tL}{t\gamma}\right)^2 \|y_0-x^\star\|_2^2
\le \left(\frac{1+tL}{t\gamma}\right)^2 \frac{2\Delta_0}{\mu},
\]
and for $k\ge 1$,
\[
|A_k| \le \left(\frac{1+tL}{t\gamma}\right)^2 \|y_k-x^\star\|_2^2
\le
\left(\frac{1+tL}{t\gamma}\right)^2
\frac{4\Delta_0}{\mu}\Bigl( (1+\beta)^2 q^k + \beta^2 q^{k-1}\Bigr).
\]
Sum the geometric series:
$\sum_{k=1}^\infty q^k = \frac{q}{1-q}$ and $\sum_{k=1}^\infty q^{k-1}=\frac{1}{1-q}$.
This gives \eqref{eq:sum_support_main}.
The asymptotic statements follow by inserting $t=1/L$ so $(1+tL)/(t\gamma)=2L/\gamma$ and
$1/(1-q)\asymp \sqrt{L/\mu}$ when $q\approx 1-\sqrt{\mu/L}$.
\end{proof}

\begin{remark}[Interpretation]
Theorem~\ref{thm:sum_support_transient} bounds the \emph{total amount of sparsity work} accumulated
before strict complementarity (margin) forces all inactive coordinates to stay zero.
The bound splits cleanly into:
(i) a `<code>true-support'' term $|S^\star|K_\gamma$, and
(ii) a purely margin-controlled </code>`spurious'' term scaling like $\gamma^{-2}$ times a condition-number factor.
\end{remark}

\section{Specialization: $\ell_1$-regularized PageRank}

We record the standard constants for the $\ell_1$-regularized PageRank objective (COLT'22 open problem):
\[
F(x)= \frac12 x^\top Qx - \alpha x^\top D^{-1/2}s \;+\; \rho\alpha \|D^{1/2}x\|_1,
\]
with $Q=\alpha I + \frac{1-\alpha}{2}L_{\rm sym}$ and $L_{\rm sym}$ the normalized Laplacian.
For undirected graphs, $\lambda(L_{\rm sym})\subseteq[0,2]$, hence $\lambda(Q)\subseteq[\alpha,1]$.
Therefore $\nabla f$ is $L$-Lipschitz with $L\le 1$, and $F$ is $\mu$-strongly convex with $\mu=\alpha$.
Also $\lambda_i = \rho\alpha\sqrt{d_i}$ in \eqref{eq:prob}.

\begin{corollary}[PageRank constants; explicit $\Delta_0$ bound from $x_0=0$]\label{cor:ppr}
Assume the seed is a single node $v$ so $s=e_v$.
With initialization $x_0=0$ we have $F(0)=0$ and
\[
\Delta_0 = F(0)-F(x^\star) = -F(x^\star) \le \frac{\alpha}{2d_v}\le \frac{\alpha}{2}.
\]
If we use the standard step $t=1$ (since $L\le 1$) then $r=\gamma/2$ and $(1+tL)/(t\gamma)\le 2/\gamma$.
Consequently, the spurious part of \eqref{eq:sum_support_main} is
\[
O\!\left(\frac{1}{\gamma^2\sqrt{\alpha}}\right)
\quad\text{(up to standard constant factors in the accelerated rate and the momentum choice).}
\]
Moreover, $K_\gamma = O\!\left(\frac{1}{\sqrt{\alpha}}\log\frac{1}{\gamma}\right)$.
\end{corollary}

\begin{proof}
Since $g(x)\ge 0$, we have $F(x)\ge f(x)$ and thus $F(x^\star)\ge \min_x f(x)$.
For $f(x)=\frac12 x^\top Qx - b^\top x$ with $b=\alpha D^{-1/2}s$, the unconstrained minimizer is
$x_f^\star=Q^{-1}b$ and $\min f = -\frac12 b^\top Q^{-1}b$.
Because $Q\succeq \alpha I$, we have $Q^{-1}\preceq \frac{1}{\alpha}I$ and hence
$b^\top Q^{-1}b \le \frac{1}{\alpha}\|b\|_2^2 = \frac{1}{\alpha}\alpha^2\|D^{-1/2}s\|_2^2
=\alpha\|D^{-1/2}s\|_2^2$.
For $s=e_v$, $\|D^{-1/2}s\|_2^2 = 1/d_v$, giving $\min f \ge -\alpha/(2d_v)$, hence
$-F(x^\star)\le -\min f \le \alpha/(2d_v)$.
The remaining statements follow by inserting $L\le 1$, $\mu=\alpha$, $t=1$ into
Theorem~\ref{thm:sum_support_transient} and Lemma~\ref{lem:Kbound}.
\end{proof}

\begin{remark}[What this bound does \emph{not} give]
The bound is explicit and dimension-free, but it depends on the margin $\gamma$.
To recover a purely $(\alpha,\rho)$-based locality bound (e.g.\ $\widetilde O((\sqrt{\alpha}\rho)^{-1})$)
one would need an additional argument lower-bounding $\gamma$ in terms of $\rho$ for the instance class of interest.
\end{remark}

\end{document}

User: That's excellent!![edit | edit source]

That's excellent!! Can you now bound the total running of the algorithm? You can use the fact that |S^*| <= 1/ρ. Don't update the whole latex, just give me the new material to append.

Assistant: Append the following material it converts the support-sum bound into a total running-time / total coordinate-work bound…[edit | edit source]

Append the following material (it converts the support-sum bound into a total running-time / total coordinate-work bound, using ∣S⋆∣≤1/ρ|S^\star|\le 1/\rho∣S⋆∣≤1/ρ as in the local-PPR setting). open-problem-fountoulakis22a

% ============================================================
\section{Total running time via the sum of supports}

We now convert the support-sum bounds into a total work bound for an $\varepsilon$-accurate solution.
Throughout, we measure work in a standard \emph{support-sensitive} model: the cost of iteration $k$
is proportional to the number of (stored/updated) nonzeros in the vectors that appear in the update.
This is the same model implicitly used when one argues that a method is ``local'' because it only
touches $O(1/\rho)$ coordinates per iteration.

\subsection{A simple support-sensitive work model}

\begin{assumption}[Support-sensitive iteration cost]\label{ass:work_model}
There exist absolute constants $c_1,c_2>0$ such that iteration $k$ of \eqref{eq:apg}
can be implemented with work
\[
\mathrm{work}(k) \;\le\; c_1\,|\supp(y_k)| \;+\; c_2\,|\supp(x_{k+1})|.
\]
(Any $O(1)$ overhead per iteration can be absorbed into the constants.)
\end{assumption}

\begin{lemma}[Support of the extrapolated point]\label{lem:supp_y}
For all $k\ge 1$,
\[
\supp(y_k)\subseteq \supp(x_k)\cup \supp(x_{k-1}),
\qquad\text{hence}\qquad
|\supp(y_k)| \le |\supp(x_k)|+|\supp(x_{k-1})|.
\]
\end{lemma}

\begin{proof}
Coordinatewise, $y_{k,i}=(1+\beta)x_{k,i}-\beta x_{k-1,i}$.
Thus if $x_{k,i}=x_{k-1,i}=0$ then $y_{k,i}=0$.
\end{proof}

\subsection{Total work up to a fixed iteration budget}

Define the cumulative work up to $N$ iterations as
\[
\mathrm{Work}(N) := \sum_{k=0}^{N-1}\mathrm{work}(k).
\]

Let
\begin{equation}\label{eq:Cspur_def}
C_{\rm spur}:=\left(\frac{1+tL}{t\gamma}\right)^2
\left(
\frac{2\Delta_0}{\mu}
+\frac{4\Delta_0}{\mu}\cdot \frac{(1+\beta)^2 q + \beta^2}{1-q}
\right),
\end{equation}
which is exactly the constant appearing in Theorem~\ref{thm:sum_support_transient}.

\begin{corollary}[Work bound up to $N$ iterations]\label{cor:workN}
Assume Assumptions~\ref{ass:margin},~\ref{ass:linear_rate}, and~\ref{ass:work_model}.
Then for every integer $N\ge 1$,
\begin{align}
\sum_{k=0}^{N-1} |\supp(x_{k+1})|
&\le N|S^\star| + C_{\rm spur},\label{eq:supp_sum_xN}\\
\sum_{k=0}^{N-1} |\supp(y_k)|
&\le 2N|S^\star| + 2C_{\rm spur},\label{eq:supp_sum_yN}
\end{align}
and therefore
\begin{equation}\label{eq:workN_bound}
\mathrm{Work}(N) \;\le\; O\!\left(N|S^\star| + C_{\rm spur}\right),
\end{equation}
where the hidden constant depends only on $(c_1,c_2)$.

If, moreover, $|S^\star|\le 1/\rho$, then
\begin{equation}\label{eq:workN_rho}
\mathrm{Work}(N) \;\le\; O\!\left(\frac{N}{\rho} + C_{\rm spur}\right).
\end{equation}
\end{corollary}

\begin{proof}
For each $k$ we have $\supp(x_{k+1})\subseteq S^\star \cup A_k$, hence
$|\supp(x_{k+1})|\le |S^\star|+|A_k|$.
Summing over $k=0,\dots,N-1$ yields
$\sum_{k=0}^{N-1} |\supp(x_{k+1})|\le N|S^\star| + \sum_{k=0}^{N-1}|A_k|$.
The proof of Theorem~\ref{thm:sum_support_transient} bounds the entire geometric series
$\sum_{k=0}^{\infty}|A_k|\le C_{\rm spur}$, which implies \eqref{eq:supp_sum_xN}.
Next, Lemma~\ref{lem:supp_y} gives for $k\ge 1$:
$|\supp(y_k)|\le |\supp(x_k)|+|\supp(x_{k-1})|$.
Summing this for $k=1,\dots,N-1$ and using $y_0=x_0$ yields \eqref{eq:supp_sum_yN}.
Finally, combine \eqref{eq:supp_sum_xN}--\eqref{eq:supp_sum_yN} with Assumption~\ref{ass:work_model}.
\end{proof}

\subsection{Total work to reach $\varepsilon$ accuracy}

Assumption~\ref{ass:linear_rate} implies that $F(x_N)-F(x^\star)\le \varepsilon$ once
\begin{equation}\label{eq:Neps_def}
N \;\ge\; N_\varepsilon
:= \left\lceil \frac{\log(\Delta_0/\varepsilon)}{\log(1/q)} \right\rceil.
\end{equation}

\begin{corollary}[Total running time to $\varepsilon$ accuracy]\label{cor:workeps}
Assume $|S^\star|\le 1/\rho$ and the hypotheses of Corollary~\ref{cor:workN}.
Then the total support-sensitive work to compute an iterate $x_N$ with
$F(x_N)-F(x^\star)\le \varepsilon$ is
\begin{equation}\label{eq:workeps_general}
\mathrm{Work}(N_\varepsilon)
\;\le\;
O\!\left(
\frac{1}{\rho}\cdot\frac{\log(\Delta_0/\varepsilon)}{\log(1/q)}
\;+\;
C_{\rm spur}
\right).
\end{equation}

With the standard accelerated choice $q\approx 1-\sqrt{\mu/L}$ and step $t=1/L$, this becomes
\begin{equation}\label{eq:workeps_simplified}
\mathrm{Work}(N_\varepsilon)
=
O\!\left(
\frac{1}{\rho}\sqrt{\frac{L}{\mu}}\,\log\frac{\Delta_0}{\varepsilon}
\;+\;
\frac{\Delta_0\,L^{5/2}}{\gamma^2\,\mu^{3/2}}
\right).
\end{equation}

In the $\ell_1$-regularized PageRank setting (undirected graph), $L\le 1$ and $\mu=\alpha$.
If additionally $\Delta_0\le \alpha/2$ (single-seed case) and we take $t=1$, then
\begin{equation}\label{eq:workeps_ppr}
\mathrm{Work}(N_\varepsilon)
=
O\!\left(
\frac{1}{\rho\sqrt{\alpha}}\log\frac{1}{\varepsilon}
\;+\;
\frac{1}{\gamma^2\sqrt{\alpha}}
\right),
\end{equation}
up to the standard constant-factor variants in the accelerated rate.
\end{corollary}
% ============================================================

User: Excellent![edit | edit source]

Excellent! Can you show me case where a combination of parameters of the algorithm and a particular bad graph example can result to γ < sqrt{ρ} ? don't produce latex yet. Show me your answer here.

Assistant: Yes — you can make the margin γ\gammaγ arbitrarily small even 0 for an ℓ1\ell1ℓ1​-regularized PageRank instance by choo…[edit | edit source]

Yes — you can make the margin γ\gammaγ arbitrarily small (even 0) for an ℓ1\ell_1ℓ1​-regularized PageRank instance by choosing (α,ρ)(\alpha,\rho)(α,ρ) and a “bad” graph so that you sit right at (or extremely near) a support-change breakpoint of the regularization path. This makes ∣∇f(x⋆)i∣|\nabla f(x^\star)_i|∣∇f(x⋆)i​∣ almost equal to the threshold λi=ραdi\lambda_i=\rho\alpha\sqrt{d_i}λi​=ραdi​​ for some inactive coordinate, hence γ=λi−∣∇f(x⋆)i∣\gamma=\lambda_i-|\nabla f(x^\star)_i|γ=λi​−∣∇f(x⋆)i​∣ becomes tiny. (This is exactly what strict complementarity rules out.)

Below is a fully explicit, connected graph example where you can compute γ\gammaγ in closed form and force γ<ρ\gamma < \sqrt{\rho}γ<ρ​ by a wide margin.

Take a star graph with:[edit | edit source]

  • one center node ccc connected to mmm leaf nodes ℓ1,…,ℓm\ell_1,\dots,\ell_mℓ1​,…,ℓm​,
  • seed s=ecs=e_cs=ec​,
  • teleportation α∈(0,1)\alpha\in(0,1)α∈(0,1),
  • regularization parameter ρ>0\rho>0ρ>0.

Degrees:

  • dc=md_c=mdc​=m,
  • dℓ=1d_{\ell}=1dℓ​=1 for each leaf.

For the ℓ1\ell_1ℓ1​-regularized PPR objective (your equation (3)), the weighted ℓ1\ell_1ℓ1​ penalty is

  • λc=ραm\lambda_c = \rho\alpha\sqrt{m}λc​=ραm​ at the center,
  • λℓ=ρα\lambda_{\ell} = \rho\alphaλℓ​=ρα at each leaf. open-problem-fountoulakis22a

Because of symmetry, in a certain parameter regime the optimizer has the form:

  • xc⋆>0x_c^\star > 0xc⋆​>0,
  • xℓ⋆=0x_{\ell}^\star = 0xℓ⋆​=0 for every leaf.
In this regime (all leaves are zero), the center coordinate satisfies[edit | edit source]

(Qx⋆−αD−1/2s)c+λc=0,(Qx^\star - \alpha D^{-1/2}s)_c + \lambda_c = 0,(Qx⋆−αD−1/2s)c​+λc​=0, which gives

xc⋆  =  2α(1−ρm)(1+α)m.x_c^\star \;=\; \frac{2\alpha(1-\rho m)}{(1+\alpha)\sqrt{m}}.xc⋆​=(1+α)m​2α(1−ρm)​. This is positive as long as ρ<1/m\rho < 1/mρ<1/m.

A leaf ℓ\ellℓ stays at zero iff its KKT inequality holds:[edit | edit source]

∣(Qx⋆−αD−1/2s)ℓ∣≤λℓ=ρα.|(Qx^\star - \alpha D^{-1/2}s)_\ell| \le \lambda_\ell = \rho\alpha.∣(Qx⋆−αD−1/2s)ℓ​∣≤λℓ​=ρα. For the star, this inequality reduces to a very clean threshold:

ρ  ≥  ρ0:=1−α2m.\rho \;\ge\; \rho_0 := \frac{1-\alpha}{2m}.ρ≥ρ0​:=2m1−α​. So for

  ρ∈[1−α2m,  1m)  \boxed{\;\rho \in \Big[\frac{1-\alpha}{2m},\;\frac{1}{m}\Big)\;}ρ∈[2m1−α​,m1​)​ the unique optimum has support S⋆={c}S^\star=\{c\}S⋆={c} (so ∣S⋆∣=1≤1/ρ|S^\star|=1\le 1/\rho∣S⋆∣=1≤1/ρ automatically).

All inactive coordinates are leaves and are symmetric, so the margin is attained on any leaf:[edit | edit source]

γ=λℓ−∣(Qx⋆−αD−1/2s)ℓ∣.\gamma = \lambda_\ell - |(Qx^\star - \alpha D^{-1/2}s)_\ell|.γ=λℓ​−∣(Qx⋆−αD−1/2s)ℓ​∣. A direct substitution yields the closed form:

  γ  =  α1+α(2ρ−1−αm)  =  2α1+α(ρ−ρ0).  \boxed{\; \gamma \;=\; \frac{\alpha}{1+\alpha}\Big(2\rho - \frac{1-\alpha}{m}\Big) \;=\; \frac{2\alpha}{1+\alpha}\big(\rho-\rho_0\big). \;}γ=1+αα​(2ρ−m1−α​)=1+α2α​(ρ−ρ0​).​ Two key takeaways:

  • At the breakpoint ρ=ρ0\rho=\rho_0ρ=ρ0​: γ=0⇒γ<ρ (trivially).\gamma = 0 \quad\Rightarrow\quad \gamma < \sqrt{\rho}\ \text{(trivially)}.γ=0⇒γ<ρ​ (trivially).
  • If you choose ρ=ρ0+δ\rho=\rho_0+\deltaρ=ρ0​+δ with δ>0\delta>0δ>0 tiny, then γ=2α1+αδ,\gamma = \frac{2\alpha}{1+\alpha}\delta,γ=1+α2α​δ, so you can make γ\gammaγ as small as you want, hence certainly <ρ<\sqrt{\rho}<ρ​.

Pick typical PageRank teleportation, say:[edit | edit source]

  • α=0.15\alpha = 0.15α=0.15,
  • star size m=100m = 100m=100.

Then

ρ0=1−α2m=0.85200=0.00425.\rho_0 = \frac{1-\alpha}{2m}=\frac{0.85}{200}=0.00425.ρ0​=2m1−α​=2000.85​=0.00425. Take ρ=0.004251\rho = 0.004251ρ=0.004251 (so δ=10−6\delta = 10^{-6}δ=10−6). Then

γ=2α1+αδ=0.31.15⋅10−6≈2.61×10−7.\gamma = \frac{2\alpha}{1+\alpha}\delta = \frac{0.3}{1.15}\cdot 10^{-6} \approx 2.61\times 10^{-7}.γ=1+α2α​δ=1.150.3​⋅10−6≈2.61×10−7. But

ρ≈0.004251≈6.52×10−2.\sqrt{\rho} \approx \sqrt{0.004251} \approx 6.52\times 10^{-2}.ρ​≈0.004251​≈6.52×10−2. So

γ≈2.6⋅10−7  ≪  6.5⋅10−2  =  ρ.\gamma \approx 2.6\cdot 10^{-7} \;\ll\; 6.5\cdot 10^{-2} \;=\; \sqrt{\rho}.γ≈2.6⋅10−7≪6.5⋅10−2=ρ​. This is a clean “bad” case: connected graph, ∣S⋆∣=1≤1/ρ|S^\star|=1 \le 1/\rho∣S⋆∣=1≤1/ρ, yet γ\gammaγ is orders of magnitude smaller than ρ\sqrt{\rho}ρ​ because ρ\rhoρ is chosen near a support-change breakpoint.

Step size ttt, momentum β\betaβ, etc. do not change γ\gammaγ: γ\gammaγ is a property of the problem instance (graph + seed + α,ρ\alpha,\rhoα,ρ). What the algorithm parameters can do is affect how long you linger in the transient, but they don’t alter the KKT slack at the true minimizer.[edit | edit source]

If you want, I can also give a path graph construction where ρ0\rho_0ρ0​ is small even with moderate mmm, but the star example is the simplest explicit one where you can compute γ(ρ)\gamma(\rho)γ(ρ) in closed form.

User: Sure, give me a path graph construction too.[edit | edit source]

Sure, give me a path graph construction too.

Assistant: Here’s a path graph version of the same “bad margin” phenomenon.[edit | edit source]

Here’s a path graph version of the same “bad margin” phenomenon. It’s actually even cleaner than you might expect: you can make γ\gammaγ arbitrarily small by tuning ρ\rhoρ to sit extremely close to the point where the next node on the path is about to enter the support.

I’ll stick to the ℓ1\ell_1ℓ1​-regularized PPR objective (your eq. (3)) and the usual Q=αI+1−α2LQ=\alpha I+\frac{1-\alpha}{2}LQ=αI+21−α​L setup. open-problem-fountoulakis22a

Let Pm+1P_{m+1}Pm+1​ be the path graph on nodes 1,2,…,m+11,2,\dots,m+11,2,…,m+1 with edges (i,i+1)(i,i+1)(i,i+1).[edit | edit source]

Take the seed s=e1s=e_1s=e1​ at endpoint node 111.

Degrees:

  • d1=1d_1=1d1​=1,
  • di=2d_i=2di​=2 for 2≤i≤m2\le i\le m2≤i≤m,
  • dm+1=1d_{m+1}=1dm+1​=1.

Regularizer weights in (3) are λi=ραdi\lambda_i=\rho\alpha\sqrt{d_i}λi​=ραdi​​, so

  • λ1=ρα\lambda_1=\rho\alphaλ1​=ρα,
  • λ2=ρα2\lambda_2=\rho\alpha\sqrt{2}λ2​=ρα2​,
  • and for interior nodes similarly ρα2\rho\alpha\sqrt{2}ρα2​.

Also, for an undirected graph you can write

Q  =  1+α2I  −  1−α2D−1/2AD−1/2,Q \;=\; \frac{1+\alpha}{2}I \;-\; \frac{1-\alpha}{2}D^{-1/2}AD^{-1/2},Q=21+α​I−21−α​D−1/2AD−1/2, so the only off-diagonals are on edges:

Qij=−1−α2⋅1didjif (i,j)∈E.Q_{ij} = -\frac{1-\alpha}{2}\cdot \frac{1}{\sqrt{d_i d_j}} \quad \text{if } (i,j)\in E.Qij​=−21−α​⋅di​dj​​1​if (i,j)∈E. In particular, between nodes 111 and 222,

Q21=−1−α2⋅11⋅2=−1−α22.Q_{21} = -\frac{1-\alpha}{2}\cdot \frac{1}{\sqrt{1\cdot 2}} = -\frac{1-\alpha}{2\sqrt{2}}.Q21​=−21−α​⋅1⋅2​1​=−22​1−α​.

Consider the candidate solution[edit | edit source]

x⋆=(x1⋆,0,0,…,0),x1⋆>0.x^\star = (x_1^\star, 0,0,\dots,0), \quad x_1^\star>0.x⋆=(x1⋆​,0,0,…,0),x1⋆​>0.

The smooth gradient is ∇f(x)=Qx−αD−1/2s\nabla f(x)=Qx-\alpha D^{-1/2}s∇f(x)=Qx−αD−1/2s, and since d1=1d_1=1d1​=1 we have D−1/2s=e1D^{-1/2}s=e_1D−1/2s=e1​.[edit | edit source]

So the KKT condition at coordinate 1 (active, positive) is:

(Qx⋆)1−α+λ1=0.(Qx^\star)_1 - \alpha + \lambda_1 = 0.(Qx⋆)1​−α+λ1​=0. But (Qx⋆)1=Q11x1⋆=1+α2x1⋆(Qx^\star)_1 = Q_{11}x_1^\star = \frac{1+\alpha}{2}x_1^\star(Qx⋆)1​=Q11​x1⋆​=21+α​x1⋆​. Hence

1+α2x1⋆−α+ρα=0⟹x1⋆=2α(1−ρ)1+α.\frac{1+\alpha}{2}x_1^\star - \alpha + \rho\alpha = 0 \quad\Longrightarrow\quad x_1^\star = \frac{2\alpha(1-\rho)}{1+\alpha}.21+α​x1⋆​−α+ρα=0⟹x1⋆​=1+α2α(1−ρ)​. So this is positive as long as ρ<1\rho<1ρ<1.

Node 2 is inactive (x2⋆=0x_2^\star=0x2⋆​=0) iff its KKT inequality holds:[edit | edit source]

∣∇f(x⋆)2∣≤λ2.|\nabla f(x^\star)_2| \le \lambda_2.∣∇f(x⋆)2​∣≤λ2​. Now ∇f(x⋆)2=(Qx⋆)2\nabla f(x^\star)_2 = (Qx^\star)_2∇f(x⋆)2​=(Qx⋆)2​ (no seed term at node 2), and since only x1⋆x_1^\starx1⋆​ is nonzero,

(Qx⋆)2=Q21x1⋆=−1−α22x1⋆.(Qx^\star)_2 = Q_{21}x_1^\star = -\frac{1-\alpha}{2\sqrt{2}}x_1^\star.(Qx⋆)2​=Q21​x1⋆​=−22​1−α​x1⋆​. So

∣∇f(x⋆)2∣=1−α22x1⋆=1−α22⋅2α(1−ρ)1+α=α(1−α)(1−ρ)(1+α)2.|\nabla f(x^\star)_2| = \frac{1-\alpha}{2\sqrt{2}}x_1^\star = \frac{1-\alpha}{2\sqrt{2}}\cdot \frac{2\alpha(1-\rho)}{1+\alpha} = \frac{\alpha(1-\alpha)(1-\rho)}{(1+\alpha)\sqrt{2}}.∣∇f(x⋆)2​∣=22​1−α​x1⋆​=22​1−α​⋅1+α2α(1−ρ)​=(1+α)2​α(1−α)(1−ρ)​. The threshold is λ2=ρα2\lambda_2=\rho\alpha\sqrt{2}λ2​=ρα2​. The inequality becomes:

α(1−α)(1−ρ)(1+α)2≤ρα2⟺(1−α)(1−ρ)1+α≤2ρ.\frac{\alpha(1-\alpha)(1-\rho)}{(1+\alpha)\sqrt{2}} \le \rho\alpha\sqrt{2} \quad\Longleftrightarrow\quad \frac{(1-\alpha)(1-\rho)}{1+\alpha}\le 2\rho.(1+α)2​α(1−α)(1−ρ)​≤ρα2​⟺1+α(1−α)(1−ρ)​≤2ρ. Solving yields the breakpoint:

  ρ0=1−α3+α  \boxed{\;\rho_0 = \frac{1-\alpha}{3+\alpha}\;}ρ0​=3+α1−α​​ and for any ρ∈[ρ0,1)\rho\in[\rho_0,1)ρ∈[ρ0​,1), the solution with support {1}\{1\}{1} satisfies all KKT conditions. (Every node i≥3i\ge 3i≥3 has (Qx⋆)i=0(Qx^\star)_i=0(Qx⋆)i​=0 when only x1⋆≠0x_1^\star\neq 0x1⋆​=0, so their inequalities are trivially satisfied.)

So for all path lengths mmm, you get:

S⋆={1},∣S⋆∣=1≤1/ρ,whenever ρ∈[ρ0,1).S^\star = \{1\},\quad |S^\star|=1\le 1/\rho, \quad \text{whenever }\rho\in[\rho_0,1).S⋆={1},∣S⋆∣=1≤1/ρ,whenever ρ∈[ρ0​,1).

The margin is[edit | edit source]

γ=min⁡i∈I⋆(λi−∣∇f(x⋆)i∣).\gamma = \min_{i\in I^\star}(\lambda_i - |\nabla f(x^\star)_i|).γ=i∈I⋆min​(λi​−∣∇f(x⋆)i​∣). Here the tightest inactive coordinate is node 2 (the only one with nonzero gradient), so

γ=λ2−∣∇f(x⋆)2∣=ρα2−α(1−α)(1−ρ)(1+α)2.\gamma = \lambda_2 - |\nabla f(x^\star)_2| = \rho\alpha\sqrt{2} - \frac{\alpha(1-\alpha)(1-\rho)}{(1+\alpha)\sqrt{2}}.γ=λ2​−∣∇f(x⋆)2​∣=ρα2​−(1+α)2​α(1−α)(1−ρ)​. This simplifies neatly to a linear function of ρ−ρ0\rho-\rho_0ρ−ρ0​:

  γ=α(3+α)(1+α)2  (ρ−ρ0).  \boxed{\; \gamma = \frac{\alpha(3+\alpha)}{(1+\alpha)\sqrt{2}}\;(\rho-\rho_0). \;}γ=(1+α)2​α(3+α)​(ρ−ρ0​).​ So:

  • At ρ=ρ0\rho=\rho_0ρ=ρ0​, γ=0\gamma=0γ=0.
  • For ρ=ρ0+δ\rho=\rho_0+\deltaρ=ρ0​+δ with tiny δ>0\delta>0δ>0, you get γ=Cδ\gamma = C\deltaγ=Cδ with C=α(3+α)(1+α)2C=\frac{\alpha(3+\alpha)}{(1+\alpha)\sqrt{2}}C=(1+α)2​α(3+α)​.

Since ρ\sqrt{\rho}ρ​ is Θ(1)\Theta(1)Θ(1) here (because ρ0\rho_0ρ0​ is a constant depending on α\alphaα), taking δ\deltaδ extremely small makes γ\gammaγ much smaller than ρ\sqrt{\rho}ρ​.

Take α=0.15\alpha=0.15α=0.15. Then[edit | edit source]

ρ0=0.853.15≈0.26984,ρ0≈0.519.\rho_0 = \frac{0.85}{3.15}\approx 0.26984, \qquad \sqrt{\rho_0}\approx 0.519.ρ0​=3.150.85​≈0.26984,ρ0​​≈0.519. Now choose ρ=ρ0+10−6\rho=\rho_0+10^{-6}ρ=ρ0​+10−6. Then

γ≈0.15⋅3.151.15⋅1.414⋅10−6≈2.9×10−7≪0.519≈ρ.\gamma \approx \frac{0.15\cdot 3.15}{1.15\cdot 1.414}\cdot 10^{-6} \approx 2.9\times 10^{-7} \ll 0.519 \approx \sqrt{\rho}.γ≈1.15⋅1.4140.15⋅3.15​⋅10−6≈2.9×10−7≪0.519≈ρ​. So this path graph gives you a very explicit, closed-form “bad” case where

γ<ρ\gamma < \sqrt{\rho}γ<ρ​ by an arbitrarily large margin, while the true support is tiny (∣S⋆∣=1≤1/ρ|S^\star|=1\le 1/\rho∣S⋆∣=1≤1/ρ).

Algorithm parameters (t,βt,\betat,β, restart, etc.) don’t change γ\gammaγ: γ\gammaγ is a property of the instance (graph + α,ρ,s\alpha,\rho,sα,ρ,s) at the true optimizer. What algorithm parameters affect is how much transient “support thrashing” you might see before you enter the γ\gammaγ-neighborhood.[edit | edit source]

If you want, I can also give a path-prefix support construction (where S⋆={1,2,…,k}S^\star=\{1,2,\dots,k\}S⋆={1,2,…,k}) and show the breakpoint for node k+1k+1k+1, so you can make γ\gammaγ tiny while ∣S⋆∣≈k≈1/ρ|S^\star|\approx k\approx 1/\rho∣S⋆∣≈k≈1/ρ.

User: Convert my last two prompts and your answer into latex so I can append it to my current latex.[edit | edit source]

Convert my last two prompts and your answer into latex so I can append it to my current latex.

Assistant: Here is appendable LaTeX that turns the last two prompts + my two answers star + path constructions, and the “γ<ρ\gamma…[edit | edit source]

Here is appendable LaTeX that turns the last two prompts + my two answers (star + path constructions, and the “γ<ρ\gamma<\sqrt{\rho}γ<ρ​” point) into a clean section you can paste into your current note (it refers to the ℓ1\ell_1ℓ1​-regularized PageRank objective (3) in the COLT’22 open-problem note open-problem-fountoulakis22a

).

% ============================================================
\section{Bad instances where the margin $\gamma$ can be much smaller than $\sqrt{\rho}$}

We now record two explicit graph families showing that the strict-complementarity margin
\[
\gamma := \min_{i\in I^\star}\bigl(\lambda_i-|\nabla f(x^\star)_i|\bigr),
\qquad \lambda_i=\rho\alpha\sqrt{d_i},
\]
can be made \emph{arbitrarily small} (even $\gamma=0$) by choosing $\rho$ near a breakpoint of the
regularization path where an inactive KKT inequality becomes tight.
In particular, these instances satisfy $\gamma<\sqrt{\rho}$ for suitable choices of $\rho$.

\begin{remark}[Algorithmic parameters vs.\ $\gamma$]
The margin $\gamma$ is a property of the \emph{instance} (graph, seed $s$, and PageRank parameters $\alpha,\rho$),
since it is defined from the KKT slack at the \emph{true minimizer} $x^\star$.
Step size and momentum parameters of an accelerated method do \emph{not} change $\gamma$; they only affect
the transient behavior before the iterates enter a $\gamma$-neighborhood of $x^\star$.
\end{remark}

\subsection{Star graph (seed at the center)}

Let $G$ be a star on $m+1$ nodes with center node $c$ of degree $d_c=m$ and leaves $\ell$ of degree $1$.
Let the seed be $s=e_c$. Consider candidates of the form $x=x_c e_c$.

\begin{lemma}[Star graph breakpoint: $\gamma$ can be $0$]\label{lem:star_gamma_small}
Fix $\alpha\in(0,1]$ and $m\ge 1$ and define
\[
\rho_0 := \frac{1-\alpha}{2m}.
\]
For any $\rho\in[\rho_0,\,1/m)$, the unique minimizer has support $S^\star=\{c\}$, with
\[
x_c^\star = \frac{2\alpha(1-\rho m)}{(1+\alpha)\sqrt{m}}.
\]
Moreover, the KKT slack on any leaf $\ell$ equals
\[
\lambda_\ell - |\nabla f(x^\star)_\ell|
= \frac{2\alpha}{1+\alpha}\,(\rho-\rho_0),
\]
and thus at the breakpoint $\rho=\rho_0$ one has $\gamma=0<\sqrt{\rho_0}$.
\end{lemma}

\begin{proof}
For undirected graphs,
\[
Q=\frac{1+\alpha}{2}I-\frac{1-\alpha}{2}D^{-1/2}AD^{-1/2},
\qquad
\nabla f(x)=Qx-\alpha D^{-1/2}s.
\]
On the star, $Q_{cc}=\frac{1+\alpha}{2}$ and for each leaf $\ell$,
$Q_{\ell c}=Q_{c\ell}=-\frac{1-\alpha}{2}\frac{1}{\sqrt{m}}$.

Assume $x^\star=x_c^\star e_c$ with $x_c^\star>0$.
The active KKT condition at $c$ is
\[
0 = \nabla f(x^\star)_c + \lambda_c
= \frac{1+\alpha}{2}x_c^\star - \frac{\alpha}{\sqrt{m}} + \rho\alpha\sqrt{m},
\]
which yields the stated formula for $x_c^\star$ (and requires $\rho<1/m$ for positivity).
For a leaf $\ell$, the inactive KKT inequality is
\[
|\nabla f(x^\star)_\ell|
= |(Qx^\star)_\ell|
= \frac{1-\alpha}{2}\frac{x_c^\star}{\sqrt{m}}
\le \lambda_\ell=\rho\alpha,
\]
which reduces to $\rho\ge\rho_0$ after substituting $x_c^\star$.
Strong convexity of $F$ implies the KKT conditions certify the unique minimizer and its support.

Finally, the slack at a leaf is
\[
\lambda_\ell-|\nabla f(x^\star)_\ell|
= \rho\alpha - \frac{1-\alpha}{2}\frac{x_c^\star}{\sqrt{m}}
= \frac{2\alpha}{1+\alpha}(\rho-\rho_0),
\]
and since all inactive coordinates are leaves, this gives $\gamma$.
\end{proof}

\begin{remark}[Getting $\gamma<\sqrt{\rho}$ with $\gamma>0$ on the star]
Take $\rho=\rho_0+\delta$ with $\delta>0$.
Then $\gamma=\frac{2\alpha}{1+\alpha}\delta$ while $\rho=\rho_0+\delta>0$.
Choosing $\delta$ sufficiently small gives $\gamma<\sqrt{\rho}$ (and $\gamma$ can be made arbitrarily small).
\end{remark}

\subsection{Path graph (seed at an endpoint)}

Let $G=P_{m+1}$ be the path on nodes $1,2,\dots,m+1$ with edges $(i,i+1)$.
Let $s=e_1$ (seed at endpoint $1$). Here $d_1=d_{m+1}=1$ and $d_i=2$ for $2\le i\le m$.
Consider candidates of the form $x=x_1 e_1$.

\begin{lemma}[Path graph breakpoint: $\gamma$ can be $0$]\label{lem:path_gamma_small}
Fix $\alpha\in(0,1]$ and $m\ge 1$ and define
\[
\rho_0 := \frac{1-\alpha}{3+\alpha}.
\]
For any $\rho\in[\rho_0,\,1)$, the unique minimizer has support $S^\star=\{1\}$, with
\[
x_1^\star = \frac{2\alpha(1-\rho)}{1+\alpha}.
\]
Moreover, the KKT slack at node $2$ equals
\[
\lambda_2-|\nabla f(x^\star)_2|
= \frac{\alpha(3+\alpha)}{(1+\alpha)\sqrt{2}}\,(\rho-\rho_0).
\]
In particular, at the breakpoint $\rho=\rho_0$ one has $\gamma=0<\sqrt{\rho_0}$.
\end{lemma}

\begin{proof}
As above, $Q=\frac{1+\alpha}{2}I-\frac{1-\alpha}{2}D^{-1/2}AD^{-1/2}$.
Since $(1,2)\in E$ and $(D^{-1/2}AD^{-1/2})_{21}=1/\sqrt{1\cdot 2}=1/\sqrt{2}$, we have
$Q_{11}=\frac{1+\alpha}{2}$ and $Q_{21}=-\frac{1-\alpha}{2\sqrt{2}}$.

Assume $x^\star=x_1^\star e_1$ with $x_1^\star>0$.
The active KKT condition at node $1$ is
\[
0=\nabla f(x^\star)_1+\lambda_1
= \frac{1+\alpha}{2}x_1^\star - \alpha + \rho\alpha,
\]
which yields $x_1^\star=\frac{2\alpha(1-\rho)}{1+\alpha}$ (positive for $\rho<1$).
The inactive KKT inequality at node $2$ is
\[
|\nabla f(x^\star)_2|
= |(Qx^\star)_2|
= \frac{1-\alpha}{2\sqrt{2}}\,x_1^\star
\le \lambda_2=\rho\alpha\sqrt{2},
\]
which reduces to $\rho\ge\rho_0$ after substituting $x_1^\star$.
For nodes $i\ge 3$, $(Qx^\star)_i=0$ because $x^\star$ is supported only on node $1$ and $Q$ is sparse along edges,
hence their KKT inequalities hold trivially. Strong convexity implies uniqueness.

Finally,
\[
\lambda_2-|\nabla f(x^\star)_2|
= \rho\alpha\sqrt{2} - \frac{1-\alpha}{2\sqrt{2}}x_1^\star
= \frac{\alpha(3+\alpha)}{(1+\alpha)\sqrt{2}}(\rho-\rho_0),
\]
and at $\rho=\rho_0$ this slack is $0$, so $\gamma=0$.
\end{proof}

\begin{remark}[Getting $\gamma<\sqrt{\rho}$ with $\gamma>0$ on the path]
Take $\rho=\rho_0+\delta$ with $\delta>0$.
Then the slack at node $2$ is $O(\delta)$, hence $\gamma\le O(\delta)$, while $\sqrt{\rho}\approx \sqrt{\rho_0}>0$.
Choosing $\delta$ sufficiently small yields $\gamma<\sqrt{\rho}$ (and $\gamma$ can be made arbitrarily small).
\end{remark}
% ============================================================

User: Excellent![edit | edit source]

Excellent! Can you show me case where a particular bad graph example can result to γ < little-o( sqrt{ρ} )? don't produce latex yet. Show me your answer here.

Assistant: Yes.[edit | edit source]

Yes. You can get γ=o(ρ)\gamma = o(\sqrt{\rho})γ=o(ρ​) by taking a family of star graphs where ρ→0\rho\to 0ρ→0 (via the graph size) and choosing ρ\rhoρ extremely close to the support-change breakpoint so that the KKT inequality is almost tight on an inactive node.

I’ll spell it out cleanly.

Interpreting your statement in standard asymptotic terms:[edit | edit source]

This is stronger than just “γ<ρ\gamma<\sqrt{\rho}γ<ρ​” at a fixed instance.

Take the star Sm+1S_{m+1}Sm+1​ (center ccc, leaves ℓ1,…,ℓm\ell_1,\dots,\ell_mℓ1​,…,ℓm​), seed s=ecs=e_cs=ec​, and fix any constant α∈(0,1)\alpha\in(0,1)α∈(0,1).[edit | edit source]

For this instance (we already derived this earlier), there is a breakpoint

ρ0(m)=1−α2m\rho_0(m)=\frac{1-\alpha}{2m}ρ0​(m)=2m1−α​ such that for any ρ∈[ρ0(m), 1/m)\rho\in[\rho_0(m),\,1/m)ρ∈[ρ0​(m),1/m) the optimizer has support S⋆={c}S^\star=\{c\}S⋆={c}, and the margin satisfies

γ  =  2α1+α (ρ−ρ0(m)).\gamma \;=\; \frac{2\alpha}{1+\alpha}\,(\rho-\rho_0(m)).γ=1+α2α​(ρ−ρ0​(m)). So γ\gammaγ is exactly linear in how far ρ\rhoρ is above the breakpoint.

Let’s define a sequence ρm\rho_mρm​ by choosing a tiny offset above the breakpoint, for example:[edit | edit source]

ρm:=ρ0(m)+ρ0(m)2.\rho_m := \rho_0(m) + \rho_0(m)^2.ρm​:=ρ0​(m)+ρ0​(m)2. Then:

  1. ρm→0\rho_m\to 0ρm​→0 as m→∞m\to\inftym→∞ since ρ0(m)=Θ(1/m)\rho_0(m)=\Theta(1/m)ρ0​(m)=Θ(1/m).
  2. The margin is

γm=2α1+α (ρm−ρ0(m))=2α1+α ρ0(m)2=Θ ⁣(1m2).\gamma_m = \frac{2\alpha}{1+\alpha}\,(\rho_m-\rho_0(m)) = \frac{2\alpha}{1+\alpha}\,\rho_0(m)^2 = \Theta\!\left(\frac{1}{m^2}\right).γm​=1+α2α​(ρm​−ρ0​(m))=1+α2α​ρ0​(m)2=Θ(m21​).

  1. Meanwhile,

ρm=Θ ⁣(1m).\sqrt{\rho_m} = \Theta\!\left(\frac{1}{\sqrt{m}}\right).ρm​​=Θ(m​1​).

  1. Therefore

γmρm=Θ ⁣(1/m21/m)=Θ ⁣(1m3/2)⟶0.\frac{\gamma_m}{\sqrt{\rho_m}} = \Theta\!\left(\frac{1/m^2}{1/\sqrt{m}}\right) = \Theta\!\left(\frac{1}{m^{3/2}}\right) \longrightarrow 0.ρm​​γm​​=Θ(1/m​1/m2​)=Θ(m3/21​)⟶0. So indeed

  γm=o(ρm)  \boxed{\;\gamma_m = o(\sqrt{\rho_m})\;}γm​=o(ρm​​)​ for this explicit star-graph family.

Since ρm≈ρ0(m)\rho_m \approx \rho_0(m)ρm​≈ρ0​(m) for large mmm, and ρ0(m)=Θ(1/m)\rho_0(m)=\Theta(1/m)ρ0​(m)=Θ(1/m), the choice ρm=ρ0+ρ02\rho_m=\rho_0+\rho_0^2ρm​=ρ0​+ρ02​ gives[edit | edit source]

γm=Θ(ρ0(m)2)=Θ(ρm2).\gamma_m = \Theta(\rho_0(m)^2) = \Theta(\rho_m^2).γm​=Θ(ρ0​(m)2)=Θ(ρm2​). And ρ2=o(ρ)\rho^2 = o(\sqrt{\rho})ρ2=o(ρ​) as ρ→0\rho\to 0ρ→0 because

ρ2ρ=ρ3/2→0.\frac{\rho^2}{\sqrt{\rho}} = \rho^{3/2}\to 0.ρ​ρ2​=ρ3/2→0. So this construction actually gives the very clean scaling:

  γ=Θ(ρ2)⇒γ=o(ρ).  \boxed{\;\gamma = \Theta(\rho^2)\quad\Rightarrow\quad \gamma=o(\sqrt{\rho}).\;}γ=Θ(ρ2)⇒γ=o(ρ​).​

This shows you cannot hope for any worst-case lower bound like γ≳ρ\gamma \gtrsim \sqrt{\rho}γ≳ρ​ (or even γ≳ρ\gamma \gtrsim \rhoγ≳ρ) that depends only on (α,ρ)(\alpha,\rho)(α,ρ): by tuning ρ\rhoρ close to a breakpoint on a suitable graph family, γ\gammaγ can be polynomially smaller than ρ\sqrt{\rho}ρ​, even when the true support is tiny (here ∣S⋆∣=1≤1/ρ|S^\star|=1\le 1/\rho∣S⋆∣=1≤1/ρ).[edit | edit source]

If you want, I can also give a variant where γ\gammaγ is super-polynomially small relative to ρ\sqrt{\rho}ρ​ (e.g., choose ρ−ρ0\rho-\rho_0ρ−ρ0​ exponentially small in mmm), but the γ=Θ(ρ2)\gamma=\Theta(\rho^2)γ=Θ(ρ2) example already proves γ=o(ρ)\gamma=o(\sqrt{\rho})γ=o(ρ​) cleanly.

User: There is something that I don't understand.[edit | edit source]

There is something that I don't understand. Earlier you proved that "\begin{equation}\label{eq:workeps_ppr} \mathrm{Work}(N_\varepsilon) = O\!\left( \frac{1}{\rho\sqrt{\alpha}}\log\frac{1}{\varepsilon} \;+\; \frac{1}{\gamma^2\sqrt{\alpha}} \right), \end{equation}"

So, based on what you said, for the star graph if ρ = ρ_0, thus γ=0. Does that mean that the algorithm needs infinite work? Is there something missing in your proof?

Assistant: No — γ0\gamma0γ0 does not mean the algorithm needs infinite work for any fixed accuracy ε0\varepsilon0ε0.[edit | edit source]

No — γ=0\gamma=0γ=0 does not mean the algorithm needs infinite work for any fixed accuracy ε>0\varepsilon>0ε>0. What it means is: the particular “local work” bound you’re quoting was proved only under the strict margin assumption γ>0\gamma>0γ>0, and if you plug in γ=0\gamma=0γ=0 you are applying the theorem outside its validity range. The “+∞+\infty+∞” you get is the bound becoming vacuous, not the algorithm becoming non-terminating.

Here’s exactly what is going on.

In our derivation, the key counting step was:[edit | edit source]

  • if an inactive coordinate i∈I⋆i\in I^\stari∈I⋆ becomes nonzero at an iterate,
  • then the prox input must cross the threshold by at least tγt\gammatγ,
  • which leads to a bound like

∣Ak∣  ≤  ∥u(yk)−u(x⋆)∥22t2γ2  ≤  (1+tLtγ)2∥yk−x⋆∥22.|A_k| \;\le\; \frac{\|u(y_k)-u(x^\star)\|_2^2}{t^2\gamma^2} \;\le\;\left(\frac{1+tL}{t\gamma}\right)^2\|y_k-x^\star\|_2^2.∣Ak​∣≤t2γ2∥u(yk​)−u(x⋆)∥22​​≤(tγ1+tL​)2∥yk​−x⋆∥22​. That inequality divides by γ2\gamma^2γ2. If γ=0\gamma=0γ=0, the statement “each spurious active coordinate forces a jump of at least tγt\gammatγ” becomes “forces a jump of at least 0,” which gives no information. So the proof technique cannot upper bound the number of spurious activations, and the resulting constant 1/γ21/\gamma^21/γ2 blows up.

So what fails at γ=0\gamma=0γ=0 is finite identification / stability of the optimal inactive set, not convergence of APG itself.

γ=0\gamma=0γ=0 means there exists an inactive coordinate i∈I⋆i\in I^\stari∈I⋆ such that the KKT inequality is tight:[edit | edit source]

∣∇f(x⋆)i∣=λi.|\nabla f(x^\star)_i| = \lambda_i.∣∇f(x⋆)i​∣=λi​. Equivalently, at the optimum the prox input is exactly at the soft-threshold boundary:

∣u(x⋆)i∣=tλi,u(x):=x−t∇f(x).|u(x^\star)_i| = t\lambda_i,\quad u(x):=x-t\nabla f(x).∣u(x⋆)i​∣=tλi​,u(x):=x−t∇f(x). When you are exactly on that boundary, then arbitrarily small perturbations of yyy around x⋆x^\starx⋆ can flip that coordinate between

  • returning 0 (inactive), and
  • returning a tiny nonzero (active),

because soft-thresholding switches at equality.

Accelerated methods generate extrapolated points yky_kyk​ that can oscillate around x⋆x^\starx⋆, so there is no reason (in worst case) that such “tight” coordinates must become permanently zero after a finite time. In other words:

  • γ>0\gamma>0γ>0 ⇒ there is a buffer zone in which inactive coordinates are guaranteed to stay inactive, hence finite identification, hence the spurious-work term is a constant (independent of ε\varepsilonε).
  • γ=0\gamma=0γ=0 ⇒ there is no buffer zone, so spurious coordinates can keep flickering for as long as you run; the spurious work is no longer a constant independent of ε\varepsilonε.

That’s exactly why the margin matters.

For any fixed ε>0\varepsilon>0ε>0: no.[edit | edit source]

Accelerated proximal gradient still has the iteration complexity Nε=O(1/αlog⁡(1/ε))N_\varepsilon = O(\sqrt{1/\alpha}\log(1/\varepsilon))Nε​=O(1/α​log(1/ε)) for strongly convex composite objectives, regardless of γ\gammaγ. This is the standard statement you quoted in the open-problem note. open-problem-fountoulakis22a

What you lose at γ=0\gamma=0γ=0 is a guarantee that each iteration can be implemented with local cost O~(1/ρ)\tilde O(1/\rho)O~(1/ρ) (or that the cumulative support outside S⋆S^\starS⋆ is bounded by a constant). In the worst case you may have to fall back to the pessimistic bound you already mention in your note:

Work(Nε)  ≤  O ⁣(n1αlog⁡1ε),\text{Work}(N_\varepsilon)\;\le\; O\!\left(n\sqrt{\frac{1}{\alpha}}\log\frac{1}{\varepsilon}\right),Work(Nε​)≤O(nα1​​logε1​), because without margin you cannot prove the iterate supports stay small. open-problem-fountoulakis22a

So: finite work, but potentially non-local.

Two different “limits” get conflated:[edit | edit source]

- You run Nε<∞N_\varepsilon<\inftyNε​<∞ iterations.[edit | edit source]
  • Work is at most nNε<∞nN_\varepsilon<\inftynNε​<∞.
  • Nothing is infinite.
- Nε→∞N_\varepsilon\to\inftyNε​→∞ for any method.[edit | edit source]
  • If γ>0\gamma>0γ>0, we proved that the spurious support contribution stays bounded even as ε→0\varepsilon\to 0ε→0, because identification happens in finite time.
  • If γ=0\gamma=0γ=0, you can have no finite identification time, so spurious activations can persist for arbitrarily many iterations; the spurious contribution can grow with NεN_\varepsilonNε​. In that sense, a bound that tries to be uniform “independent of ε\varepsilonε” will necessarily break.

That is exactly what the 1γ2α\frac{1}{\gamma^2\sqrt{\alpha}}γ2α​1​ term is signaling.

Not a missing step — it’s a missing assumption. The bound you quoted is a conditional theorem:[edit | edit source]

  • It is correct under Assumption γ>0\gamma>0γ>0.
  • At γ=0\gamma=0γ=0, that assumption is false, and the bound is not meant to apply.

More strongly: the blow-up is not an artifact; it reflects a real phenomenon — at γ=0\gamma=0γ=0 you generally cannot guarantee finite-time identification or an ε\varepsilonε-independent bound on cumulative spurious support for inertial/accelerated proximal schemes.

If you want, I can also show (still without LaTeX) how to state an alternative runtime bound valid even when γ=0\gamma=0γ=0: it will necessarily depend on nnn (or on how many “tight” inactive coordinates there are), matching the pessimistic O(n1/αlog⁡(1/ε))O(n\sqrt{1/\alpha}\log(1/\varepsilon))O(n1/α​log(1/ε)) running time in your open-problem statement. open-problem-fountoulakis22a

User: Can you convert this into an independent latex section that I can append to my code?[edit | edit source]

Can you convert this into an independent latex section that I can append to my code?

Assistant: Append the following section it clarifies why plugging γ0\gamma0γ0 into the γ0\gamma0γ0 bound is invalid, and gives …[edit | edit source]

Append the following section (it clarifies why plugging γ=0\gamma=0γ=0 into the γ>0\gamma>0γ>0 bound is invalid, and gives the correct “fallback” runtime bound that always holds). open-problem-fountoulakis22a

% ============================================================
\section{When the margin vanishes: why the $1/\gamma^2$ term blows up (and what it does \emph{not} mean)}
\label{sec:gamma_zero}

A potential point of confusion is the appearance of the term $1/\gamma^2$ in bounds such as
\begin{equation}\label{eq:workeps_ppr_repeat}
\mathrm{Work}(N_\varepsilon)
=
O\!\left(
\frac{1}{\rho\sqrt{\alpha}}\log\frac{1}{\varepsilon}
\;+\;
\frac{1}{\gamma^2\sqrt{\alpha}}
\right),
\end{equation}
and what happens on instances where $\gamma=0$ (e.g., the star/path breakpoint examples).

\subsection{The key point: \eqref{eq:workeps_ppr_repeat} is a conditional bound (it assumes $\gamma>0$)}

Recall the definition of the (strict complementarity) margin:
\[
\gamma := \min_{i\in I^\star}\left(\lambda_i - |\nabla f(x^\star)_i|\right),
\qquad
\lambda_i=\rho\alpha\sqrt{d_i}.
\]
All of the `<code>finite identification'' and </code><code>$\varepsilon$-independent spurious-support'' bounds were proved
\emph{under the assumption $\gamma>0$}. The factor $1/\gamma^2$ arises by dividing by the minimum slack
in the inactive KKT inequalities.

If $\gamma=0$, then there exists an inactive index $i\in I^\star$ with
\[
|\nabla f(x^\star)_i|=\lambda_i,
\]
i.e., that coordinate lies \emph{exactly} on the soft-thresholding boundary at the optimum.
In this case, there is no neighborhood of $x^\star$ in which the proximal-gradient update is guaranteed
to keep all inactive coordinates at zero: arbitrarily small perturbations of the extrapolated point $y$
can flip such a tight coordinate between $0$ and a tiny nonzero value. Consequently, one cannot in general
guarantee \emph{finite-time identification} of the optimal inactive set, and the proofs yielding
$\varepsilon$-independent spurious-support bounds do not apply.

\paragraph{Conclusion.}
Setting $\gamma=0$ in \eqref{eq:workeps_ppr_repeat} does \emph{not} imply the algorithm requires infinite work;
it only means that \eqref{eq:workeps_ppr_repeat} becomes \emph{vacuous} when used outside its assumptions.

\subsection{What remains true when $\gamma=0$: finite work for any fixed accuracy}

Even when $\gamma=0$, accelerated proximal-gradient methods still enjoy the standard accelerated \emph{iteration}
complexity for strongly convex composite optimization. In particular, under $\mu$-strong convexity of $F$
and $L$-smoothness of $f$, there exists $q\in(0,1)$ (typically $q\approx 1-\sqrt{\mu/L}$) such that
\[
F(x_k)-F(x^\star)\le \Delta_0\,q^k,
\qquad \Delta_0:=F(x_0)-F(x^\star).
\]
Therefore, for any target $\varepsilon>0$, an $\varepsilon$-accurate iterate is obtained after
\[
N_\varepsilon = O\!\left(\frac{\log(\Delta_0/\varepsilon)}{\log(1/q)}\right)
=
O\!\left(\sqrt{\frac{L}{\mu}}\log\frac{\Delta_0}{\varepsilon}\right)
\]
iterations.

However, without a positive margin, we cannot bound the supports during the transient independently of $n$.
A worst-case bound is obtained by charging each iteration the full dimension.

\begin{corollary}[Always-valid fallback work bound (no margin needed)]\label{cor:fallback_work}
Assume only that $f$ is $L$-smooth and $F=f+g$ is $\mu$-strongly convex, and run an accelerated
proximal-gradient method for $N_\varepsilon$ iterations.
Then for any $\varepsilon>0$,
\[
\mathrm{Work}(N_\varepsilon)
\;\le\;
O\!\left(n\,N_\varepsilon\right)
=
O\!\left(n\sqrt{\frac{L}{\mu}}\log\frac{\Delta_0}{\varepsilon}\right),
\]
independent of the margin $\gamma$.
In the $\ell_1$-regularized PageRank setting (undirected graph), $L\le 1$ and $\mu=\alpha$, hence
\[
\mathrm{Work}(N_\varepsilon)
=
O\!\left(\frac{n}{\sqrt{\alpha}}\log\frac{\Delta_0}{\varepsilon}\right).
\]
\end{corollary}

\begin{remark}[How to interpret the role of $\gamma$]
The margin $\gamma>0$ is precisely what upgrades the fallback bound above
(which scales with $n$) to a \emph{local} bound that depends on $1/\rho$ and is independent of $n$.
When $\gamma=0$, locality cannot be certified by this approach: tight inactive coordinates can keep
</code>`flickering'' and may prevent $\varepsilon$-independent bounds on the cumulative spurious supports.
% ============================================================

User: Is there any reasonable assumption that we could make about the structure of the graph around the seed node and ρ such …[edit | edit source]

Is there any reasonable assumption that we could make about the structure of the graph around the seed node and ρ such that γ is bigger than sqrt(ρ) or of that order?

Assistant: There are two very different issues mixed into “can we assume γ≳ρ\gamma \gtrsim \sqrt{\rho}γ≳ρ​?”:[edit | edit source]

There are two very different issues mixed into “can we assume γ≳ρ\gamma \gtrsim \sqrt{\rho}γ≳ρ​?”:

  1. A hard upper bound coming from the penalty weights in (3), and
  2. A nondegeneracy / “don’t sit at a breakpoint” condition (strict complementarity).

Let me separate them.

For your objective (3), the ℓ1\ell_1ℓ1​ weights are[edit | edit source]

λi  =  ραdi.()\lambda_i \;=\; \rho\alpha\sqrt{d_i}. \tag{}λi​=ραdi​​.(*) and for every inactive coordinate i∈I⋆i\in I^\stari∈I⋆,

∣∇f(x⋆)i∣≤λi,γ=min⁡i∈I⋆(λi−∣∇f(x⋆)i∣).()|\nabla f(x^\star)_i|\le \lambda_i, \qquad \gamma = \min_{i\in I^\star}\bigl(\lambda_i-|\nabla f(x^\star)_i|\bigr). \tag{}∣∇f(x⋆)i​∣≤λi​,γ=i∈I⋆min​(λi​−∣∇f(x⋆)i​∣).(**) So automatically

0≤γ≤min⁡i∈I⋆λi≤ραmin⁡i∈I⋆di≤ραdmin⁡,(1)0 \le \gamma \le \min_{i\in I^\star}\lambda_i \le \rho\alpha\min_{i\in I^\star}\sqrt{d_i} \le \rho\alpha\sqrt{d_{\min}}, \tag{1}0≤γ≤i∈I⋆min​λi​≤ραi∈I⋆min​di​​≤ραdmin​​,(1) where dmin⁡d_{\min}dmin​ is the global minimum degree (or just the minimum degree among inactive nodes). This is immediate from the definition. open-problem-fountoulakis22a

If your graph has any inactive node with bounded degree (say degree 1 or 2), then[edit | edit source]

γ≤O(ρ).\gamma \le O(\rho).γ≤O(ρ). And when ρ→0\rho\to 0ρ→0, O(ρ)=o(ρ)O(\rho)=o(\sqrt{\rho})O(ρ)=o(ρ​). So on typical sparse graphs with leaves/whiskers, no structural assumption “around the seed” can force γ≳ρ\gamma \gtrsim \sqrt{\rho}γ≳ρ​, because the bound (1) is global and purely degree-based.

This is why the star/path families are so effective as counterexamples: they contain low-degree inactive nodes, forcing γ\gammaγ to be at most O(ρ)O(\rho)O(ρ), and (by tuning ρ\rhoρ near breakpoints) even smaller.

To even make γ∼ρ\gamma\sim \sqrt{\rho}γ∼ρ​ possible when ρ\rhoρ is small, you need λi\lambda_iλi​ itself to be Ω(ρ)\Omega(\sqrt{\rho})Ω(ρ​) for the relevant inactive coordinates. Using (∗)(*)(∗), that means:[edit | edit source]

ραdi  ≳  ρ⟹di  ≳  1α2ρ.(2)\rho\alpha\sqrt{d_i} \;\gtrsim\; \sqrt{\rho} \quad\Longrightarrow\quad d_i \;\gtrsim\; \frac{1}{\alpha^2\rho}. \tag{2}ραdi​​≳ρ​⟹di​≳α2ρ1​.(2) So a necessary (and essentially unavoidable) type of assumption is:

A clean version is:[edit | edit source]
  • either the graph has minimum degree dmin⁡≳1/(α2ρ)d_{\min} \gtrsim 1/(\alpha^2\rho)dmin​≳1/(α2ρ),
  • or (more weakly) every node that ends up inactive has degree at least c/(α2ρ)c/(\alpha^2\rho)c/(α2ρ).

This rules out leaves/paths/stars-with-leaves-outside-support, and it matches the algebra: then

min⁡i∈I⋆λi  ≥  ραc/(α2ρ)  =  cρ.\min_{i\in I^\star}\lambda_i \;\ge\; \rho\alpha\sqrt{c/(\alpha^2\rho)} \;=\; \sqrt{c\rho}.i∈I⋆min​λi​≥ραc/(α2ρ)​=cρ​. Is this “reasonable”?

  • On dense graphs, yes.
  • On many sparse real-world networks, no—unless you pre-process (e.g., work on a core where minimum degree is large).

Even if λi\lambda_iλi​ is large, γ\gammaγ can still be tiny if some inactive coordinate is nearly tight:[edit | edit source]

∣∇f(x⋆)i∣≈λi.|\nabla f(x^\star)_i|\approx \lambda_i.∣∇f(x⋆)i​∣≈λi​. That’s exactly what happens at (or near) breakpoints of the regularization path.

So you need a second assumption:

A natural version is a relative slack condition:[edit | edit source]

∣∇f(x⋆)i∣≤(1−η)λifor all i∈I⋆,|\nabla f(x^\star)_i| \le (1-\eta)\lambda_i \quad\text{for all } i\in I^\star,∣∇f(x⋆)i​∣≤(1−η)λi​for all i∈I⋆, for some fixed η∈(0,1)\eta\in(0,1)η∈(0,1). This is stronger than just γ>0\gamma>0γ>0; it forces a uniform fraction of slack.

Then

γ=min⁡i∈I⋆(λi−∣∇f(x⋆)i∣)≥ηmin⁡i∈I⋆λi.(3)\gamma = \min_{i\in I^\star}\bigl(\lambda_i-|\nabla f(x^\star)_i|\bigr) \ge \eta \min_{i\in I^\star}\lambda_i. \tag{3}γ=i∈I⋆min​(λi​−∣∇f(x⋆)i​∣)≥ηi∈I⋆min​λi​.(3) Combine (3) with the degree condition (2) and you get

γ  ≳  ηρ(up to α-constants).(4)\gamma \;\gtrsim\; \eta \sqrt{\rho} \quad (\text{up to }\alpha\text{-constants}). \tag{4}γ≳ηρ​(up to α-constants).(4) Is Assumption B “reasonable”?

  • Deterministically in worst case: no (adversarial ρ\rhoρ can always sit near a breakpoint; your star/path show this).
  • Practically / statistically: often yes, because exact tightness is a codimension-1 event. If ρ\rhoρ is chosen from a coarse grid or with small random jitter, hitting a breakpoint “too closely” becomes unlikely.

A very concrete “algorithmic” way to justify B is:

  • choose ρ\rhoρ on a multiplicative grid (e.g., ρ=2−j\rho=2^{-j}ρ=2−j) and/or
  • add a tiny random perturbation to ρ\rhoρ (or to edge weights), then with high probability you are not extremely close to a breakpoint, hence γ\gammaγ is not extremely small.

###[edit | edit source]

Then γ≳ρ\gamma \gtrsim \sqrt{\rho}γ≳ρ​ is not a reasonable target in general, because γ≤O(ρ)\gamma\le O(\rho)γ≤O(ρ) already from (1). The best you can hope for is γ=Θ(ρ)\gamma=\Theta(\rho)γ=Θ(ρ) (and even that needs a nondegeneracy buffer).

If you assume something like:[edit | edit source]
  • all nodes outside the seed’s local region / outside the optimal support have degree ≳1/ρ\gtrsim 1/\rho≳1/ρ, and
  • ρ\rhoρ is not tuned near a breakpoint (relative slack η\etaη),

then γ\gammaγ can be of order ρ\sqrt{\rho}ρ​ (more precisely Θ(αρ)\Theta(\alpha\sqrt{\rho})Θ(αρ​) up to constants), by the simple chain (3)+(2).

If you tell me which regime you care about (typical local clustering on sparse graphs, or a dense-core model, or a probabilistic “ρ\rhoρ not adversarial” model), I can write down a crisp assumption set and show the exact inequality chain that yields γ≥c αρ\gamma \ge c\,\alpha\sqrt{\rho}γ≥cαρ​ (or explain why it’s impossible in that regime).

User: Can you provide a theorem for a sufficient condition for γ to be Osqrtρ.[edit | edit source]

Can you provide a theorem for a sufficient condition for γ to be O(sqrt(ρ)). You will possibly need to assume a target cluster with some conductance which is also well connected internally.

Assistant: Yes—there’s a clean way to package this as a primal–dual KKT certificate theorem plus a graph-structural corollary.[edit | edit source]

Yes—there’s a clean way to package this as a primal–dual (KKT certificate) theorem plus a graph-structural corollary. The main idea is:

  • pick a “target cluster” CCC containing the seed,
  • show the restricted minimizer supported on CCC is also the global minimizer (via strict dual feasibility on V∖CV\setminus CV∖C),
  • then the margin is at least the dual-feasibility slack, which you can lower bound by something like Θ(ρ)\Theta(\sqrt{\rho})Θ(ρ​) if (i) outside degrees are large enough so λi=ραdi\lambda_i=\rho\alpha\sqrt{d_i}λi​=ραdi​​ is ≳ρ\gtrsim \sqrt{\rho}≳ρ​, and (ii) boundary coupling is small (low conductance / dispersed boundary) and CCC is internally well-conditioned.

I’ll write a theorem in a form that is actually checkable and then interpret it in cluster terms.

We’re in the ℓ1\ell_1ℓ1​-regularized PPR problem (your eq. (3)) open-problem-fountoulakis22a[edit | edit source]

min⁡x∈Rn  F(x)=f(x)+g(x),f(x)=12x⊤Qx−αx⊤D−1/2s,g(x)=∑i=1nλi∣xi∣,\min_{x\in\mathbb{R}^n}\;F(x)= f(x)+g(x),\quad f(x)=\tfrac12 x^\top Qx-\alpha x^\top D^{-1/2}s,\quad g(x)=\sum_{i=1}^n \lambda_i |x_i|,x∈Rnmin​F(x)=f(x)+g(x),f(x)=21​x⊤Qx−αx⊤D−1/2s,g(x)=i=1∑n​λi​∣xi​∣, with

Q=αI+1−α2L,λi=ραdi,b:=αD−1/2s.Q=\alpha I+\tfrac{1-\alpha}{2}L,\qquad \lambda_i=\rho\alpha\sqrt{d_i},\qquad b:=\alpha D^{-1/2}s.Q=αI+21−α​L,λi​=ραdi​​,b:=αD−1/2s. Let x⋆x^\starx⋆ be the unique minimizer and I⋆={i:xi⋆=0}I^\star=\{i:x_i^\star=0\}I⋆={i:xi⋆​=0}. The margin is

γ:=min⁡i∈I⋆(λi−∣∇f(x⋆)i∣)=min⁡i∈I⋆(λi−∣(Qx⋆−b)i∣).\gamma := \min_{i\in I^\star}\bigl(\lambda_i-|\nabla f(x^\star)_i|\bigr) = \min_{i\in I^\star}\bigl(\lambda_i-| (Qx^\star-b)_i|\bigr).γ:=i∈I⋆min​(λi​−∣∇f(x⋆)i​∣)=i∈I⋆min​(λi​−∣(Qx⋆−b)i​∣).

###[edit | edit source]

Let C⊆VC\subseteq VC⊆V contain the seed node vvv (i.e. s=evs=e_vs=ev​ and v∈Cv\in Cv∈C). Define the complement Cˉ:=V∖C\bar C:=V\setminus CCˉ:=V∖C. Partition QQQ and bbb into blocks (C,Cˉ)(C,\bar C)(C,Cˉ).

Assume there exists a vector x^∈Rn\hat x\in\mathbb{R}^nx^∈Rn supported on CCC (i.e. x^Cˉ=0\hat x_{\bar C}=0x^Cˉ​=0) such that:

  1. Stationarity on CCC with a known sign pattern. There is a sign vector σC∈{−1,+1}∣C∣\sigma_C\in\{-1,+1\}^{|C|}σC​∈{−1,+1}∣C∣ such that QCCx^C−bC+λC∘σC=0.Q_{CC}\hat x_C - b_C + \lambda_C\circ \sigma_C = 0.QCC​x^C​−bC​+λC​∘σC​=0. (In the PPR setting one often expects x^C≥0\hat x_C\ge 0x^C​≥0, so σC=1\sigma_C=\mathbf{1}σC​=1.)
  2. Strict dual feasibility off CCC (a uniform slack η∈(0,1)\eta\in(0,1)η∈(0,1)). For all i∈Cˉi\in\bar Ci∈Cˉ, ∣(QCˉCx^C−bCˉ)i∣≤(1−η)λi.|(Q_{\bar C C}\hat x_C - b_{\bar C})_i| \le (1-\eta)\lambda_i.∣(QCˉC​x^C​−bCˉ​)i​∣≤(1−η)λi​.

Then:

  • x^\hat xx^ is the unique global minimizer: x^=x⋆\hat x=x^\starx^=x⋆, hence \supp(x⋆)⊆C\supp(x^\star)\subseteq C\supp(x⋆)⊆C.
  • The margin satisfies the explicit lower bound γ  ≥  ηmin⁡i∈Cˉλi  =  η ραmin⁡i∈Cˉdi.\gamma \;\ge\; \eta\min_{i\in\bar C}\lambda_i \;=\; \eta\,\rho\alpha\min_{i\in\bar C}\sqrt{d_i}.γ≥ηi∈Cˉmin​λi​=ηραi∈Cˉmin​di​​.

In particular, if the outside minimum degree satisfies

min⁡i∈Cˉdi  ≥  c0α2ρfor some c0>0,\min_{i\in\bar C} d_i \;\ge\; \frac{c_0}{\alpha^2\rho} \quad\text{for some }c_0>0,i∈Cˉmin​di​≥α2ρc0​​for some c0​>0, then

γ  ≥  η ραc0α2ρ  =  ηc0 ρ  =  Ω(ρ).\gamma \;\ge\; \eta\,\rho\alpha\sqrt{\frac{c_0}{\alpha^2\rho}} \;=\;\eta\sqrt{c_0}\,\sqrt{\rho} \;=\;\Omega(\sqrt{\rho}).γ≥ηραα2ρc0​​​=ηc0​​ρ​=Ω(ρ​). Why this theorem is the right abstraction. It’s the exact analogue of a standard Lasso “primal–dual witness” / strict dual feasibility condition: once you have strict feasibility on the complement, the support is stable and you get a quantitative margin.

The theorem above is “certificate-based.” To connect it to conductance and “internally well connected,” you want easy-to-check conditions that imply the strict dual feasibility inequality.[edit | edit source]

Here is a sufficient (strong but explicit) set of structural assumptions that imply the certificate.

Let C⊆VC\subseteq VC⊆V contain the seed vvv, and suppose:[edit | edit source]

(A) Internal connectivity / conditioning on CCC. The principal submatrix QCCQ_{CC}QCC​ satisfies

λmin⁡(QCC)≥μC,\lambda_{\min}(Q_{CC}) \ge \mu_C,λmin​(QCC​)≥μC​, for some μC>0\mu_C>0μC​>0. A natural way to ensure this is to assume CCC has a positive Dirichlet spectral gap for the normalized Laplacian (or, more informally, CCC is an internal expander); then μC\mu_CμC​ is bounded below by α+1−α2λmin⁡(LCD)\alpha+\tfrac{1-\alpha}{2}\lambda_{\min}(L^D_C)α+21−α​λmin​(LCD​).

(B) Low external conductance and “dispersed boundary.” Let cut(C,Cˉ)\mathrm{cut}(C,\bar C)cut(C,Cˉ) be the number of edges crossing (C,Cˉ)(C,\bar C)(C,Cˉ). Assume

ϕ(C)=cut(C,Cˉ)vol(C)≤coutρandmax⁡i∈Cˉdeg⁡C(i)≤b,\phi(C)=\frac{\mathrm{cut}(C,\bar C)}{\mathrm{vol}(C)} \le c_{\mathrm{out}}\rho \quad\text{and}\quad \max_{i\in\bar C}\deg_C(i)\le b,ϕ(C)=vol(C)cut(C,Cˉ)​≤cout​ρandi∈Cˉmax​degC​(i)≤b, where deg⁡C(i)\deg_C(i)degC​(i) is the number of neighbors of iii inside CCC, and bbb is a modest constant. (Heuristically: not only is the cut small, it’s not concentrated onto a single outside low-degree node.)

(C) Degree lower bounds.

dj≥dC,min⁡  ∀j∈C,di≥dout  ∀i∈Cˉ,dout≥c0α2ρ.d_j \ge d_{C,\min}\ \ \forall j\in C, \qquad d_i \ge d_{\mathrm{out}}\ \ \forall i\in\bar C, \qquad d_{\mathrm{out}} \ge \frac{c_0}{\alpha^2\rho}.dj​≥dC,min​  ∀j∈C,di​≥dout​  ∀i∈Cˉ,dout​≥α2ρc0​​. (D) Choose ρ\rhoρ so the restricted solution is nonnegative (and not at a breakpoint). Define the candidate supported on CCC (assuming positivity)

x^C:=QCC−1(bC−λC),x^Cˉ:=0,\hat x_C := Q_{CC}^{-1}(b_C-\lambda_C), \qquad \hat x_{\bar C}:=0,x^C​:=QCC−1​(bC​−λC​),x^Cˉ​:=0, and assume x^C≥0\hat x_C\ge 0x^C​≥0 componentwise, and that the following “boundary domination” inequality holds for some η∈(0,1)\eta\in(0,1)η∈(0,1):

1−α2bdoutdC,min⁡⏟(boundary coupling)⋅∥x^C∥2⏟(signal on C) ≤ (1−η)ραdout⏟min⁡i∈Cˉλi.\underbrace{\frac{1-\alpha}{2}\sqrt{\frac{b}{d_{\mathrm{out}}d_{C,\min}}}}_{\text{(boundary coupling)}} \cdot \underbrace{\|\hat x_C\|_2}_{\text{(signal on }C\text{)}} \ \le\ (1-\eta)\underbrace{\rho\alpha\sqrt{d_{\mathrm{out}}}}_{\min_{i\in\bar C}\lambda_i}.(boundary coupling)21−α​dout​dC,min​b​​​​⋅(signal on C)∥x^C​∥2​​​ ≤ (1−η)mini∈Cˉ​λi​ραdout​​​​. Then the strict dual feasibility condition holds, so x^=x⋆\hat x=x^\starx^=x⋆ and

γ≥η ραdout≥ηc0ρ.\gamma \ge \eta\,\rho\alpha\sqrt{d_{\mathrm{out}}} \ge \eta\sqrt{c_0}\sqrt{\rho}.γ≥ηραdout​​≥ηc0​​ρ​. Interpretation.

  • Low conductance keeps the interaction across (C,Cˉ)(C,\bar C)(C,Cˉ) limited.
  • Internal connectivity keeps QCCQ_{CC}QCC​ well-conditioned, preventing ∥x^C∥\|\hat x_C\|∥x^C​∥ from blowing up.
  • High degree outside ensures the thresholds λi=ραdi\lambda_i=\rho\alpha\sqrt{d_i}λi​=ραdi​​ are at least Θ(ρ)\Theta(\sqrt{\rho})Θ(ρ​) (this is necessary for γ\gammaγ to be Θ(ρ)\Theta(\sqrt{\rho})Θ(ρ​), because γ≤min⁡i∈I⋆λi\gamma\le \min_{i\in I^\star}\lambda_iγ≤mini∈I⋆​λi​).
  • Dispersed boundary prevents a single outside vertex from having a large coupling QiCQ_{iC}QiC​.

- This kind of theorem is plausible and standard (it’s exactly the strict dual feasibility logic), but it does require assumptions that are stronger than just “small conductance”. In particular, without something like “no low-degree boundary whiskers” (or at least “inactive nodes have large degree”), you cannot hope for γ≳ρ\gamma\gtrsim \sqrt{\rho}γ≳ρ​, because γ≤min⁡i∈I⋆λi=ραmin⁡i∈I⋆di\gamma \le \min_{i\in I^\star}\lambda_i = \rho\alpha\min_{i\in I^\star}\sqrt{d_i}γ≤mini∈I⋆​λi​=ραmini∈I⋆​di​​.[edit | edit source]

  • If you’re thinking in the usual sparse-graph local clustering regime (many degree-1/2 nodes), the best “natural” order you can hope for is more like γ=Θ(ρ)\gamma=\Theta(\rho)γ=Θ(ρ) (up to constants), not Θ(ρ)\Theta(\sqrt{\rho})Θ(ρ​).

If you want, I can convert the theorem + corollary into a compact LaTeX section and align notation precisely with your earlier appendices (including a short proof: it’s a one-page KKT/primal–dual witness argument).

User: Excellent, can you provide a latex section with this sufficient theorem and the corresponding proof?[edit | edit source]

Excellent, can you provide a latex section with this sufficient theorem and the corresponding proof? Don't skip steps, provide all details.

Assistant: Below is an appendable LaTeX section giving a sufficient cluster certificate theorem and a fully detailed proof that it…[edit | edit source]

Below is an appendable LaTeX section giving a sufficient cluster certificate theorem and a fully detailed proof that it implies (i) the global optimizer is supported on the cluster and (ii) the margin satisfies γ=Ω(ρ)\gamma=\Omega(\sqrt{\rho})γ=Ω(ρ​) under a simple outside-degree condition. It is written for the ℓ1\ell_1ℓ1​-regularized PageRank objective (3) in your COLT’22 open-problem note. open-problem-fountoulakis22a

% ============================================================
\section{A primal--dual cluster certificate implying a $\sqrt{\rho}$-scale margin}
\label{sec:cluster_certificate_margin}

This section provides a \emph{sufficient} condition ensuring the strict-complementarity margin
\[
\gamma := \min_{i\in I^\star}\left(\lambda_i - |\nabla f(x^\star)_i|\right)
\]
is bounded below by a quantity of order $\sqrt{\rho}$ (in the PageRank specialization).
The result is stated and proved in a general composite form and then specialized to the
$\ell_1$-regularized PageRank objective~(3).

\subsection{General composite model and subgradients}

Consider the composite optimization problem
\begin{equation}\label{eq:comp_model_cert}
\min_{x\in\mathbb{R}^n} F(x) := f(x) + g(x),
\qquad
g(x) := \sum_{i=1}^n \lambda_i |x_i|,
\quad \lambda_i>0.
\end{equation}
Assume $f$ is convex and differentiable. (In our PageRank case, $f$ is quadratic and strongly convex.)

We recall the coordinatewise subdifferential of the weighted $\ell_1$ term.

\begin{lemma}[Subdifferential of weighted $\ell_1$]\label{lem:subdiff_l1_weighted}
Let $g(x)=\sum_{i=1}^n \lambda_i |x_i|$ with $\lambda_i>0$. Then
\[
\partial g(x) = \left\{z\in\mathbb{R}^n:\;
z_i=
\begin{cases}
\lambda_i\,\mathrm{sign}(x_i), & x_i\neq 0,\\[2mm]
\theta_i,\;\; \theta_i\in[-\lambda_i,\lambda_i], & x_i=0,
\end{cases}
\;\; \forall i
\right\}.
\]
\end{lemma}

\begin{proof}
Since $g$ is separable, $g(x)=\sum_i g_i(x_i)$ with $g_i(t)=\lambda_i|t|$.
The subdifferential of $|t|$ is $\partial |t|=\{\mathrm{sign}(t)\}$ for $t\neq 0$ and $\partial |0|=[-1,1]$.
Thus $\partial g_i(t)=\lambda_i\partial|t|$, yielding
$\partial g_i(t)=\{\lambda_i\mathrm{sign}(t)\}$ for $t\neq 0$ and $\partial g_i(0)=[-\lambda_i,\lambda_i]$.
Taking the Cartesian product across coordinates gives the stated formula.
\end{proof}

We also recall the standard first-order optimality characterization for convex minimization.

\begin{lemma}[First-order optimality for convex composite problems]\label{lem:optimality_convex}
Assume $f$ is convex and differentiable and $g$ is convex. Then a point $\bar x$ minimizes $F=f+g$
if and only if
\[
0 \in \partial F(\bar x).
\]
Moreover, since $f$ is differentiable, $\partial f(\bar x)=\{\nabla f(\bar x)\}$ and hence
\[
\partial F(\bar x)=\nabla f(\bar x)+\partial g(\bar x),
\]
so $\bar x$ is a minimizer if and only if there exists $z\in\partial g(\bar x)$ such that
\begin{equation}\label{eq:KKT_general}
\nabla f(\bar x)+z=0.
\end{equation}
\end{lemma}

\begin{proof}
For any proper closed convex function $h$, a point $\bar x$ minimizes $h$ if and only if $0\in\partial h(\bar x)$.
Apply this to $h=F$.

Next, since $f$ is convex and differentiable, $\partial f(\bar x)=\{\nabla f(\bar x)\}$.
A standard subgradient sum rule gives $\partial (f+g)(\bar x)=\partial f(\bar x)+\partial g(\bar x)$,
hence $\partial F(\bar x)=\nabla f(\bar x)+\partial g(\bar x)$.
Therefore $0\in\partial F(\bar x)$ is equivalent to the existence of $z\in\partial g(\bar x)$
satisfying \eqref{eq:KKT_general}.
\end{proof}

\subsection{A cluster (support) certificate and a margin lower bound}

Let $C\subseteq\{1,\dots,n\}$ be a candidate ``cluster'' and let $\bar C:=\{1,\dots,n\}\setminus C$.
For vectors we use the block notation $x=(x_C,x_{\bar C})$.

\begin{theorem}[Primal--dual cluster certificate $\Rightarrow$ support containment and margin]\label{thm:cluster_certificate}
Consider \eqref{eq:comp_model_cert} with convex differentiable $f$ and weights $\lambda_i>0$.
Assume in addition that $F=f+g$ is $\mu$-strongly convex for some $\mu>0$, so the minimizer $x^\star$ is unique.

Fix a set $C\subseteq\{1,\dots,n\}$ and suppose there exist:
\begin{itemize}
\item a vector $\hat x\in\mathbb{R}^n$ with \emph{exact} support $\,\supp(\hat x)=C$
(i.e.\ $\hat x_C$ has no zero entries and $\hat x_{\bar C}=0$), and
\item a parameter $\eta\in(0,1)$,
\end{itemize}
such that the following two conditions hold:
\begin{enumerate}
\item[\textbf{(i)}] \textbf{Stationarity on $C$.}
For all $i\in C$,
\begin{equation}\label{eq:stationarity_C}
\nabla f(\hat x)_i + \lambda_i\,\mathrm{sign}(\hat x_i)=0.
\end{equation}
\item[\textbf{(ii)}] \textbf{Strict dual feasibility on $\bar C$.}
For all $i\in \bar C$,
\begin{equation}\label{eq:strict_dual_feas}
|\nabla f(\hat x)_i| \le (1-\eta)\lambda_i.
\end{equation}
\end{enumerate}
Then:
\begin{enumerate}
\item[(a)] $\hat x$ is the unique global minimizer: $\hat x=x^\star$. Hence $\supp(x^\star)=C$.
\item[(b)] The strict-complementarity margin satisfies
\begin{equation}\label{eq:gamma_lower_bound_cert}
\gamma := \min_{i\in I^\star}\bigl(\lambda_i-|\nabla f(x^\star)_i|\bigr)
\;\ge\; \eta\,\min_{i\in \bar C}\lambda_i.
\end{equation}
\end{enumerate}
\end{theorem}

\begin{proof}
\paragraph{Step 1: Construct an explicit subgradient $z\in\partial g(\hat x)$.}
Define $z\in\mathbb{R}^n$ coordinatewise by
\[
z_i :=
\begin{cases}
\lambda_i\,\mathrm{sign}(\hat x_i), & i\in C,\\[1mm]
-\nabla f(\hat x)_i, & i\in\bar C.
\end{cases}
\]
We claim $z\in\partial g(\hat x)$.

Indeed, since $\supp(\hat x)=C$, we have $\hat x_i\neq 0$ for all $i\in C$ and $\hat x_i=0$ for all $i\in\bar C$.
By Lemma~\ref{lem:subdiff_l1_weighted}:
\begin{itemize}
\item For $i\in C$ (nonzero coordinates), the subdifferential requires $z_i=\lambda_i\mathrm{sign}(\hat x_i)$,
which holds by definition.
\item For $i\in\bar C$ (zero coordinates), the subdifferential requires $z_i\in[-\lambda_i,\lambda_i]$.
But by \eqref{eq:strict_dual_feas} and our definition, $|z_i|=|\nabla f(\hat x)_i|\le (1-\eta)\lambda_i<\lambda_i$,
so indeed $z_i\in[-\lambda_i,\lambda_i]$.
\end{itemize}
Thus $z\in\partial g(\hat x)$.

\paragraph{Step 2: Verify the KKT condition $\nabla f(\hat x)+z=0$.}
We check this coordinatewise.

\begin{itemize}
\item For $i\in C$, $z_i=\lambda_i\mathrm{sign}(\hat x_i)$ and \eqref{eq:stationarity_C} gives
$\nabla f(\hat x)_i+z_i=0$.
\item For $i\in\bar C$, we defined $z_i=-\nabla f(\hat x)_i$, hence $\nabla f(\hat x)_i+z_i=0$ holds trivially.
\end{itemize}
Therefore $\nabla f(\hat x)+z=0$ with $z\in\partial g(\hat x)$.

\paragraph{Step 3: Conclude $\hat x$ is a global minimizer.}
By Lemma~\ref{lem:optimality_convex}, the existence of $z\in\partial g(\hat x)$ such that
$\nabla f(\hat x)+z=0$ implies $0\in\partial F(\hat x)$ and hence $\hat x$ minimizes $F$.

\paragraph{Step 4: Conclude uniqueness and identify the support.}
By assumption, $F$ is $\mu$-strongly convex with $\mu>0$, hence the minimizer is unique.
Since $\hat x$ is a minimizer, it must equal the unique minimizer: $\hat x=x^\star$.
In particular, $\supp(x^\star)=\supp(\hat x)=C$ and thus $I^\star=\bar C$.

\paragraph{Step 5: Lower bound the margin.}
Since $I^\star=\bar C$, we can write
\[
\gamma = \min_{i\in\bar C}\left(\lambda_i-|\nabla f(x^\star)_i|\right)
= \min_{i\in\bar C}\left(\lambda_i-|\nabla f(\hat x)_i|\right),
\]
where the equality uses $x^\star=\hat x$.
Using \eqref{eq:strict_dual_feas}, for each $i\in\bar C$ we have
\[
\lambda_i-|\nabla f(\hat x)_i|
\ge \lambda_i-(1-\eta)\lambda_i = \eta\lambda_i.
\]
Taking the minimum over $i\in\bar C$ yields \eqref{eq:gamma_lower_bound_cert}.
\end{proof}

\subsection{Specialization to $\ell_1$-regularized PageRank and a $\sqrt{\rho}$ condition}

We now specialize Theorem~\ref{thm:cluster_certificate} to the $\ell_1$-regularized PageRank objective~(3):
\[
F(x)= f(x)+g(x),\qquad
f(x)=\frac12 x^\top Qx-\alpha x^\top D^{-1/2}s,\qquad
g(x)=\rho\alpha\|D^{1/2}x\|_1.
\]
In this case $g(x)=\sum_{i=1}^n \lambda_i|x_i|$ with weights
\begin{equation}\label{eq:lambda_ppr}
\lambda_i=\rho\alpha\sqrt{d_i}.
\end{equation}

\begin{corollary}[$\gamma=\Omega(\sqrt{\rho})$ under a degree condition]\label{cor:gamma_sqrt_rho}
Assume the PageRank objective~(3) and suppose the hypotheses of Theorem~\ref{thm:cluster_certificate}
hold for some set $C$ and slack $\eta\in(0,1)$.
Then
\[
\gamma \ge \eta\,\min_{i\in\bar C}\lambda_i = \eta\,\rho\alpha\,\min_{i\in\bar C}\sqrt{d_i}.
\]
In particular, if the complement degrees satisfy
\begin{equation}\label{eq:degree_condition_out}
\min_{i\in\bar C} d_i \;\ge\; \frac{c_0}{\alpha^2\rho}
\qquad\text{for some constant }c_0>0,
\end{equation}
then
\begin{equation}\label{eq:gamma_Omega_sqrt_rho}
\gamma \;\ge\; \eta\,\sqrt{c_0}\,\sqrt{\rho}.
\end{equation}
\end{corollary}

\begin{proof}
The first inequality follows immediately by substituting \eqref{eq:lambda_ppr} into
\eqref{eq:gamma_lower_bound_cert}.

If additionally \eqref{eq:degree_condition_out} holds, then for all $i\in\bar C$,
\[
\lambda_i=\rho\alpha\sqrt{d_i}\ge \rho\alpha\sqrt{\frac{c_0}{\alpha^2\rho}}=\sqrt{c_0}\,\sqrt{\rho}.
\]
Taking the minimum over $i\in\bar C$ and multiplying by $\eta$ yields \eqref{eq:gamma_Omega_sqrt_rho}.
\end{proof}

\begin{remark}[Relation to conductance and internal connectivity]
Theorem~\ref{thm:cluster_certificate} isolates the core requirement:
a \emph{strict dual feasibility} inequality \eqref{eq:strict_dual_feas} on $\bar C$.
In the PageRank setting, $\nabla f(\hat x)=Q\hat x-\alpha D^{-1/2}s$ and the block $Q_{\bar C C}$
only involves edges crossing the cut $(C,\bar C)$.
Thus, when $C$ has small boundary (low conductance) and is internally well-conditioned (so the restricted solution
on $C$ does not become too large), one expects \eqref{eq:strict_dual_feas} to hold with a nontrivial slack $\eta$.
The degree condition \eqref{eq:degree_condition_out} is a simple sufficient way to ensure
$\min_{i\in\bar C}\lambda_i=\Theta(\sqrt{\rho})$, which is necessary for $\gamma$ to be of order $\sqrt{\rho}$.
\end{remark}
% ============================================================

User: fix the bugs "This is pdfTeX, Version 3.141592653-2.6-1.40.27 TeX Live 2025 preloaded formatpdflatex 2025.10.27 15 DE…[edit | edit source]

fix the bugs "This is pdfTeX, Version 3.141592653-2.6-1.40.27 (TeX Live 2025) (preloaded format=pdflatex 2025.10.27) 15 DEC 2025 16:06 entering extended mode

\write18 enabled.
%&-line parsing enabled.
    • 00_main.tex

(./00_main.tex LaTeX2e <2025-06-01> patch level 1 L3 programming layer <2025-05-26> (/usr/local/texlive/2025/texmf-dist/tex/latex/base/article.cls Document Class: article 2025/01/22 v1.4n Standard LaTeX document class (/usr/local/texlive/2025/texmf-dist/tex/latex/base/size11.clo File: size11.clo 2025/01/22 v1.4n Standard LaTeX file (size option) ) \c@part=\count275 \c@section=\count276 \c@subsection=\count277 \c@subsubsection=\count278 \c@paragraph=\count279 \c@subparagraph=\count280 \c@figure=\count281 \c@table=\count282 \abovecaptionskip=\skip49 \belowcaptionskip=\skip50 \bibindent=\dimen148 ) (/usr/local/texlive/2025/texmf-dist/tex/latex/amsmath/amsmath.sty Package: amsmath 2025/05/18 v2.17x AMS math features \@mathmargin=\skip51 For additional information on amsmath, use the ?' option. (/usr/local/texlive/2025/texmf-dist/tex/latex/amsmath/amstext.sty Package: amstext 2024/11/17 v2.01 AMS text (/usr/local/texlive/2025/texmf-dist/tex/latex/amsmath/amsgen.sty File: amsgen.sty 1999/11/30 v2.0 generic functions \@emptytoks=\toks17 \ex@=\dimen149 )) (/usr/local/texlive/2025/texmf-dist/tex/latex/amsmath/amsbsy.sty Package: amsbsy 1999/11/29 v1.2d Bold Symbols \pmbraise@=\dimen150 ) (/usr/local/texlive/2025/texmf-dist/tex/latex/amsmath/amsopn.sty Package: amsopn 2022/04/08 v2.04 operator names ) \inf@bad=\count283 LaTeX Info: Redefining \frac on input line 233. \uproot@=\count284 \leftroot@=\count285 LaTeX Info: Redefining \overline on input line 398. LaTeX Info: Redefining \colon on input line 409. \classnum@=\count286 \DOTSCASE@=\count287 LaTeX Info: Redefining \ldots on input line 495. LaTeX Info: Redefining \dots on input line 498. LaTeX Info: Redefining \cdots on input line 619. \Mathstrutbox@=\box53 \strutbox@=\box54 LaTeX Info: Redefining \big on input line 721. LaTeX Info: Redefining \Big on input line 722. LaTeX Info: Redefining \bigg on input line 723. LaTeX Info: Redefining \Bigg on input line 724. 
\big@size=\dimen151 LaTeX Font Info: Redeclaring font encoding OML on input line 742. LaTeX Font Info: Redeclaring font encoding OMS on input line 743. \macc@depth=\count288 LaTeX Info: Redefining \bmod on input line 904. LaTeX Info: Redefining \pmod on input line 909. LaTeX Info: Redefining \smash on input line 939. LaTeX Info: Redefining \relbar on input line 969. LaTeX Info: Redefining \Relbar on input line 970. \c@MaxMatrixCols=\count289 \dotsspace@=\muskip17 \c@parentequation=\count290 \dspbrk@lvl=\count291 \tag@help=\toks18 \row@=\count292 \column@=\count293 \maxfields@=\count294 \andhelp@=\toks19 \eqnshift@=\dimen152 \alignsep@=\dimen153 \tagshift@=\dimen154 \tagwidth@=\dimen155 \totwidth@=\dimen156 \lineht@=\dimen157 \@envbody=\toks20 \multlinegap=\skip52 \multlinetaggap=\skip53 \mathdisplay@stack=\toks21 LaTeX Info: Redefining \[ on input line 2949. LaTeX Info: Redefining \] on input line 2950. ) (/usr/local/texlive/2025/texmf-dist/tex/latex/amsfonts/amssymb.sty Package: amssymb 2013/01/14 v3.01 AMS font symbols (/usr/local/texlive/2025/texmf-dist/tex/latex/amsfonts/amsfonts.sty Package: amsfonts 2013/01/14 v3.01 Basic AMSFonts support \symAMSa=\mathgroup4 \symAMSb=\mathgroup5 LaTeX Font Info: Redeclaring math symbol \hbar on input line 98. LaTeX Font Info: Overwriting math alphabet \mathfrak' in version bold' (Font) U/euf/m/n --> U/euf/b/n on input line 106. 
)) (/usr/local/texlive/2025/texmf-dist/tex/latex/amscls/amsthm.sty Package: amsthm 2020/05/29 v2.20.6 \thm@style=\toks22 \thm@bodyfont=\toks23 \thm@headfont=\toks24 \thm@notefont=\toks25 \thm@headpunct=\toks26 \thm@preskip=\skip54 \thm@postskip=\skip55 \thm@headsep=\skip56 \dth@everypar=\toks27 ) (/usr/local/texlive/2025/texmf-dist/tex/latex/mathtools/mathtools.sty Package: mathtools 2024/10/04 v1.31 mathematical typesetting tools (/usr/local/texlive/2025/texmf-dist/tex/latex/graphics/keyval.sty Package: keyval 2022/05/29 v1.15 key=value parser (DPC) \KV@toks@=\toks28 ) (/usr/local/texlive/2025/texmf-dist/tex/latex/tools/calc.sty Package: calc 2025/03/01 v4.3b Infix arithmetic (KKT,FJ) \calc@Acount=\count295 \calc@Bcount=\count296 \calc@Adimen=\dimen158 \calc@Bdimen=\dimen159 \calc@Askip=\skip57 \calc@Bskip=\skip58 LaTeX Info: Redefining \setlength on input line 86. LaTeX Info: Redefining \addtolength on input line 87. \calc@Ccount=\count297 \calc@Cskip=\skip59 ) (/usr/local/texlive/2025/texmf-dist/tex/latex/mathtools/mhsetup.sty Package: mhsetup 2021/03/18 v1.4 programming setup (MH) ) \g_MT_multlinerow_int=\count298 \l_MT_multwidth_dim=\dimen160 \origjot=\skip60 \l_MT_shortvdotswithinadjustabove_dim=\dimen161 \l_MT_shortvdotswithinadjustbelow_dim=\dimen162 \l_MT_above_intertext_sep=\dimen163 \l_MT_below_intertext_sep=\dimen164 \l_MT_above_shortintertext_sep=\dimen165 \l_MT_below_shortintertext_sep=\dimen166 \xmathstrut@box=\box55 \xmathstrut@dim=\dimen167 ) (/usr/local/texlive/2025/texmf-dist/tex/latex/geometry/geometry.sty Package: geometry 2020/01/02 v5.9 Page Geometry (/usr/local/texlive/2025/texmf-dist/tex/generic/iftex/ifvtex.sty Package: ifvtex 2019/10/25 v1.7 ifvtex legacy package. Use iftex instead. 
(/usr/local/texlive/2025/texmf-dist/tex/generic/iftex/iftex.sty Package: iftex 2024/12/12 v1.0g TeX engine tests )) \Gm@cnth=\count299 \Gm@cntv=\count300 \c@Gm@tempcnt=\count301 \Gm@bindingoffset=\dimen168 \Gm@wd@mp=\dimen169 \Gm@odd@mp=\dimen170 \Gm@even@mp=\dimen171 \Gm@layoutwidth=\dimen172 \Gm@layoutheight=\dimen173 \Gm@layouthoffset=\dimen174 \Gm@layoutvoffset=\dimen175 \Gm@dimlist=\toks29 ) (/usr/local/texlive/2025/texmf-dist/tex/latex/hyperref/hyperref.sty Package: hyperref 2025-05-20 v7.01m Hypertext links for LaTeX (/usr/local/texlive/2025/texmf-dist/tex/latex/kvsetkeys/kvsetkeys.sty Package: kvsetkeys 2022-10-05 v1.19 Key value parser (HO) ) (/usr/local/texlive/2025/texmf-dist/tex/generic/kvdefinekeys/kvdefinekeys.sty Package: kvdefinekeys 2019-12-19 v1.6 Define keys (HO) ) (/usr/local/texlive/2025/texmf-dist/tex/generic/pdfescape/pdfescape.sty Package: pdfescape 2019/12/09 v1.15 Implements pdfTeX's escape features (HO) (/usr/local/texlive/2025/texmf-dist/tex/generic/ltxcmds/ltxcmds.sty Package: ltxcmds 2023-12-04 v1.26 LaTeX kernel commands for general use (HO) ) (/usr/local/texlive/2025/texmf-dist/tex/generic/pdftexcmds/pdftexcmds.sty Package: pdftexcmds 2020-06-27 v0.33 Utility functions of pdfTeX for LuaTeX (HO) (/usr/local/texlive/2025/texmf-dist/tex/generic/infwarerr/infwarerr.sty Package: infwarerr 2019/12/03 v1.5 Providing info/warning/error messages (HO) ) Package pdftexcmds Info: \pdf@primitive is available. Package pdftexcmds Info: \pdf@ifprimitive is available. Package pdftexcmds Info: \pdfdraftmode found. 
)) (/usr/local/texlive/2025/texmf-dist/tex/latex/hycolor/hycolor.sty Package: hycolor 2020-01-27 v1.10 Color options for hyperref/bookmark (HO) ) (/usr/local/texlive/2025/texmf-dist/tex/latex/hyperref/nameref.sty Package: nameref 2023-11-26 v2.56 Cross-referencing by name of section (/usr/local/texlive/2025/texmf-dist/tex/latex/refcount/refcount.sty Package: refcount 2019/12/15 v3.6 Data extraction from label references (HO) ) (/usr/local/texlive/2025/texmf-dist/tex/generic/gettitlestring/gettitlestring.sty Package: gettitlestring 2019/12/15 v1.6 Cleanup title references (HO) (/usr/local/texlive/2025/texmf-dist/tex/latex/kvoptions/kvoptions.sty Package: kvoptions 2022-06-15 v3.15 Key value format for package options (HO) )) \c@section@level=\count302 ) (/usr/local/texlive/2025/texmf-dist/tex/latex/etoolbox/etoolbox.sty Package: etoolbox 2025/02/11 v2.5l e-TeX tools for LaTeX (JAW) \etb@tempcnta=\count303 ) (/usr/local/texlive/2025/texmf-dist/tex/generic/stringenc/stringenc.sty Package: stringenc 2019/11/29 v1.12 Convert strings between diff. encodings (HO) ) \@linkdim=\dimen176 \Hy@linkcounter=\count304 \Hy@pagecounter=\count305 (/usr/local/texlive/2025/texmf-dist/tex/latex/hyperref/pd1enc.def File: pd1enc.def 2025-05-20 v7.01m Hyperref: PDFDocEncoding definition (HO) Now handling font encoding PD1 ... ... no UTF-8 mapping file for font encoding PD1 ) (/usr/local/texlive/2025/texmf-dist/tex/generic/intcalc/intcalc.sty Package: intcalc 2019/12/15 v1.3 Expandable calculations with integers (HO) ) \Hy@SavedSpaceFactor=\count306 (/usr/local/texlive/2025/texmf-dist/tex/latex/hyperref/puenc.def File: puenc.def 2025-05-20 v7.01m Hyperref: PDF Unicode definition (HO) Now handling font encoding PU ... ... no UTF-8 mapping file for font encoding PU ) Package hyperref Info: Hyper figures OFF on input line 4157. Package hyperref Info: Link nesting OFF on input line 4162. Package hyperref Info: Hyper index ON on input line 4165. 
Package hyperref Info: Plain pages OFF on input line 4172. Package hyperref Info: Backreferencing OFF on input line 4177. Package hyperref Info: Implicit mode ON; LaTeX internals redefined. Package hyperref Info: Bookmarks ON on input line 4424. \c@Hy@tempcnt=\count307 (/usr/local/texlive/2025/texmf-dist/tex/latex/url/url.sty \Urlmuskip=\muskip18 Package: url 2013/09/16 ver 3.4 Verb mode for urls, etc. ) LaTeX Info: Redefining \url on input line 4763. \XeTeXLinkMargin=\dimen177 (/usr/local/texlive/2025/texmf-dist/tex/generic/bitset/bitset.sty Package: bitset 2019/12/09 v1.3 Handle bit-vector datatype (HO) (/usr/local/texlive/2025/texmf-dist/tex/generic/bigintcalc/bigintcalc.sty Package: bigintcalc 2019/12/15 v1.5 Expandable calculations on big integers (HO) )) \Fld@menulength=\count308 \Field@Width=\dimen178 \Fld@charsize=\dimen179 Package hyperref Info: Hyper figures OFF on input line 6042. Package hyperref Info: Link nesting OFF on input line 6047. Package hyperref Info: Hyper index ON on input line 6050. Package hyperref Info: backreferencing OFF on input line 6057. Package hyperref Info: Link coloring OFF on input line 6062. Package hyperref Info: Link coloring with OCG OFF on input line 6067. Package hyperref Info: PDF/A mode OFF on input line 6072. (/usr/local/texlive/2025/texmf-dist/tex/latex/base/atbegshi-ltx.sty Package: atbegshi-ltx 2021/01/10 v1.0c Emulation of the original atbegshi package with kernel methods ) \Hy@abspage=\count309 \c@Item=\count310 \c@Hfootnote=\count311 ) Package hyperref Info: Driver (autodetected): hpdftex. 
(/usr/local/texlive/2025/texmf-dist/tex/latex/hyperref/hpdftex.def File: hpdftex.def 2025-05-20 v7.01m Hyperref driver for pdfTeX \Fld@listcount=\count312 \c@bookmark@seq@number=\count313 (/usr/local/texlive/2025/texmf-dist/tex/latex/rerunfilecheck/rerunfilecheck.sty Package: rerunfilecheck 2022-07-10 v1.10 Rerun checks for auxiliary files (HO) (/usr/local/texlive/2025/texmf-dist/tex/latex/base/atveryend-ltx.sty Package: atveryend-ltx 2020/08/19 v1.0a Emulation of the original atveryend package with kernel methods ) (/usr/local/texlive/2025/texmf-dist/tex/generic/uniquecounter/uniquecounter.sty Package: uniquecounter 2019/12/15 v1.4 Provide unlimited unique counter (HO) ) Package uniquecounter Info: New unique counter rerunfilecheck' on input line 285. ) \Hy@SectionHShift=\skip61 ) \c@theorem=\count314 \c@lemma=\count315 \c@corollary=\count316 \c@assumption=\count317 \c@remark=\count318 (/usr/local/texlive/2025/texmf-dist/tex/latex/l3backend/l3backend-pdftex.def File: l3backend-pdftex.def 2025-04-14 L3 backend support: PDF output (pdfTeX) \l__color_backend_stack_int=\count319 ) (./output.aux) \openout1 = output.aux'.

LaTeX Font Info: Checking defaults for OML/cmm/m/it on input line 31. LaTeX Font Info: ... okay on input line 31. LaTeX Font Info: Checking defaults for OMS/cmsy/m/n on input line 31. LaTeX Font Info: ... okay on input line 31. LaTeX Font Info: Checking defaults for OT1/cmr/m/n on input line 31. LaTeX Font Info: ... okay on input line 31. LaTeX Font Info: Checking defaults for T1/cmr/m/n on input line 31. LaTeX Font Info: ... okay on input line 31. LaTeX Font Info: Checking defaults for TS1/cmr/m/n on input line 31. LaTeX Font Info: ... okay on input line 31. LaTeX Font Info: Checking defaults for OMX/cmex/m/n on input line 31. LaTeX Font Info: ... okay on input line 31. LaTeX Font Info: Checking defaults for U/cmr/m/n on input line 31. LaTeX Font Info: ... okay on input line 31. LaTeX Font Info: Checking defaults for PD1/pdf/m/n on input line 31. LaTeX Font Info: ... okay on input line 31. LaTeX Font Info: Checking defaults for PU/pdf/m/n on input line 31. LaTeX Font Info: ... okay on input line 31. (/usr/local/texlive/2025/texmf-dist/tex/latex/graphics/graphicx.sty Package: graphicx 2024/12/31 v1.2e Enhanced LaTeX Graphics (DPC,SPQR) (/usr/local/texlive/2025/texmf-dist/tex/latex/graphics/graphics.sty Package: graphics 2024/08/06 v1.4g Standard LaTeX Graphics (DPC,SPQR) (/usr/local/texlive/2025/texmf-dist/tex/latex/graphics/trig.sty Package: trig 2023/12/02 v1.11 sin cos tan (DPC) ) (/usr/local/texlive/2025/texmf-dist/tex/latex/graphics-cfg/graphics.cfg File: graphics.cfg 2016/06/04 v1.11 sample graphics configuration ) Package graphics Info: Driver file: pdftex.def on input line 106. (/usr/local/texlive/2025/texmf-dist/tex/latex/graphics-def/pdftex.def File: pdftex.def 2024/04/13 v1.2c Graphics/color driver for pdftex (/usr/local/texlive/2025/texmf-dist/tex/context/base/mkii/supp-pdf.mkii [Loading MPS to PDF converter (version 2006.09.02).] 
\scratchcounter=\count320 \scratchdimen=\dimen180 \scratchbox=\box56 \nofMPsegments=\count321 \nofMParguments=\count322 \everyMPshowfont=\toks30 \MPscratchCnt=\count323 \MPscratchDim=\dimen181 \MPnumerator=\count324 \makeMPintoPDFobject=\count325 \everyMPtoPDFconversion=\toks31 ))) (/usr/local/texlive/2025/texmf-dist/tex/latex/epstopdf-pkg/epstopdf-base.sty Package: epstopdf-base 2020-01-24 v2.11 Base part for package epstopdf Package epstopdf-base Info: Redefining graphics rule for .eps' on input line 485. (/usr/local/texlive/2025/texmf-dist/tex/latex/latexconfig/epstopdf-sys.cfg File: epstopdf-sys.cfg 2010/07/13 v1.3 Configuration of (r)epstopdf for TeX Live )) \Gin@req@height=\dimen182 \Gin@req@width=\dimen183 ) geometry driver: auto-detecting geometry detected driver: pdftex geometry verbose mode - [ preamble ] result:

  • driver: pdftex
  • paper: <default>
  • layout: <same size as paper>
  • layoutoffset:(h,v)=(0.0pt,0.0pt)
  • modes:
  • h-part:(L,W,R)=(72.26999pt, 469.75502pt, 72.26999pt)
  • v-part:(T,H,B)=(72.26999pt, 650.43001pt, 72.26999pt)
  • \paperwidth=614.295pt
  • \paperheight=794.96999pt
  • \textwidth=469.75502pt
  • \textheight=650.43001pt
  • \oddsidemargin=0.0pt
  • \evensidemargin=0.0pt
  • \topmargin=-37.0pt
  • \headheight=12.0pt
  • \headsep=25.0pt
  • \topskip=11.0pt
  • \footskip=30.0pt
  • \marginparwidth=59.0pt
  • \marginparsep=10.0pt
  • \columnsep=10.0pt
  • \skip\footins=10.0pt plus 4.0pt minus 2.0pt
  • \hoffset=0.0pt
  • \voffset=0.0pt
  • \mag=1000
  • \@twocolumnfalse
  • \@twosidefalse
  • \@mparswitchfalse
  • \@reversemarginfalse
  • (1in=72.27pt=25.4mm, 1cm=28.453pt)

Package hyperref Info: Link coloring OFF on input line 31. (./output.out) (./output.out) \@outlinefile=\write3 \openout3 = output.out'.

LaTeX Font Info: Trying to load font information for U+msa on input line 33. (/usr/local/texlive/2025/texmf-dist/tex/latex/amsfonts/umsa.fd File: umsa.fd 2013/01/14 v3.01 AMS symbols A ) LaTeX Font Info: Trying to load font information for U+msb on input line 33. (/usr/local/texlive/2025/texmf-dist/tex/latex/amsfonts/umsb.fd File: umsb.fd 2013/01/14 v3.01 AMS symbols B )

[1

{/usr/local/texlive/2025/texmf-var/fonts/map/pdftex/updmap/pdftex.map}]

Package hyperref Warning: Token not allowed in a PDF string (Unicode): (hyperref) removing math shift' on input line 119.

Package hyperref Warning: Token not allowed in a PDF string (Unicode): (hyperref) removing superscript' on input line 119.

Package hyperref Warning: Token not allowed in a PDF string (Unicode): (hyperref) removing \star' on input line 119.

Package hyperref Warning: Token not allowed in a PDF string (Unicode): (hyperref) removing math shift' on input line 119.

Package hyperref Warning: Token not allowed in a PDF string (Unicode): (hyperref) removing math shift' on input line 173.

Package hyperref Warning: Token not allowed in a PDF string (Unicode): (hyperref) removing \|' on input line 173.

Package hyperref Warning: Token not allowed in a PDF string (Unicode): (hyperref) removing subscript' on input line 173.

Package hyperref Warning: Token not allowed in a PDF string (Unicode): (hyperref) removing superscript' on input line 173.

Package hyperref Warning: Token not allowed in a PDF string (Unicode): (hyperref) removing \star' on input line 173.

Package hyperref Warning: Token not allowed in a PDF string (Unicode): (hyperref) removing \|' on input line 173.

Package hyperref Warning: Token not allowed in a PDF string (Unicode): (hyperref) removing math shift' on input line 173.

[2]

[3]

Package hyperref Warning: Token not allowed in a PDF string (Unicode): (hyperref) removing math shift' on input line 257.

Package hyperref Warning: Token not allowed in a PDF string (Unicode): (hyperref) removing \sum@' on input line 257.

Package hyperref Warning: Token not allowed in a PDF string (Unicode): (hyperref) removing \slimits@' on input line 257.

Package hyperref Warning: Token not allowed in a PDF string (Unicode): (hyperref) removing subscript' on input line 257.

Package hyperref Warning: Token not allowed in a PDF string (Unicode): (hyperref) removing superscript' on input line 257.

Package hyperref Warning: Token not allowed in a PDF string (Unicode): (hyperref) removing subscript' on input line 257.

Package hyperref Warning: Token not allowed in a PDF string (Unicode): (hyperref) removing \gamma' on input line 257.

Package hyperref Warning: Token not allowed in a PDF string (Unicode): (hyperref) removing \new@ifnextchar' on input line 257.

Package hyperref Warning: Token not allowed in a PDF string (Unicode): (hyperref) removing subscript' on input line 257.

Package hyperref Warning: Token not allowed in a PDF string (Unicode): (hyperref) removing math shift' on input line 257.

Package hyperref Warning: Token not allowed in a PDF string (Unicode): (hyperref) removing math shift' on input line 319.

Package hyperref Warning: Token not allowed in a PDF string (Unicode): (hyperref) removing \ell' on input line 319.

Package hyperref Warning: Token not allowed in a PDF string (Unicode): (hyperref) removing subscript' on input line 319.

Package hyperref Warning: Token not allowed in a PDF string (Unicode): (hyperref) removing math shift' on input line 319.

[4]

[5]

Package hyperref Warning: Token not allowed in a PDF string (Unicode): (hyperref) removing math shift' on input line 449.

Package hyperref Warning: Token not allowed in a PDF string (Unicode): (hyperref) removing \varepsilon' on input line 449.

Package hyperref Warning: Token not allowed in a PDF string (Unicode): (hyperref) removing math shift' on input line 449.

[6]

Package hyperref Warning: Token not allowed in a PDF string (Unicode): (hyperref) removing math shift' on input line 498.

Package hyperref Warning: Token not allowed in a PDF string (Unicode): (hyperref) removing \gamma' on input line 498.

Package hyperref Warning: Token not allowed in a PDF string (Unicode): (hyperref) removing math shift' on input line 498.

Package hyperref Warning: Token not allowed in a PDF string (Unicode): (hyperref) removing math shift' on input line 498.

Package hyperref Warning: Token not allowed in a PDF string (Unicode): (hyperref) removing \@ifnextchar' on input line 498.

Package hyperref Warning: Token not allowed in a PDF string (Unicode): (hyperref) removing \rho' on input line 498.

Package hyperref Warning: Token not allowed in a PDF string (Unicode): (hyperref) removing math shift' on input line 498.

[7]

[8]

Package hyperref Warning: Token not allowed in a PDF string (Unicode): (hyperref) removing math shift' on input line 643.

Package hyperref Warning: Token not allowed in a PDF string (Unicode): (hyperref) removing \gamma' on input line 643.

Package hyperref Warning: Token not allowed in a PDF string (Unicode): (hyperref) removing superscript' on input line 643.

Package hyperref Warning: Token not allowed in a PDF string (Unicode): (hyperref) removing math shift' on input line 643.

Package hyperref Warning: Token not allowed in a PDF string (Unicode): (hyperref) removing \hbox' on input line 658.

Package hyperref Warning: Token not allowed in a PDF string (Unicode): (hyperref) removing \mathsurround' on input line 658.

Package hyperref Warning: Token not allowed in a PDF string (Unicode): (hyperref) removing \z@' on input line 658.

Package hyperref Warning: Token not allowed in a PDF string (Unicode): (hyperref) removing \unskip' on input line 658.

Package hyperref Warning: Token not allowed in a PDF string (Unicode): (hyperref) removing \@@italiccorr' on input line 658.

Package hyperref Warning: Token not allowed in a PDF string (Unicode): (hyperref) removing math shift' on input line 658.

Package hyperref Warning: Token not allowed in a PDF string (Unicode): (hyperref) removing \gamma' on input line 658.

Package hyperref Warning: Token not allowed in a PDF string (Unicode): (hyperref) removing math shift' on input line 658.

[9]

Package hyperref Warning: Token not allowed in a PDF string (Unicode): (hyperref) removing math shift' on input line 685.

Package hyperref Warning: Token not allowed in a PDF string (Unicode): (hyperref) removing \gamma' on input line 685.

Package hyperref Warning: Token not allowed in a PDF string (Unicode): (hyperref) removing math shift' on input line 685.

Package hyperref Warning: Token not allowed in a PDF string (Unicode): (hyperref) removing math shift' on input line 734.

Package hyperref Warning: Token not allowed in a PDF string (Unicode): (hyperref) removing \@ifnextchar' on input line 734.

Package hyperref Warning: Token not allowed in a PDF string (Unicode): (hyperref) removing \rho' on input line 734.

Package hyperref Warning: Token not allowed in a PDF string (Unicode): (hyperref) removing math shift' on input line 734.

[10]

[11]

! LaTeX Error: Something's wrong--perhaps a missing \item.

See the LaTeX manual or LaTeX Companion for explanation. Type H <return> for immediate help.

...                                              
                                                 

l.863 \item F

            or $i\in C$ (nonzero coordinates), the subdifferential requires...

Try typing <return> to proceed. If that doesn't work, type X <return> to quit.

[12{/usr/local/texlive/2025/texmf-dist/fonts/enc/dvips/cm-super/cm-super-ts1.enc}]

Package hyperref Warning: Token not allowed in a PDF string (Unicode): (hyperref) removing math shift' on input line 905.

Package hyperref Warning: Token not allowed in a PDF string (Unicode): (hyperref) removing \ell' on input line 905.

Package hyperref Warning: Token not allowed in a PDF string (Unicode): (hyperref) removing subscript' on input line 905.

Package hyperref Warning: Token not allowed in a PDF string (Unicode): (hyperref) removing math shift' on input line 905.

Package hyperref Warning: Token not allowed in a PDF string (Unicode): (hyperref) removing math shift' on input line 905.

Package hyperref Warning: Token not allowed in a PDF string (Unicode): (hyperref) removing \@ifnextchar' on input line 905.

Package hyperref Warning: Token not allowed in a PDF string (Unicode): (hyperref) removing \rho' on input line 905.

Package hyperref Warning: Token not allowed in a PDF string (Unicode): (hyperref) removing math shift' on input line 905.

[13] (./output.aux)

*********

LaTeX2e <2025-06-01> patch level 1 L3 programming layer <2025-05-26>

*********

Package rerunfilecheck Info: File output.out' has not changed. (rerunfilecheck) Checksum: 126B49B4E6E9C0A9E988DE6F81B556FB;7041.

) 

Here is how much of TeX's memory you used:

11138 strings out of 467886
170530 string characters out of 5434056
594304 words of memory out of 5000000
39458 multiletter control sequences out of 15000+600000
640243 words of font info for 90 fonts, out of 8000000 for 9000
1141 hyphenation exceptions out of 8191
75i,12n,79p,544b,667s stack positions out of 10000i,1000n,20000p,200000b,200000s

</usr/local/texlive/2025/texmf-dist/fonts/type1/public/amsfonts/cm/cmbx10.pfb></usr/local/texlive/2025/texmf-dist/fonts/type1/public/amsfonts/cm/cmbx12.pfb></usr/local/texlive/2025/texmf-dist/fonts/type1/public/amsfonts/cm/cmbxti10.pfb></usr/local/texlive/2025/texmf-dist/fonts/type1/public/amsfonts/cm/cmex10.pfb></usr/local/texlive/2025/texmf-dist/fonts/type1/public/amsfonts/cm/cmmi10.pfb></usr/local/texlive/2025/texmf-dist/fonts/type1/public/amsfonts/cm/cmmi12.pfb></usr/local/texlive/2025/texmf-dist/fonts/type1/public/amsfonts/cm/cmmi6.pfb></usr/local/texlive/2025/texmf-dist/fonts/type1/public/amsfonts/cm/cmmi7.pfb></usr/local/texlive/2025/texmf-dist/fonts/type1/public/amsfonts/cm/cmmi8.pfb></usr/local/texlive/2025/texmf-dist/fonts/type1/public/amsfonts/cm/cmr10.pfb></usr/local/texlive/2025/texmf-dist/fonts/type1/public/amsfonts/cm/cmr12.pfb></usr/local/texlive/2025/texmf-dist/fonts/type1/public/amsfonts/cm/cmr17.pfb></usr/local/texlive/2025/texmf-dist/fonts/type1/public/amsfonts/cm/cmr6.pfb></usr/local/te xlive/2025/texmf-dist/fonts/type1/public/amsfonts/cm/cmr7.pfb></usr/local/texlive/2025/texmf-dist/fonts/type1/public/amsfonts/cm/cmr8.pfb></usr/local/texlive/2025/texmf-dist/fonts/type1/public/amsfonts/cm/cmsy10.pfb></usr/local/texlive/2025/texmf-dist/fonts/type1/public/amsfonts/cm/cmsy7.pfb></usr/local/texlive/2025/texmf-dist/fonts/type1/public/amsfonts/cm/cmsy8.pfb></usr/local/texlive/2025/texmf-dist/fonts/type1/public/amsfonts/cm/cmti10.pfb></usr/local/texlive/2025/texmf-dist/fonts/type1/public/amsfonts/symbols/msbm10.pfb></usr/local/texlive/2025/texmf-dist/fonts/type1/public/cm-super/sfrm1095.pfb> Output written on output.pdf (13 pages, 316958 bytes). PDF statistics:

422 PDF objects out of 1000 (max. 8388607)
361 compressed objects within 4 object streams
104 named destinations out of 1000 (max. 500000)
185 words of extra memory for PDF output out of 10000 (max. 10000000)"

Assistant: That \item error is coming from the \paragraph{...} commands inside your amsthm proof environment.[edit | edit source]

That \item error is coming from the \paragraph{...} commands inside your amsthm proof environment.

amsthm’s proof is implemented as a trivlist (i.e., a list-like environment). Using sectioning commands like \paragraph{...} inside it can break the internal list state, so the next \item (from your itemize) is seen as “outside a list,” producing:

Replace the \paragraph{Step ...} lines inside the proof with non-sectioning formatting like \medskip\noindent\textbf{...}.[edit | edit source]

Copy-paste this replacement proof block (it’s the same proof, just without \paragraph):

\begin{proof}
\medskip\noindent\textbf{Step 1 (Construct a subgradient $z\in\partial g(\hat x)$).}
Define $z\in\mathbb{R}^n$ coordinatewise by
\[
z_i :=
\begin{cases}
\lambda_i\,\mathrm{sign}(\hat x_i), & i\in C,\\[1mm]
-\nabla f(\hat x)_i, & i\in\bar C.
\end{cases}
\]
We claim $z\in\partial g(\hat x)$.

Indeed, since $\supp(\hat x)=C$, we have $\hat x_i\neq 0$ for all $i\in C$ and $\hat x_i=0$ for all $i\in\bar C$.
By Lemma~\ref{lem:subdiff_l1_weighted}:
\begin{itemize}
\item For $i\in C$ (nonzero coordinates), the subdifferential requires
$z_i=\lambda_i\,\mathrm{sign}(\hat x_i)$, which holds by definition.
\item For $i\in\bar C$ (zero coordinates), the subdifferential requires $z_i\in[-\lambda_i,\lambda_i]$.
But by \eqref{eq:strict_dual_feas} and our definition,
$|z_i|=|\nabla f(\hat x)_i|\le (1-\eta)\lambda_i<\lambda_i$,
so indeed $z_i\in[-\lambda_i,\lambda_i]$.
\end{itemize}
Thus $z\in\partial g(\hat x)$.

\medskip\noindent\textbf{Step 2 (Verify the KKT condition $\nabla f(\hat x)+z=0$).}
We check this coordinatewise.
\begin{itemize}
\item For $i\in C$, $z_i=\lambda_i\,\mathrm{sign}(\hat x_i)$ and \eqref{eq:stationarity_C} gives
$\nabla f(\hat x)_i+z_i=0$.
\item For $i\in\bar C$, we defined $z_i=-\nabla f(\hat x)_i$, hence
$\nabla f(\hat x)_i+z_i=0$ holds trivially.
\end{itemize}
Therefore $\nabla f(\hat x)+z=0$ with $z\in\partial g(\hat x)$.

\medskip\noindent\textbf{Step 3 (Conclude $\hat x$ is a global minimizer).}
By Lemma~\ref{lem:optimality_convex}, the existence of $z\in\partial g(\hat x)$ such that
$\nabla f(\hat x)+z=0$ implies $0\in\partial F(\hat x)$ and hence $\hat x$ minimizes $F$.

\medskip\noindent\textbf{Step 4 (Uniqueness and identification of the support).}
By assumption, $F$ is $\mu$-strongly convex with $\mu>0$, hence the minimizer is unique.
Since $\hat x$ is a minimizer, it must equal the unique minimizer: $\hat x=x^\star$.
In particular, $\supp(x^\star)=\supp(\hat x)=C$ and thus $I^\star=\bar C$.

\medskip\noindent\textbf{Step 5 (Lower bound the margin).}
Since $I^\star=\bar C$, we can write
\[
\gamma = \min_{i\in\bar C}\left(\lambda_i-|\nabla f(x^\star)_i|\right)
= \min_{i\in\bar C}\left(\lambda_i-|\nabla f(\hat x)_i|\right),
\]
where the equality uses $x^\star=\hat x$.
Using \eqref{eq:strict_dual_feas}, for each $i\in\bar C$ we have
\[
\lambda_i-|\nabla f(\hat x)_i|
\ge \lambda_i-(1-\eta)\lambda_i = \eta\lambda_i.
\]
Taking the minimum over $i\in\bar C$ yields \eqref{eq:gamma_lower_bound_cert}.
\end{proof}
Those “Token not allowed in a PDF string” warnings come from math in section titles. They’re harmless, but if you want to remove them, use \texorpdfstring in titles, e.g.[edit | edit source]
\section{A primal--dual cluster certificate implying a \texorpdfstring{$\sqrt{\rho}$}{sqrt(rho)}-scale margin}

That will keep the math in the printed title while giving hyperref a plain-text bookmark.

User: I don't like your proof.[edit | edit source]

I don't like your proof. I want a proof that parameterizes the margin γ as a function of conductance, internal connectivity and outside degree. Then if all these parameters are well-behaved, it is guaranteed that γ won't be small, i.e., of the order of sqrt(ρ).

Assistant: Append the following replacement section it gives an explicit parameterized lower bound on the margin γ\gammaγ in terms…[edit | edit source]

Append the following replacement section (it gives an explicit parameterized lower bound on the margin γ\gammaγ in terms of conductance, internal connectivity, and outside degree, and then a corollary showing when this becomes Ω(ρ)\Omega(\sqrt{\rho})Ω(ρ​)). open-problem-fountoulakis22a

% ============================================================
\section{A conductance--connectivity--degree lower bound on the margin}
\label{sec:gamma_conductance_connectivity_degree}

We specialize to the $\ell_1$-regularized PageRank objective~(3):
\begin{equation}\label{eq:ppr_obj_gamma_param}
\min_{x\in\mathbb{R}^n}
F(x)
:=
\underbrace{\frac12 x^\top Qx-\alpha x^\top D^{-1/2}s}_{=:f(x)}
\;+\;
\underbrace{\rho\alpha\|D^{1/2}x\|_1}_{=:g(x)}.
\end{equation}
Here $G=(V,E)$ is undirected and unweighted, $A$ is the adjacency matrix, $D$ is the diagonal degree matrix,
$s=e_v$ is a seed at node $v$, and
\begin{equation}\label{eq:Q_def_gamma_param}
Q=\alpha I+\frac{1-\alpha}{2}L
=
\frac{1+\alpha}{2}I-\frac{1-\alpha}{2}\,D^{-1/2}AD^{-1/2}.
\end{equation}
The weighted $\ell_1$ penalty has coordinate weights
\begin{equation}\label{eq:lambda_def_gamma_param}
g(x)=\sum_{i=1}^n \lambda_i |x_i|,
\qquad
\lambda_i:=\rho\alpha\sqrt{d_i}.
\end{equation}

\paragraph{Cluster notation.}
Fix a set (candidate cluster) $C\subseteq V$ with $v\in C$, and let $\bar C:=V\setminus C$.
Define the volume and cut:
\[
\mathrm{vol}(C):=\sum_{i\in C} d_i,
\qquad
\mathrm{cut}(C,\bar C):=|\{(i,j)\in E:\; i\in C,\ j\in\bar C\}|.
\]
We use the (one-sided) conductance
\[
\phi(C):=\frac{\mathrm{cut}(C,\bar C)}{\mathrm{vol}(C)}.
\]
Let
\[
d_{\mathrm{in}}:=\min_{j\in C} d_j,
\qquad
d_{\mathrm{out}}:=\min_{i\in\bar C} d_i,
\qquad
d_v:=d_{v}.
\]

\paragraph{Internal connectivity parameter.}
Let
\begin{equation}\label{eq:muC_def_gamma_param}
\mu_C:=\lambda_{\min}(Q_{CC}),
\end{equation}
the smallest eigenvalue of the principal submatrix $Q_{CC}$.
Since $Q\succeq \alpha I$, we always have $\mu_C\ge \alpha$, and larger $\mu_C$ corresponds to
stronger internal connectivity/conditioning on the cluster.

\subsection{A pointwise bound on the outside gradient in terms of $\phi(C)$ and $\mu_C$}

We will construct a candidate vector supported on $C$ by solving the KKT stationarity equations \emph{restricted to $C$}
under the natural sign choice $x_C>0$ (appropriate for PageRank-type solutions).

Define the restricted candidate $\hat x$ by
\begin{equation}\label{eq:xhat_def_gamma_param}
\hat x_{\bar C}:=0,
\qquad
\hat x_C:=Q_{CC}^{-1}\bigl(b_C-\lambda_C\bigr),
\qquad
b:=\alpha D^{-1/2}s.
\end{equation}
Since $s=e_v$ and $v\in C$, the vector $b_C$ has exactly one nonzero entry:
\begin{equation}\label{eq:bC_norm_gamma_param}
\|b_C\|_2=\frac{\alpha}{\sqrt{d_v}}.
\end{equation}

\begin{lemma}[Stationarity on $C$]\label{lem:restricted_stationarity_gamma_param}
The vector $\hat x$ satisfies the exact stationarity equation on $C$:
\begin{equation}\label{eq:restricted_stationarity_gamma_param}
\nabla f(\hat x)_C+\lambda_C = 0.
\end{equation}
\end{lemma}

\begin{proof}
Since $\nabla f(x)=Qx-b$, and $\hat x_{\bar C}=0$, we have
\[
\nabla f(\hat x)_C
=
(Q_{CC}\hat x_C - b_C).
\]
By the definition \eqref{eq:xhat_def_gamma_param}, $Q_{CC}\hat x_C=b_C-\lambda_C$, hence
$\nabla f(\hat x)_C = -\lambda_C$, which is \eqref{eq:restricted_stationarity_gamma_param}.
\end{proof}

Next we bound the gradient on $\bar C$ in terms of conductance, internal connectivity, and outside degree.

\begin{lemma}[Bounding $\|\hat x_C\|_2$ by internal connectivity]\label{lem:xC_norm_gamma_param}
Let $\mu_C=\lambda_{\min}(Q_{CC})$. Then
\begin{equation}\label{eq:xC_norm_gamma_param}
\|\hat x_C\|_2
\;\le\;
\frac{1}{\mu_C}\left(\frac{\alpha}{\sqrt{d_v}}+\rho\alpha\sqrt{\mathrm{vol}(C)}\right).
\end{equation}
\end{lemma}

\begin{proof}
Since $Q_{CC}$ is symmetric positive definite and $\lambda_{\min}(Q_{CC})=\mu_C$,
\[
\|Q_{CC}^{-1}\|_2 = \frac{1}{\mu_C}.
\]
Using \eqref{eq:xhat_def_gamma_param} and the triangle inequality,
\[
\|\hat x_C\|_2
=
\left\|Q_{CC}^{-1}(b_C-\lambda_C)\right\|_2
\le
\|Q_{CC}^{-1}\|_2\bigl(\|b_C\|_2+\|\lambda_C\|_2\bigr).
\]
By \eqref{eq:bC_norm_gamma_param}, $\|b_C\|_2=\alpha/\sqrt{d_v}$.
Also, by \eqref{eq:lambda_def_gamma_param},
\[
\|\lambda_C\|_2^2=\sum_{i\in C} (\rho\alpha\sqrt{d_i})^2
=(\rho\alpha)^2\sum_{i\in C} d_i
=(\rho\alpha)^2\,\mathrm{vol}(C),
\]
so $\|\lambda_C\|_2=\rho\alpha\sqrt{\mathrm{vol}(C)}$.
Substitute these bounds to obtain \eqref{eq:xC_norm_gamma_param}.
\end{proof}

\begin{lemma}[Outside gradient bound from conductance and degrees]\label{lem:outside_grad_bound_gamma_param}
Assume $v\in C$ so that $b_{\bar C}=0$. Then for each $i\in\bar C$,
\begin{equation}\label{eq:outside_grad_pointwise_gamma_param}
|\nabla f(\hat x)_i|
\le
\frac{1-\alpha}{2}\,\|\hat x_C\|_2\,
\sqrt{\frac{\mathrm{cut}(C,\bar C)}{d_i\,d_{\mathrm{in}}}}.
\end{equation}
Consequently, using $d_i\ge d_{\mathrm{out}}$ and $\mathrm{cut}(C,\bar C)=\phi(C)\mathrm{vol}(C)$,
\begin{equation}\label{eq:outside_grad_uniform_gamma_param}
\max_{i\in\bar C}|\nabla f(\hat x)_i|
\le
\frac{1-\alpha}{2}\,\|\hat x_C\|_2\,
\sqrt{\frac{\phi(C)\mathrm{vol}(C)}{d_{\mathrm{out}}\,d_{\mathrm{in}}}}.
\end{equation}
\end{lemma}

\begin{proof}
Fix $i\in\bar C$. Since $\hat x_{\bar C}=0$ and $b_{\bar C}=0$,
\[
\nabla f(\hat x)_i = (Q\hat x - b)_i = (Q_{\bar C C}\hat x_C)_i = \sum_{j\in C}Q_{ij}\hat x_j.
\]
From \eqref{eq:Q_def_gamma_param}, for $i\neq j$ we have
\[
Q_{ij} = -\frac{1-\alpha}{2}(D^{-1/2}AD^{-1/2})_{ij}
=
\begin{cases}
-\dfrac{1-\alpha}{2}\dfrac{1}{\sqrt{d_i d_j}}, & (i,j)\in E,\\[2mm]
0, & (i,j)\notin E.
\end{cases}
\]
Thus
\[
|\nabla f(\hat x)_i|
=
\left|\sum_{j\in C\cap N(i)} -\frac{1-\alpha}{2}\frac{\hat x_j}{\sqrt{d_i d_j}}\right|
\le
\frac{1-\alpha}{2}\cdot \frac{1}{\sqrt{d_i}}
\sum_{j\in C\cap N(i)}\frac{|\hat x_j|}{\sqrt{d_j}}.
\]
Apply Cauchy--Schwarz to the sum:
\[
\sum_{j\in C\cap N(i)}\frac{|\hat x_j|}{\sqrt{d_j}}
\le
\left(\sum_{j\in C\cap N(i)} \hat x_j^2\right)^{1/2}
\left(\sum_{j\in C\cap N(i)} \frac{1}{d_j}\right)^{1/2}
\le
\|\hat x_C\|_2\left(\sum_{j\in C\cap N(i)} \frac{1}{d_j}\right)^{1/2}.
\]
Since $d_j\ge d_{\mathrm{in}}$ for all $j\in C$, we have
\[
\sum_{j\in C\cap N(i)} \frac{1}{d_j}
\le
\frac{|C\cap N(i)|}{d_{\mathrm{in}}}.
\]
Finally, $|C\cap N(i)|$ is the number of cut edges incident to $i$, hence $|C\cap N(i)|\le \mathrm{cut}(C,\bar C)$,
which gives \eqref{eq:outside_grad_pointwise_gamma_param}. The uniform bound
\eqref{eq:outside_grad_uniform_gamma_param} follows by using $d_i\ge d_{\mathrm{out}}$
and $\mathrm{cut}(C,\bar C)=\phi(C)\mathrm{vol}(C)$.
\end{proof}

\subsection{A parameterized margin lower bound and a $\sqrt{\rho}$ corollary}

Define the cluster-dependent quantity
\begin{equation}\label{eq:DeltaC_def_gamma_param}
\Delta(C)
:=
\frac{1-\alpha}{2\mu_C}\,
\sqrt{\frac{\phi(C)\mathrm{vol}(C)}{d_{\mathrm{out}}\,d_{\mathrm{in}}}}\,
\left(\frac{\alpha}{\sqrt{d_v}}+\rho\alpha\sqrt{\mathrm{vol}(C)}\right).
\end{equation}

\begin{theorem}[Margin bound parameterized by conductance, internal connectivity, and outside degree]
\label{thm:gamma_parameterized_by_cluster}
Assume $v\in C$ and that the restricted candidate \eqref{eq:xhat_def_gamma_param} satisfies
\begin{equation}\label{eq:xhat_positive_gamma_param}
\hat x_C>0\quad\text{(componentwise)}.
\end{equation}
Then:
\begin{enumerate}
\item[(i)] If $\Delta(C)<\min_{i\in\bar C}\lambda_i=\rho\alpha\sqrt{d_{\mathrm{out}}}$, then $\hat x$ is the \emph{unique}
global minimizer of \eqref{eq:ppr_obj_gamma_param}. In particular, $x^\star=\hat x$ and $x^\star_{\bar C}=0$.
\item[(ii)] Whenever $x^\star=\hat x$, the strict-complementarity margin satisfies the explicit lower bound
\begin{equation}\label{eq:gamma_param_bound_gamma_param}
\gamma
=
\min_{i\in \bar C}\Bigl(\lambda_i-|\nabla f(x^\star)_i|\Bigr)
\;\ge\;
\rho\alpha\sqrt{d_{\mathrm{out}}}-\Delta(C).
\end{equation}
Equivalently, if $\Delta(C)\le (1-\eta)\rho\alpha\sqrt{d_{\mathrm{out}}}$ for some $\eta\in(0,1)$, then
\begin{equation}\label{eq:gamma_eta_bound_gamma_param}
\gamma \ge \eta\,\rho\alpha\sqrt{d_{\mathrm{out}}}.
\end{equation}
\end{enumerate}
\end{theorem}

\begin{proof}
We prove (i) and (ii) in order.

\medskip\noindent\textbf{Step 1: KKT conditions for \eqref{eq:ppr_obj_gamma_param}.}
Since $f$ is differentiable and $g(x)=\sum_i \lambda_i|x_i|$, a point $x$ is optimal if and only if there exists
$z\in\partial g(x)$ such that
\begin{equation}\label{eq:KKT_gamma_param}
\nabla f(x)+z=0.
\end{equation}
Moreover, $F$ is strongly convex because $f$ is $\alpha$-strongly convex and $g$ is convex,
so the minimizer is unique.

\medskip\noindent\textbf{Step 2: Build a valid $z\in\partial g(\hat x)$ and verify \eqref{eq:KKT_gamma_param}.}
Because of \eqref{eq:xhat_positive_gamma_param}, $\mathrm{sign}(\hat x_i)=+1$ for all $i\in C$.
Define $z\in\mathbb{R}^n$ by
\[
z_i :=
\begin{cases}
\lambda_i, & i\in C,\\[1mm]
-\nabla f(\hat x)_i, & i\in\bar C.
\end{cases}
\]
We claim $z\in\partial g(\hat x)$ provided $|\nabla f(\hat x)_i|\le \lambda_i$ for all $i\in\bar C$.
Indeed:
\begin{itemize}
\item If $i\in C$, then $\hat x_i>0$ and the weighted-$\ell_1$ subgradient requires $z_i=\lambda_i$, which holds.
\item If $i\in\bar C$, then $\hat x_i=0$ and the subgradient condition is $z_i\in[-\lambda_i,\lambda_i]$.
This holds if $|z_i|=|\nabla f(\hat x)_i|\le \lambda_i$.
\end{itemize}
Next, Lemma~\ref{lem:restricted_stationarity_gamma_param} gives $\nabla f(\hat x)_C=-\lambda_C$, so
$\nabla f(\hat x)_C+z_C=0$.
By definition, $\nabla f(\hat x)_{\bar C}+z_{\bar C}=0$.
Thus \eqref{eq:KKT_gamma_param} holds.

\medskip\noindent\textbf{Step 3: Enforce the outside inequalities via $\Delta(C)$.}
By Lemma~\ref{lem:outside_grad_bound_gamma_param} and Lemma~\ref{lem:xC_norm_gamma_param},
\[
\max_{i\in\bar C}|\nabla f(\hat x)_i|
\le
\Delta(C).
\]
Therefore, if $\Delta(C)<\rho\alpha\sqrt{d_{\mathrm{out}}}\le \lambda_i$ for all $i\in\bar C$,
then $|\nabla f(\hat x)_i|<\lambda_i$ on $\bar C$, so $z\in\partial g(\hat x)$ and the KKT condition holds.
Hence $\hat x$ is a minimizer, and by strong convexity it is the unique minimizer:
$\hat x=x^\star$. This proves (i).

\medskip\noindent\textbf{Step 4: Lower bound the margin.}
Assume $x^\star=\hat x$. Since $x^\star_{\bar C}=0$, the inactive set contains $\bar C$ and
\[
\gamma
=
\min_{i\in \bar C}\bigl(\lambda_i-|\nabla f(x^\star)_i|\bigr)
\ge
\min_{i\in\bar C}\lambda_i - \max_{i\in\bar C}|\nabla f(\hat x)_i|
=
\rho\alpha\sqrt{d_{\mathrm{out}}} - \max_{i\in\bar C}|\nabla f(\hat x)_i|.
\]
Using $\max_{i\in\bar C}|\nabla f(\hat x)_i|\le \Delta(C)$ gives \eqref{eq:gamma_param_bound_gamma_param}.
The $\eta$-form \eqref{eq:gamma_eta_bound_gamma_param} is immediate.
\end{proof}

\begin{corollary}[A concrete $\gamma=\Omega(\sqrt{\rho})$ regime]\label{cor:gamma_Omega_sqrt_rho_regime}
Assume the hypotheses of Theorem~\ref{thm:gamma_parameterized_by_cluster}.
Suppose, in addition, that there are constants $V_0,d_0,\mu_0,c_0>0$ such that
\[
\mathrm{vol}(C)\le \frac{V_0}{\rho},\qquad
d_{\mathrm{in}}\ge d_0,\qquad
d_v\ge d_0,\qquad
\mu_C\ge \mu_0,\qquad
d_{\mathrm{out}}\ge \frac{c_0}{\alpha^2\rho}.
\]
If furthermore the conductance satisfies
\begin{equation}\label{eq:phi_rho_condition_gamma_param}
\phi(C)\le c_1\rho
\quad\text{for a sufficiently small constant $c_1$ (depending only on $V_0,d_0,\mu_0,c_0,\alpha$),}
\end{equation}
then there exists a constant $\eta\in(0,1)$ (depending only on the same parameters) such that
\[
\gamma \ge \eta\,\sqrt{\rho}.
\]
\end{corollary}

\begin{proof}
From $d_{\mathrm{out}}\ge c_0/(\alpha^2\rho)$ we get
\[
\rho\alpha\sqrt{d_{\mathrm{out}}}\ge \rho\alpha\sqrt{\frac{c_0}{\alpha^2\rho}}=\sqrt{c_0}\,\sqrt{\rho}.
\]
Next, using the assumed bounds in \eqref{eq:DeltaC_def_gamma_param},
\[
\Delta(C)
\le
\frac{1-\alpha}{2\mu_0}\,
\sqrt{\frac{\phi(C)\cdot (V_0/\rho)}{(c_0/(\alpha^2\rho))\cdot d_0}}\,
\left(\frac{\alpha}{\sqrt{d_0}}+\rho\alpha\sqrt{V_0/\rho}\right)
=
\frac{1-\alpha}{2\mu_0}\,
\alpha\,\sqrt{\frac{V_0}{c_0 d_0}}\,
\sqrt{\phi(C)}\,
\left(\frac{\alpha}{\sqrt{d_0}}+\alpha\sqrt{V_0}\sqrt{\rho}\right).
\]
Under \eqref{eq:phi_rho_condition_gamma_param}, $\sqrt{\phi(C)}\le \sqrt{c_1}\sqrt{\rho}$, hence $\Delta(C)\le K\sqrt{\rho}$
for a constant $K$ depending only on $(V_0,d_0,\mu_0,c_0,\alpha)$ and $c_1$.
Choosing $c_1$ sufficiently small ensures $\Delta(C)\le (1-\eta)\sqrt{c_0}\sqrt{\rho}$ for some $\eta\in(0,1)$.
Applying \eqref{eq:gamma_param_bound_gamma_param} then yields $\gamma\ge \eta\sqrt{c_0}\sqrt{\rho}$.
Absorbing $\sqrt{c_0}$ into $\eta$ gives the stated $\gamma\ge \eta\sqrt{\rho}$.
\end{proof}

\begin{remark}[Interpretation]
The quantity $\Delta(C)$ in \eqref{eq:DeltaC_def_gamma_param} is explicit and shows how the margin depends on:
(i) the \emph{outside degree} $d_{\mathrm{out}}$ through the threshold $\lambda_i=\rho\alpha\sqrt{d_i}$,
(ii) the \emph{boundary size} through $\phi(C)$, and
(iii) the \emph{internal conditioning} $\mu_C=\lambda_{\min}(Q_{CC})$.
When $\phi(C)=O(\rho)$, $d_{\mathrm{out}}=\Theta(1/(\alpha^2\rho))$, and $\mu_C$ is bounded below,
the bound forces $\gamma=\Omega(\sqrt{\rho})$.
\end{remark}
% ============================================================

User: Not bad, but there is something missing.[edit | edit source]

Not bad, but there is something missing. How do I know that "\hat x_C>0\quad\text{(componentwise)}." This needs a proof. Does your choice of ρ > φ(C), and other parameter choices that you made, guarantee that ""\hat x_C>0\quad\text{(componentwise)}."" ? If not, can you modify the proof?

Assistant: You’re absolutely right to flag this.[edit | edit source]

You’re absolutely right to flag this. In the parameterized margin section I gave, the line

x^C>0(componentwise)\hat x_C>0 \quad\text{(componentwise)}x^C​>0(componentwise) is not implied by conditions like ρ≳ϕ(C)\rho \gtrsim \phi(C)ρ≳ϕ(C), internal connectivity, or outside degree. In fact, it can fail even in the “best” clusters:

  • Take CCC to be a clique (conductance ϕ(C)=0\phi(C)=0ϕ(C)=0, extremely well connected internally).
  • If ρ\rhoρ is large enough, the ℓ1\ell_1ℓ1​ penalty can still shrink most coordinates in CCC to zero (sometimes leaving only the seed active).

So: no, the “ρ>ϕ(C)\rho > \phi(C)ρ>ϕ(C)”–type assumptions are aimed at controlling leakage / enforcing outside dual feasibility; they do not guarantee that every node in CCC stays active.

You can (and should) modify the proof so that it does not require x^C>0\hat x_C>0x^C​>0. The clean fix is:[edit | edit source]
  1. Define x^\hat xx^ as the restricted minimizer on CCC (with xCˉ=0x_{\bar C}=0xCˉ​=0), not by forcing the sign pattern.
  2. Prove x^C≥0\hat x_C\ge 0x^C​≥0 automatically (nonnegativity is true for PageRank-type QQQ with nonpositive off-diagonals).
  3. Use the restricted KKT subgradient zC∈∂gC(x^C)z_C\in\partial g_C(\hat x_C)zC​∈∂gC​(x^C​) instead of hard-coding zC=λCz_C=\lambda_CzC​=λC​.
  4. Then your conductance/connectivity/degree analysis bounds the outside slack and yields a parameterized lower bound on the outside margin

γCˉ:=min⁡i∈Cˉ(λi−∣∇f(x⋆)i∣).\gamma_{\bar C}:=\min_{i\in\bar C}\bigl(\lambda_i-|\nabla f(x^\star)_i|\bigr).γCˉ​:=i∈Cˉmin​(λi​−∣∇f(x⋆)i​∣). If you additionally want the global γ\gammaγ (min over all inactive, including possible zeros inside CCC), you must also control any zeros inside CCC; that is a separate “beta-min / interior slack” assumption and cannot be forced just from ϕ(C)\phi(C)ϕ(C) etc.

Below is drop-in LaTeX that fixes the proof properly.

Paste the following into your document, and then replace the definition \eqref{eq:xhat_def_gamma_param} and Theorem~\ref{thm:gamma_parameterized_by_cluster} (and its proof) with the revised versions below.[edit | edit source]

% ------------------------------------------------------------
% Fix: define \hat x as the restricted minimizer (no sign guess),
% prove \hat x_C \ge 0, and use its KKT multiplier z_C.
% ------------------------------------------------------------

\subsubsection*{Restricted minimizer on a candidate cluster}

Define the restricted optimization problem on $C$ by fixing $x_{\bar C}=0$:
\begin{equation}\label{eq:restricted_problem_C}
\min_{x\in\mathbb{R}^n:\; x_{\bar C}=0}\;
F(x)
=
\frac12 x^\top Qx-\alpha x^\top D^{-1/2}s
\;+\;
\sum_{i\in C}\lambda_i |x_i|,
\qquad \lambda_i=\rho\alpha\sqrt{d_i}.
\end{equation}

\begin{definition}[Restricted minimizer]\label{def:xhat_restricted}
Let $\hat x$ denote the unique minimizer of \eqref{eq:restricted_problem_C}.
Equivalently, $\hat x_{\bar C}=0$ and $\hat x_C$ is the unique minimizer of
\[
\min_{x_C\in\mathbb{R}^{|C|}}\;
\frac12 x_C^\top Q_{CC}x_C - b_C^\top x_C + \sum_{i\in C}\lambda_i|x_i|,
\qquad b:=\alpha D^{-1/2}s.
\]
\end{definition}

\begin{lemma}[Absolute-value improvement and nonnegativity]\label{lem:abs_improvement_nonneg}
Assume $v\in C$ so that $b_C\ge 0$ and $b_{\bar C}=0$.
Then:
\begin{enumerate}
\item For every $x$ with $x_{\bar C}=0$, the vector $|x|$ (componentwise absolute value) satisfies
\[
F(|x|)\le F(x).
\]
\item Consequently, the restricted minimizer satisfies $\hat x_C\ge 0$ componentwise.
\end{enumerate}
\end{lemma}

\begin{proof}
Fix any $x$ with $x_{\bar C}=0$ and set $y:=|x|$ (so $y_{\bar C}=0$).
We compare each term in $F$.

\medskip\noindent\textbf{(i) Quadratic term.}
Since $Q=\frac{1+\alpha}{2}I-\frac{1-\alpha}{2}D^{-1/2}AD^{-1/2}$, we have $Q_{ij}\le 0$ for $i\neq j$.
Write
\[
x^\top Qx = \sum_i Q_{ii}x_i^2 + 2\sum_{i<j}Q_{ij}x_i x_j.
\]
For each $i<j$, we have $x_i x_j \le |x_i||x_j|=y_i y_j$, and since $Q_{ij}\le 0$ this implies
$Q_{ij}x_i x_j \ge Q_{ij}y_i y_j$.
Therefore $x^\top Qx \ge y^\top Qy$.

\medskip\noindent\textbf{(ii) Linear term.}
Since $b\ge 0$ and $y_i=|x_i|\ge x_i$ for all $i$, we have
$-b^\top y \le -b^\top x$.

\medskip\noindent\textbf{(iii) $\ell_1$ term.}
By definition, $\sum_i \lambda_i |y_i|=\sum_i\lambda_i|x_i|$.

\medskip\noindent
Combining (i)--(iii) yields $F(y)\le F(x)$, proving the first claim.
For the second claim, let $\hat x$ be the unique restricted minimizer.
Then $F(|\hat x|)\le F(\hat x)$, so $|\hat x|$ is also a minimizer.
By uniqueness, $|\hat x|=\hat x$, hence $\hat x_C\ge 0$.
\end{proof}

\begin{lemma}[Restricted KKT on $C$]\label{lem:restricted_KKT_on_C}
There exists a vector $z_C\in\mathbb{R}^{|C|}$ such that
\begin{equation}\label{eq:restricted_KKT}
Q_{CC}\hat x_C - b_C + z_C = 0,
\end{equation}
and for every $i\in C$,
\begin{equation}\label{eq:zC_bound}
z_{C,i}\in \lambda_i\,\partial |\,\hat x_i\,| \subseteq [-\lambda_i,\lambda_i].
\end{equation}
In particular, $|z_{C,i}|\le \lambda_i$ for all $i\in C$, and thus
$\|z_C\|_2 \le \|\lambda_C\|_2 = \rho\alpha\sqrt{\mathrm{vol}(C)}$.
\end{lemma}

\begin{proof}
The restricted objective in \eqref{eq:restricted_problem_C} is strongly convex in $x_C$
because $Q_{CC}\succ 0$ and the $\ell_1$ term is convex, hence it has a unique minimizer $\hat x_C$.
First-order optimality for convex composite problems gives
\[
0 \in Q_{CC}\hat x_C - b_C + \partial\!\left(\sum_{i\in C}\lambda_i|\hat x_i|\right),
\]
which is exactly the existence of $z_C$ satisfying \eqref{eq:restricted_KKT}--\eqref{eq:zC_bound}.
The norm bound follows from $|z_{C,i}|\le\lambda_i$ and the definition of $\|\lambda_C\|_2$.
\end{proof}

\begin{lemma}[Bounding $\|\hat x_C\|_2$ by internal connectivity]\label{lem:xhatC_norm_gamma_param_fixed}
Let $\mu_C=\lambda_{\min}(Q_{CC})$. Then
\[
\|\hat x_C\|_2
\le
\frac{1}{\mu_C}\left(\|b_C\|_2+\|z_C\|_2\right)
\le
\frac{1}{\mu_C}\left(\frac{\alpha}{\sqrt{d_v}}+\rho\alpha\sqrt{\mathrm{vol}(C)}\right).
\]
\end{lemma}

\begin{proof}
From \eqref{eq:restricted_KKT}, $Q_{CC}\hat x_C=b_C-z_C$, so
\[
\|\hat x_C\|_2 \le \|Q_{CC}^{-1}\|_2\,\|b_C-z_C\|_2
\le \frac{1}{\mu_C}\left(\|b_C\|_2+\|z_C\|_2\right).
\]
Since $s=e_v$ and $v\in C$, we have $\|b_C\|_2=\alpha/\sqrt{d_v}$, and
Lemma~\ref{lem:restricted_KKT_on_C} gives $\|z_C\|_2\le \rho\alpha\sqrt{\mathrm{vol}(C)}$.
\end{proof}

Now here is the revised theorem that replaces the positivity-dependent one. Note how it (correctly) lower bounds an outside margin γCˉ\gamma_{\bar C}γCˉ​. If you still want the global γ\gammaγ, you’ll need an extra interior assumption (I add that as an “optional add-on” at the end).

\begin{theorem}[Parameterized outside-margin bound without assuming $\hat x_C>0$]
\label{thm:gamma_parameterized_by_cluster_fixed}
Assume $v\in C$ and let $\hat x$ be the restricted minimizer from Definition~\ref{def:xhat_restricted}.
Define
\[
\gamma_{\bar C}:=\min_{i\in\bar C}\Bigl(\lambda_i-|\nabla f(x^\star)_i|\Bigr),
\qquad \lambda_i=\rho\alpha\sqrt{d_i}.
\]
Let $\Delta(C)$ be defined exactly as in \eqref{eq:DeltaC_def_gamma_param}, but with
$\|\hat x_C\|_2$ bounded using Lemma~\ref{lem:xhatC_norm_gamma_param_fixed}; i.e.
\[
\Delta(C)
:=
\frac{1-\alpha}{2\mu_C}\,
\sqrt{\frac{\phi(C)\mathrm{vol}(C)}{d_{\mathrm{out}}\,d_{\mathrm{in}}}}\,
\left(\frac{\alpha}{\sqrt{d_v}}+\rho\alpha\sqrt{\mathrm{vol}(C)}\right).
\]
Then:
\begin{enumerate}
\item[(i)] If $\Delta(C)\le (1-\eta)\rho\alpha\sqrt{d_{\mathrm{out}}}$ for some $\eta\in(0,1)$, then
$x^\star_{\bar C}=0$ (the global minimizer is supported inside $C$).
\item[(ii)] Under the same condition, the \emph{outside margin} satisfies
\[
\gamma_{\bar C}\ge \eta\,\rho\alpha\sqrt{d_{\mathrm{out}}}.
\]
\end{enumerate}
\end{theorem}

\begin{proof}
\medskip\noindent\textbf{Step 1 (Outside-gradient bound).}
Since $v\in C$, we have $b_{\bar C}=0$.
Because $\hat x_{\bar C}=0$,
\[
\nabla f(\hat x)_{\bar C} = (Q\hat x-b)_{\bar C} = (Q_{\bar C C}\hat x_C)_{\bar C}.
\]
Lemma~\ref{lem:outside_grad_bound_gamma_param} (unchanged) bounds
$\max_{i\in\bar C}|\nabla f(\hat x)_i|\le \Delta(C)$, using
Lemma~\ref{lem:xhatC_norm_gamma_param_fixed} to control $\|\hat x_C\|_2$.

\medskip\noindent\textbf{Step 2 (Build a global KKT certificate).}
From Lemma~\ref{lem:restricted_KKT_on_C} there exists $z_C\in\partial g_C(\hat x_C)$ such that
$Q_{CC}\hat x_C-b_C+z_C=0$.
Define a full vector $z\in\mathbb{R}^n$ by
\[
z_i :=
\begin{cases}
(z_C)_i, & i\in C,\\[1mm]
-\nabla f(\hat x)_i, & i\in\bar C.
\end{cases}
\]
We claim $z\in\partial g(\hat x)$ provided $|\nabla f(\hat x)_i|\le \lambda_i$ for all $i\in\bar C$.
Indeed:
\begin{itemize}
\item On $C$, $z_C\in\partial g_C(\hat x_C)$ by construction.
\item On $\bar C$, $\hat x_i=0$, and the subgradient condition is $z_i\in[-\lambda_i,\lambda_i]$.
This holds if $|z_i|=|\nabla f(\hat x)_i|\le \lambda_i$.
\end{itemize}
Under the hypothesis $\Delta(C)\le (1-\eta)\rho\alpha\sqrt{d_{\mathrm{out}}}$ we have
$|\nabla f(\hat x)_i|\le \Delta(C)\le (1-\eta)\rho\alpha\sqrt{d_{\mathrm{out}}}\le \lambda_i$
for all $i\in\bar C$ (since $\lambda_i\ge \rho\alpha\sqrt{d_{\mathrm{out}}}$).
Thus $z\in\partial g(\hat x)$.

Finally, note that for $i\in C$,
\[
(\nabla f(\hat x)+z)_i = (Q_{CC}\hat x_C-b_C+z_C)_i = 0,
\]
and for $i\in\bar C$ we have $(\nabla f(\hat x)+z)_i= \nabla f(\hat x)_i-\nabla f(\hat x)_i=0$.
Hence $\nabla f(\hat x)+z=0$ with $z\in\partial g(\hat x)$, i.e. $0\in\partial F(\hat x)$.
Therefore $\hat x$ is a global minimizer. By strong convexity, it is the unique global minimizer:
$\hat x=x^\star$. In particular $x^\star_{\bar C}=0$, proving (i).

\medskip\noindent\textbf{Step 3 (Outside margin).}
Since $x^\star=\hat x$ and $\bar C\subseteq I^\star$, for every $i\in\bar C$,
\[
\lambda_i-|\nabla f(x^\star)_i|
=\lambda_i-|\nabla f(\hat x)_i|
\ge \lambda_i-\Delta(C)
\ge \rho\alpha\sqrt{d_{\mathrm{out}}}-\Delta(C)
\ge \eta\rho\alpha\sqrt{d_{\mathrm{out}}}.
\]
Taking the minimum over $i\in\bar C$ yields $\gamma_{\bar C}\ge \eta\rho\alpha\sqrt{d_{\mathrm{out}}}$, proving (ii).
\end{proof}

\begin{remark}[Why this does not automatically give the \emph{global} margin $\gamma$]
The global margin is $\gamma=\min_{i\in I^\star}(\lambda_i-|\nabla f(x^\star)_i|)$, where $I^\star$ includes
\emph{all} zero coordinates of $x^\star$, including any zeros inside $C$.
The theorem above lower bounds the margin only on $\bar C$.
To conclude the same lower bound for the global $\gamma$, one needs an additional assumption that
$x^\star_i>0$ for all $i\in C$ (equivalently, $I^\star=\bar C$), or else a separate lower bound on the slack
for the (possibly) inactive indices inside $C$.
Such an ``interior beta-min / interior slack'' condition cannot be forced solely from $\phi(C)$, $\mu_C$, and $d_{\mathrm{out}}$:
even an internally expanding cluster with $\phi(C)=0$ can have many coordinates truncated to $0$ if $\rho$ is chosen large.
\end{remark}
- No, ρ\rhoρ vs. ϕ(C)\phi(C)ϕ(C) does not guarantee x^C>0\hat x_C>0x^C​>0.[edit | edit source]
  • The right fix is to: - define x^\hat xx^ as the restricted minimizer, - prove x^C≥0\hat x_C\ge 0x^C​≥0 (done above), - and then your conductance/connectivity/degree bound gives a clean Ω(ρ)\Omega(\sqrt{\rho})Ω(ρ​) outside margin.

If you tell me whether you truly need the global γ\gammaγ (including possible zeros inside CCC) or whether an outside margin γCˉ\gamma_{\bar C}γCˉ​ is sufficient for your downstream argument (runtime/locality), I can tighten the statement accordingly.

User: I don't understand, can you tell me how exactly to update this piece of code?[edit | edit source]

I don't understand, can you tell me how exactly to update this piece of code? "\section{A conductance--connectivity--degree lower bound on the margin} \label{sec:gamma_conductance_connectivity_degree}

We specialize to the $\ell_1$-regularized PageRank objective~(3): \begin{equation}\label{eq:ppr_obj_gamma_param} \min_{x\in\mathbb{R}^n} F(x)

=

\underbrace{\frac12 x^\top Qx-\alpha x^\top D^{-1/2}s}_{=:f(x)} \;+\; \underbrace{\rho\alpha\|D^{1/2}x\|_1}_{=:g(x)}. \end{equation} Here $G=(V,E)$ is undirected and unweighted, $A$ is the adjacency matrix, $D$ is the diagonal degree matrix, $s=e_v$ is a seed at node $v$, and \begin{equation}\label{eq:Q_def_gamma_param} Q=\alpha I+\frac{1-\alpha}{2}L = \frac{1+\alpha}{2}I-\frac{1-\alpha}{2}\,D^{-1/2}AD^{-1/2}. \end{equation} The weighted $\ell_1$ penalty has coordinate weights \begin{equation}\label{eq:lambda_def_gamma_param} g(x)=\sum_{i=1}^n \lambda_i |x_i|, \qquad \lambda_i:=\rho\alpha\sqrt{d_i}. \end{equation}

\paragraph{Cluster notation.} Fix a set (candidate cluster) $C\subseteq V$ with $v\in C$, and let $\bar C:=V\setminus C$. Define the volume and cut: \[ \mathrm{vol}(C):=\sum_{i\in C} d_i, \qquad \mathrm{cut}(C,\bar C):=|\{(i,j)\in E:\; i\in C,\ j\in\bar C\}|. \] We use the (one-sided) conductance \[ \phi(C):=\frac{\mathrm{cut}(C,\bar C)}{\mathrm{vol}(C)}. \] Let \[ d_{\mathrm{in}}:=\min_{j\in C} d_j, \qquad d_{\mathrm{out}}:=\min_{i\in\bar C} d_i, \qquad d_v:=d_{v}. \]

\paragraph{Internal connectivity parameter.} Let \begin{equation}\label{eq:muC_def_gamma_param} \mu_C:=\lambda_{\min}(Q_{CC}), \end{equation} the smallest eigenvalue of the principal submatrix $Q_{CC}$. Since $Q\succeq \alpha I$, we always have $\mu_C\ge \alpha$, and larger $\mu_C$ corresponds to stronger internal connectivity/conditioning on the cluster.

\subsection{A pointwise bound on the outside gradient in terms of $\phi(C)$ and $\mu_C$}

We will construct a candidate vector supported on $C$ by solving the KKT stationarity equations \emph{restricted to $C$} under the natural sign choice $x_C>0$ (appropriate for PageRank-type solutions).

Define the restricted candidate $\hat x$ by \begin{equation}\label{eq:xhat_def_gamma_param} \hat x_{\bar C}:=0, \qquad \hat x_C:=Q_{CC}^{-1}\bigl(b_C-\lambda_C\bigr), \qquad b:=\alpha D^{-1/2}s. \end{equation} Since $s=e_v$ and $v\in C$, the vector $b_C$ has exactly one nonzero entry: \begin{equation}\label{eq:bC_norm_gamma_param} \|b_C\|_2=\frac{\alpha}{\sqrt{d_v}}. \end{equation}

\begin{lemma}[Stationarity on $C$]\label{lem:restricted_stationarity_gamma_param} The vector $\hat x$ satisfies the exact stationarity equation on $C$: \begin{equation}\label{eq:restricted_stationarity_gamma_param} \nabla f(\hat x)_C+\lambda_C = 0. \end{equation} \end{lemma}

\begin{proof} Since $\nabla f(x)=Qx-b$, and $\hat x_{\bar C}=0$, we have \[ \nabla f(\hat x)_C = (Q_{CC}\hat x_C - b_C). \] By the definition \eqref{eq:xhat_def_gamma_param}, $Q_{CC}\hat x_C=b_C-\lambda_C$, hence $\nabla f(\hat x)_C = -\lambda_C$, which is \eqref{eq:restricted_stationarity_gamma_param}. \end{proof}

Next we bound the gradient on $\bar C$ in terms of conductance, internal connectivity, and outside degree.

\begin{lemma}[Bounding $\|\hat x_C\|_2$ by internal connectivity]\label{lem:xC_norm_gamma_param} Let $\mu_C=\lambda_{\min}(Q_{CC})$. Then \begin{equation}\label{eq:xC_norm_gamma_param} \|\hat x_C\|_2 \;\le\; \frac{1}{\mu_C}\left(\frac{\alpha}{\sqrt{d_v}}+\rho\alpha\sqrt{\mathrm{vol}(C)}\right). \end{equation} \end{lemma}

\begin{proof} Since $Q_{CC}$ is symmetric positive definite and $\lambda_{\min}(Q_{CC})=\mu_C$, \[ \|Q_{CC}^{-1}\|_2 = \frac{1}{\mu_C}. \] Using \eqref{eq:xhat_def_gamma_param} and the triangle inequality, \[ \|\hat x_C\|_2 = \left\|Q_{CC}^{-1}(b_C-\lambda_C)\right\|_2 \le \|Q_{CC}^{-1}\|_2\bigl(\|b_C\|_2+\|\lambda_C\|_2\bigr). \] By \eqref{eq:bC_norm_gamma_param}, $\|b_C\|_2=\alpha/\sqrt{d_v}$. Also, by \eqref{eq:lambda_def_gamma_param}, \[ \|\lambda_C\|_2^2=\sum_{i\in C} (\rho\alpha\sqrt{d_i})^2 =(\rho\alpha)^2\sum_{i\in C} d_i =(\rho\alpha)^2\,\mathrm{vol}(C), \] so $\|\lambda_C\|_2=\rho\alpha\sqrt{\mathrm{vol}(C)}$. Substitute these bounds to obtain \eqref{eq:xC_norm_gamma_param}. \end{proof}

\begin{lemma}[Outside gradient bound from conductance and degrees]\label{lem:outside_grad_bound_gamma_param} Assume $v\in C$ so that $b_{\bar C}=0$. Then for each $i\in\bar C$, \begin{equation}\label{eq:outside_grad_pointwise_gamma_param} |\nabla f(\hat x)_i| \le \frac{1-\alpha}{2}\,\|\hat x_C\|_2\, \sqrt{\frac{\mathrm{cut}(C,\bar C)}{d_i\,d_{\mathrm{in}}}}. \end{equation} Consequently, using $d_i\ge d_{\mathrm{out}}$ and $\mathrm{cut}(C,\bar C)=\phi(C)\mathrm{vol}(C)$, \begin{equation}\label{eq:outside_grad_uniform_gamma_param} \max_{i\in\bar C}|\nabla f(\hat x)_i| \le \frac{1-\alpha}{2}\,\|\hat x_C\|_2\, \sqrt{\frac{\phi(C)\mathrm{vol}(C)}{d_{\mathrm{out}}\,d_{\mathrm{in}}}}. \end{equation} \end{lemma}

\begin{proof} Fix $i\in\bar C$. Since $\hat x_{\bar C}=0$ and $b_{\bar C}=0$, \[ \nabla f(\hat x)_i = (Q\hat x - b)_i = (Q_{\bar C C}\hat x_C)_i = \sum_{j\in C}Q_{ij}\hat x_j. \] From \eqref{eq:Q_def_gamma_param}, for $i\neq j$ we have \[ Q_{ij} = -\frac{1-\alpha}{2}(D^{-1/2}AD^{-1/2})_{ij} = \begin{cases} -\dfrac{1-\alpha}{2}\dfrac{1}{\sqrt{d_i d_j}}, & (i,j)\in E,\\[2mm] 0, & (i,j)\notin E. \end{cases} \] Thus \[ |\nabla f(\hat x)_i| = \left|\sum_{j\in C\cap N(i)} -\frac{1-\alpha}{2}\frac{\hat x_j}{\sqrt{d_i d_j}}\right| \le \frac{1-\alpha}{2}\cdot \frac{1}{\sqrt{d_i}} \sum_{j\in C\cap N(i)}\frac{|\hat x_j|}{\sqrt{d_j}}. \] Apply Cauchy--Schwarz to the sum: \[ \sum_{j\in C\cap N(i)}\frac{|\hat x_j|}{\sqrt{d_j}} \le \left(\sum_{j\in C\cap N(i)} \hat x_j^2\right)^{1/2} \left(\sum_{j\in C\cap N(i)} \frac{1}{d_j}\right)^{1/2} \le \|\hat x_C\|_2\left(\sum_{j\in C\cap N(i)} \frac{1}{d_j}\right)^{1/2}. \] Since $d_j\ge d_{\mathrm{in}}$ for all $j\in C$, we have \[ \sum_{j\in C\cap N(i)} \frac{1}{d_j} \le \frac{|C\cap N(i)|}{d_{\mathrm{in}}}. \] Finally, $|C\cap N(i)|$ is the number of cut edges incident to $i$, hence $|C\cap N(i)|\le \mathrm{cut}(C,\bar C)$, which gives \eqref{eq:outside_grad_pointwise_gamma_param}. The uniform bound \eqref{eq:outside_grad_uniform_gamma_param} follows by using $d_i\ge d_{\mathrm{out}}$ and $\mathrm{cut}(C,\bar C)=\phi(C)\mathrm{vol}(C)$. \end{proof}

\subsection{A parameterized margin lower bound and a $\sqrt{\rho}$ corollary}

Define the cluster-dependent quantity \begin{equation}\label{eq:DeltaC_def_gamma_param} \Delta(C)

=

\frac{1-\alpha}{2\mu_C}\, \sqrt{\frac{\phi(C)\mathrm{vol}(C)}{d_{\mathrm{out}}\,d_{\mathrm{in}}}}\, \left(\frac{\alpha}{\sqrt{d_v}}+\rho\alpha\sqrt{\mathrm{vol}(C)}\right). \end{equation}

\begin{theorem}[Margin bound parameterized by conductance, internal connectivity, and outside degree] \label{thm:gamma_parameterized_by_cluster} Assume $v\in C$ and that the restricted candidate \eqref{eq:xhat_def_gamma_param} satisfies \begin{equation}\label{eq:xhat_positive_gamma_param} \hat x_C>0\quad\text{(componentwise)}. \end{equation} Then: \begin{enumerate} \item[(i)] If $\Delta(C)<\min_{i\in\bar C}\lambda_i=\rho\alpha\sqrt{d_{\mathrm{out}}}$, then $\hat x$ is the \emph{unique} global minimizer of \eqref{eq:ppr_obj_gamma_param}. In particular, $x^\star=\hat x$ and $x^\star_{\bar C}=0$. \item[(ii)] Whenever $x^\star=\hat x$, the strict-complementarity margin satisfies the explicit lower bound \begin{equation}\label{eq:gamma_param_bound_gamma_param} \gamma = \min_{i\in \bar C}\Bigl(\lambda_i-|\nabla f(x^\star)_i|\Bigr) \;\ge\; \rho\alpha\sqrt{d_{\mathrm{out}}}-\Delta(C). \end{equation} Equivalently, if $\Delta(C)\le (1-\eta)\rho\alpha\sqrt{d_{\mathrm{out}}}$ for some $\eta\in(0,1)$, then \begin{equation}\label{eq:gamma_eta_bound_gamma_param} \gamma \ge \eta\,\rho\alpha\sqrt{d_{\mathrm{out}}}. \end{equation} \end{enumerate} \end{theorem}

\begin{proof} We prove (i) and (ii) in order.

\medskip\noindent\textbf{Step 1: KKT conditions for \eqref{eq:ppr_obj_gamma_param}.} Since $f$ is differentiable and $g(x)=\sum_i \lambda_i|x_i|$, a point $x$ is optimal if and only if there exists $z\in\partial g(x)$ such that \begin{equation}\label{eq:KKT_gamma_param} \nabla f(x)+z=0. \end{equation} Moreover, $F$ is strongly convex because $f$ is $\alpha$-strongly convex and $g$ is convex, so the minimizer is unique.

\medskip\noindent\textbf{Step 2: Build a valid $z\in\partial g(\hat x)$ and verify \eqref{eq:KKT_gamma_param}.} Because of \eqref{eq:xhat_positive_gamma_param}, $\mathrm{sign}(\hat x_i)=+1$ for all $i\in C$. Define $z\in\mathbb{R}^n$ by \[ z_i := \begin{cases} \lambda_i, & i\in C,\\[1mm] -\nabla f(\hat x)_i, & i\in\bar C. \end{cases} \] We claim $z\in\partial g(\hat x)$ provided $|\nabla f(\hat x)_i|\le \lambda_i$ for all $i\in\bar C$. Indeed: \begin{itemize} \item If $i\in C$, then $\hat x_i>0$ and the weighted-$\ell_1$ subgradient requires $z_i=\lambda_i$, which holds. \item If $i\in\bar C$, then $\hat x_i=0$ and the subgradient condition is $z_i\in[-\lambda_i,\lambda_i]$. This holds if $|z_i|=|\nabla f(\hat x)_i|\le \lambda_i$. \end{itemize} Next, Lemma~\ref{lem:restricted_stationarity_gamma_param} gives $\nabla f(\hat x)_C=-\lambda_C$, so $\nabla f(\hat x)_C+z_C=0$. By definition, $\nabla f(\hat x)_{\bar C}+z_{\bar C}=0$. Thus \eqref{eq:KKT_gamma_param} holds.

\medskip\noindent\textbf{Step 3: Enforce the outside inequalities via $\Delta(C)$.} By Lemma~\ref{lem:outside_grad_bound_gamma_param} and Lemma~\ref{lem:xC_norm_gamma_param}, \[ \max_{i\in\bar C}|\nabla f(\hat x)_i| \le \Delta(C). \] Therefore, if $\Delta(C)<\rho\alpha\sqrt{d_{\mathrm{out}}}\le \lambda_i$ for all $i\in\bar C$, then $|\nabla f(\hat x)_i|<\lambda_i$ on $\bar C$, so $z\in\partial g(\hat x)$ and the KKT condition holds. Hence $\hat x$ is a minimizer, and by strong convexity it is the unique minimizer: $\hat x=x^\star$. This proves (i).

\medskip\noindent\textbf{Step 4: Lower bound the margin.} Assume $x^\star=\hat x$. Since $x^\star_{\bar C}=0$, the inactive set contains $\bar C$ and \[ \gamma = \min_{i\in \bar C}\bigl(\lambda_i-|\nabla f(x^\star)_i|\bigr) \ge \min_{i\in\bar C}\lambda_i - \max_{i\in\bar C}|\nabla f(\hat x)_i| = \rho\alpha\sqrt{d_{\mathrm{out}}} - \max_{i\in\bar C}|\nabla f(\hat x)_i|. \] Using $\max_{i\in\bar C}|\nabla f(\hat x)_i|\le \Delta(C)$ gives \eqref{eq:gamma_param_bound_gamma_param}. The $\eta$-form \eqref{eq:gamma_eta_bound_gamma_param} is immediate. \end{proof}

\begin{corollary}[A concrete $\gamma=\Omega(\sqrt{\rho})$ regime]\label{cor:gamma_Omega_sqrt_rho_regime} Assume the hypotheses of Theorem~\ref{thm:gamma_parameterized_by_cluster}. Suppose, in addition, that there are constants $V_0,d_0,\mu_0,c_0>0$ such that \[ \mathrm{vol}(C)\le \frac{V_0}{\rho},\qquad d_{\mathrm{in}}\ge d_0,\qquad d_v\ge d_0,\qquad \mu_C\ge \mu_0,\qquad d_{\mathrm{out}}\ge \frac{c_0}{\alpha^2\rho}. \] If furthermore the conductance satisfies \begin{equation}\label{eq:phi_rho_condition_gamma_param} \phi(C)\le c_1\rho \quad\text{for a sufficiently small constant $c_1$ (depending only on $V_0,d_0,\mu_0,c_0,\alpha$),} \end{equation} then there exists a constant $\eta\in(0,1)$ (depending only on the same parameters) such that \[ \gamma \ge \eta\,\sqrt{\rho}. \] \end{corollary}

\begin{proof} From $d_{\mathrm{out}}\ge c_0/(\alpha^2\rho)$ we get \[ \rho\alpha\sqrt{d_{\mathrm{out}}}\ge \rho\alpha\sqrt{\frac{c_0}{\alpha^2\rho}}=\sqrt{c_0}\,\sqrt{\rho}. \] Next, using the assumed bounds in \eqref{eq:DeltaC_def_gamma_param}, \[ \Delta(C) \le \frac{1-\alpha}{2\mu_0}\, \sqrt{\frac{\phi(C)\cdot (V_0/\rho)}{(c_0/(\alpha^2\rho))\cdot d_0}}\, \left(\frac{\alpha}{\sqrt{d_0}}+\rho\alpha\sqrt{V_0/\rho}\right) = \frac{1-\alpha}{2\mu_0}\, \alpha\,\sqrt{\frac{V_0}{c_0 d_0}}\, \sqrt{\phi(C)}\, \left(\frac{\alpha}{\sqrt{d_0}}+\alpha\sqrt{V_0}\sqrt{\rho}\right). \] Under \eqref{eq:phi_rho_condition_gamma_param}, $\sqrt{\phi(C)}\le \sqrt{c_1}\sqrt{\rho}$, hence $\Delta(C)\le K\sqrt{\rho}$ for a constant $K$ depending only on $(V_0,d_0,\mu_0,c_0,\alpha)$ and $c_1$. Choosing $c_1$ sufficiently small ensures $\Delta(C)\le (1-\eta)\sqrt{c_0}\sqrt{\rho}$ for some $\eta\in(0,1)$. Applying \eqref{eq:gamma_param_bound_gamma_param} then yields $\gamma\ge \eta\sqrt{c_0}\sqrt{\rho}$. Absorbing $\sqrt{c_0}$ into $\eta$ gives the stated $\gamma\ge \eta\sqrt{\rho}$. \end{proof}

\begin{remark}[Interpretation] The quantity $\Delta(C)$ in \eqref{eq:DeltaC_def_gamma_param} is explicit and shows how the margin depends on: (i) the \emph{outside degree} $d_{\mathrm{out}}$ through the threshold $\lambda_i=\rho\alpha\sqrt{d_i}$, (ii) the \emph{boundary size} through $\phi(C)$, and (iii) the \emph{internal conditioning} $\mu_C=\lambda_{\min}(Q_{CC})$. When $\phi(C)=O(\rho)$, $d_{\mathrm{out}}=\Theta(1/(\alpha^2\rho))$, and $\mu_C$ is bounded below, the bound forces $\gamma=\Omega(\sqrt{\rho})$. \end{remark} % ============================================================"

Assistant: Yes — let’s make this mechanical.[edit | edit source]

Yes — let’s make this mechanical. You don’t need to guess what to change: you will (A) replace the definition of x^\hat xx^, (B) delete one lemma, (C) add two lemmas, (D) replace one lemma’s proof, and (E) replace the theorem+proof (and delete the positivity assumption).

Below I refer to the exact blocks in the code you pasted.

###[edit | edit source]

It starts at:

We will construct a candidate vector supported on $C$ by solving the KKT stationarity equations \emph{restricted to $C$}
under the natural sign choice $x_C>0$ (appropriate for PageRank-type solutions).

Define the restricted candidate $\hat x$ by
\begin{equation}\label{eq:xhat_def_gamma_param}
\hat x_{\bar C}:=0,
\qquad
\hat x_C:=Q_{CC}^{-1}\bigl(b_C-\lambda_C\bigr),
\qquad
b:=\alpha D^{-1/2}s.
\end{equation}
(copy-paste exactly):[edit | edit source]
We will construct a candidate vector supported on $C$ as the \emph{restricted minimizer}
of the full objective over vectors with $x_{\bar C}=0$.
This avoids imposing (and later proving) a sign pattern such as $\hat x_C>0$.

Let $b:=\alpha D^{-1/2}s$. Define $\hat x$ to be the unique minimizer of
\begin{equation}\label{eq:xhat_def_gamma_param}
\hat x \in \arg\min_{x\in\mathbb{R}^n:\; x_{\bar C}=0}\;
\left\{
\frac12 x^\top Qx-\alpha x^\top D^{-1/2}s
\;+\;
\sum_{i=1}^n \lambda_i |x_i|
\right\}.
\end{equation}
Equivalently, $\hat x_{\bar C}=0$ and $\hat x_C$ uniquely minimizes
\[
\min_{x_C\in\mathbb{R}^{|C|}}
\left\{
\frac12 x_C^\top Q_{CC}x_C - b_C^\top x_C + \sum_{i\in C}\lambda_i|x_i|
\right\}.
\]

Keep your equation for ∥bC∥2\|b_C\|_2∥bC​∥2​ (eq:bC_norm_gamma_param) right after this (no change).

###[edit | edit source]

Remove everything from:

\begin{lemma}[Stationarity on $C$]\label{lem:restricted_stationarity_gamma_param}
...
\end{proof}

That lemma relied on the closed-form formula QCC−1(bC−λC)Q_{CC}^{-1}(b_C-\lambda_C)QCC−1​(bC​−λC​), which is not the restricted minimizer once the ℓ1\ell_1ℓ1​ term can create zeros/sign changes inside CCC.

Paste the following exactly where the stationarity lemma used to be:[edit | edit source]

\begin{lemma}[Absolute-value improvement and nonnegativity]\label{lem:abs_improvement_nonneg}
Assume $v\in C$ so that $b_C\ge 0$ and $b_{\bar C}=0$.
Then for every $x$ with $x_{\bar C}=0$, the componentwise absolute value $|x|$ satisfies
\[
F(|x|)\le F(x).
\]
Consequently, the restricted minimizer $\hat x$ satisfies $\hat x_C\ge 0$ componentwise.
\end{lemma}

\begin{proof}
Fix any $x$ with $x_{\bar C}=0$ and set $y:=|x|$ (so $y_{\bar C}=0$).
Write
\[
F(x)=\frac12 x^\top Qx - b^\top x + \sum_{i=1}^n \lambda_i|x_i|.
\]

\emph{Quadratic term.}
Since $Q=\frac{1+\alpha}{2}I-\frac{1-\alpha}{2}D^{-1/2}AD^{-1/2}$, we have $Q_{ij}\le 0$ for $i\neq j$.
Expand
\[
x^\top Qx = \sum_i Q_{ii}x_i^2 + 2\sum_{i<j}Q_{ij}x_i x_j.
\]
For each $i<j$, $x_i x_j\le |x_i||x_j|=y_i y_j$, and since $Q_{ij}\le 0$ this implies
$Q_{ij}x_i x_j \ge Q_{ij}y_i y_j$.
Hence $x^\top Qx \ge y^\top Qy$.

\emph{Linear term.}
Since $b\ge 0$ (here $b=\alpha D^{-1/2}s$ with $s=e_v$) and $y_i=|x_i|\ge x_i$ for all $i$,
we have $-b^\top y \le -b^\top x$.

\emph{$\ell_1$ term.}
$\sum_i \lambda_i|y_i|=\sum_i\lambda_i|x_i|$ by definition.

Combining the three comparisons gives $F(y)\le F(x)$.
Now let $\hat x$ be the unique restricted minimizer. Then $F(|\hat x|)\le F(\hat x)$, so $|\hat x|$ is also a minimizer.
By uniqueness, $|\hat x|=\hat x$, hence $\hat x_C\ge 0$.
\end{proof}

\begin{lemma}[Restricted KKT on $C$]\label{lem:restricted_KKT_on_C}
There exists a vector $z_C\in\mathbb{R}^{|C|}$ such that
\begin{equation}\label{eq:restricted_KKT_on_C}
Q_{CC}\hat x_C - b_C + z_C = 0,
\end{equation}
and for every $i\in C$,
\begin{equation}\label{eq:zC_in_subdiff}
z_{C,i}\in \lambda_i\,\partial|\hat x_i| \subseteq [-\lambda_i,\lambda_i].
\end{equation}
In particular, $\|z_C\|_2 \le \|\lambda_C\|_2 = \rho\alpha\sqrt{\mathrm{vol}(C)}$.
\end{lemma}

\begin{proof}
The restricted objective over $x_C$ is strongly convex because $Q_{CC}\succ 0$ and the $\ell_1$ term is convex,
so $\hat x_C$ is unique. First-order optimality yields
\[
0\in Q_{CC}\hat x_C - b_C + \partial\!\Big(\sum_{i\in C}\lambda_i|\hat x_i|\Big),
\]
which is exactly the existence of $z_C$ satisfying \eqref{eq:restricted_KKT_on_C}--\eqref{eq:zC_in_subdiff}.
The norm bound follows from $|z_{C,i}|\le \lambda_i$ and
$\|\lambda_C\|_2^2=\sum_{i\in C}(\rho\alpha\sqrt{d_i})^2=(\rho\alpha)^2\mathrm{vol}(C)$.
\end{proof}

You can keep the statement and label exactly the same; only change the proof so it no longer uses[edit | edit source]

x^C=QCC−1(bC−λC)\hat x_C=Q_{CC}^{-1}(b_C-\lambda_C)x^C​=QCC−1​(bC​−λC​).

=====

\begin{proof} =====
From Lemma~\ref{lem:restricted_KKT_on_C}, $Q_{CC}\hat x_C=b_C-z_C$, so
\[
\|\hat x_C\|_2 \le \|Q_{CC}^{-1}\|_2\,\|b_C-z_C\|_2
\le \|Q_{CC}^{-1}\|_2\bigl(\|b_C\|_2+\|z_C\|_2\bigr).
\]
Since $\lambda_{\min}(Q_{CC})=\mu_C$, we have $\|Q_{CC}^{-1}\|_2=1/\mu_C$.
Also, by \eqref{eq:bC_norm_gamma_param}, $\|b_C\|_2=\alpha/\sqrt{d_v}$.
Finally, Lemma~\ref{lem:restricted_KKT_on_C} gives $\|z_C\|_2\le \rho\alpha\sqrt{\mathrm{vol}(C)}$.
Substituting yields \eqref{eq:xC_norm_gamma_param}.
\end{proof}

Everything else in that lemma (statement and equation eq:xC_norm_gamma_param) stays unchanged.

###[edit | edit source]

Remove these lines entirely:

Assume $v\in C$ and that the restricted candidate \eqref{eq:xhat_def_gamma_param} satisfies
\begin{equation}\label{eq:xhat_positive_gamma_param}
\hat x_C>0\quad\text{(componentwise)}.
\end{equation}
Replace everything from:[edit | edit source]
\begin{theorem}[Margin bound parameterized by conductance, internal connectivity, and outside degree]
\label{thm:gamma_parameterized_by_cluster}
...
\end{proof}

with the following (copy-paste):

\begin{theorem}[Margin bound parameterized by conductance, internal connectivity, and outside degree]
\label{thm:gamma_parameterized_by_cluster}
Assume $v\in C$ and let $\hat x$ be defined by \eqref{eq:xhat_def_gamma_param} (restricted minimizer with $\hat x_{\bar C}=0$).
Then:
\begin{enumerate}
\item[(i)] If $\Delta(C)<\min_{i\in\bar C}\lambda_i=\rho\alpha\sqrt{d_{\mathrm{out}}}$, then $\hat x$ is the \emph{unique}
global minimizer of \eqref{eq:ppr_obj_gamma_param}. In particular, $x^\star=\hat x$ and $x^\star_{\bar C}=0$.
\item[(ii)] Whenever $x^\star=\hat x$, the \emph{outside} slack (margin on $\bar C$) satisfies the explicit lower bound
\begin{equation}\label{eq:gamma_param_bound_gamma_param}
\min_{i\in \bar C}\Bigl(\lambda_i-|\nabla f(x^\star)_i|\Bigr)
\;\ge\;
\rho\alpha\sqrt{d_{\mathrm{out}}}-\Delta(C).
\end{equation}
Equivalently, if $\Delta(C)\le (1-\eta)\rho\alpha\sqrt{d_{\mathrm{out}}}$ for some $\eta\in(0,1)$, then
\begin{equation}\label{eq:gamma_eta_bound_gamma_param}
\min_{i\in \bar C}\Bigl(\lambda_i-|\nabla f(x^\star)_i|\Bigr)
\ge \eta\,\rho\alpha\sqrt{d_{\mathrm{out}}}.
\end{equation}
\end{enumerate}
\end{theorem}

\begin{proof}
We prove (i) and (ii) in order.

\medskip\noindent\textbf{Step 1: KKT conditions for \eqref{eq:ppr_obj_gamma_param}.}
Since $f$ is differentiable and $g(x)=\sum_i \lambda_i|x_i|$, a point $x$ is optimal if and only if there exists
$z\in\partial g(x)$ such that
\begin{equation}\label{eq:KKT_gamma_param}
\nabla f(x)+z=0.
\end{equation}
Moreover, $F$ is strongly convex because $f$ is $\alpha$-strongly convex and $g$ is convex,
so the minimizer is unique.

\medskip\noindent\textbf{Step 2: Build a valid $z\in\partial g(\hat x)$ and verify \eqref{eq:KKT_gamma_param}.}
From Lemma~\ref{lem:restricted_KKT_on_C}, there exists $z_C$ such that
$Q_{CC}\hat x_C-b_C+z_C=0$ and $z_{C,i}\in[-\lambda_i,\lambda_i]$ for all $i\in C$.

Define $z\in\mathbb{R}^n$ by
\[
z_i :=
\begin{cases}
(z_C)_i, & i\in C,\\[1mm]
-\nabla f(\hat x)_i, & i\in\bar C.
\end{cases}
\]
We claim $z\in\partial g(\hat x)$ provided $|\nabla f(\hat x)_i|\le \lambda_i$ for all $i\in\bar C$.
Indeed:
\begin{itemize}
\item If $i\in C$, then $z_{C,i}\in \lambda_i\,\partial|\hat x_i|$ by Lemma~\ref{lem:restricted_KKT_on_C},
so it is a valid weighted-$\ell_1$ subgradient component on $C$.
\item If $i\in\bar C$, then $\hat x_i=0$ and the subgradient condition is $z_i\in[-\lambda_i,\lambda_i]$.
This holds if $|z_i|=|\nabla f(\hat x)_i|\le \lambda_i$.
\end{itemize}

Next we check \eqref{eq:KKT_gamma_param}.
On $C$, since $\hat x_{\bar C}=0$,
\[
\nabla f(\hat x)_C = (Q\hat x-b)_C = Q_{CC}\hat x_C - b_C,
\]
hence $\nabla f(\hat x)_C+z_C=0$ by $Q_{CC}\hat x_C-b_C+z_C=0$.
On $\bar C$, by construction $\nabla f(\hat x)_{\bar C}+z_{\bar C}=0$.
Therefore $\nabla f(\hat x)+z=0$.

\medskip\noindent\textbf{Step 3: Enforce the outside inequalities via $\Delta(C)$.}
By Lemma~\ref{lem:outside_grad_bound_gamma_param} and Lemma~\ref{lem:xC_norm_gamma_param},
\[
\max_{i\in\bar C}|\nabla f(\hat x)_i|
\le
\Delta(C).
\]
Therefore, if $\Delta(C)<\rho\alpha\sqrt{d_{\mathrm{out}}}\le \lambda_i$ for all $i\in\bar C$,
then $|\nabla f(\hat x)_i|<\lambda_i$ on $\bar C$, so indeed $z\in\partial g(\hat x)$ and the KKT condition holds.
Hence $\hat x$ is a minimizer, and by strong convexity it is the unique minimizer:
$\hat x=x^\star$. This proves (i).

\medskip\noindent\textbf{Step 4: Lower bound the outside margin on $\bar C$.}
Assume $x^\star=\hat x$. Then for every $i\in\bar C$,
\[
\lambda_i-|\nabla f(x^\star)_i|
=\lambda_i-|\nabla f(\hat x)_i|
\ge \lambda_i-\Delta(C)
\ge \rho\alpha\sqrt{d_{\mathrm{out}}}-\Delta(C),
\]
where we used $\lambda_i\ge \rho\alpha\sqrt{d_{\mathrm{out}}}$.
Taking the minimum over $i\in\bar C$ yields \eqref{eq:gamma_param_bound_gamma_param},
and the $\eta$-form \eqref{eq:gamma_eta_bound_gamma_param} follows immediately.
\end{proof}

✅ After this change, you no longer need to prove x^C>0\hat x_C>0x^C​>0; the theorem is correct with only x^C≥0\hat x_C\ge 0x^C​≥0 (which we proved in Lemma lem:abs_improvement_nonneg), and it yields the parameterized slack bound on Cˉ\bar CCˉ (which is what your conductance/outside-degree analysis controls).

This updated theorem lower bounds the slack for nodes in Cˉ\bar CCˉ. It does not claim that every node in CCC is nonzero; that cannot be guaranteed just from ϕ(C)\phi(C)ϕ(C), μC\mu_CμC​, doutd_{\mathrm{out}}dout​ (the ℓ1\ell_1ℓ1​ term can still zero out interior nodes).[edit | edit source]

If you want, I can show you the minimal additional “interior beta-min” condition that makes the global margin coincide with this outside margin.

(And yes: this whole section is for objective (3) in your note. open-problem-fountoulakis22a

)

User: Sorry this is a bit messed up.[edit | edit source]

Sorry this is a bit messed up. Can you simply give me a single latex snippet that replaces the whole "\section{A conductance--connectivity--degree lower bound on the margin}" appropriately? I don't want to be copy-pasting pieces here and there.

Assistant: Below is a single self-contained LaTeX snippet that you can paste in your file to replace the entire section starting at[edit | edit source]

Below is a single self-contained LaTeX snippet that you can paste in your file to replace the entire section starting at

\section{A conductance--connectivity--degree lower bound on the margin}

and ending at its closing remark.

This version does not assume \hat x_C>0 anywhere. Instead, it defines x^\hat xx^ as the restricted minimizer on CCC, proves x^C≥0\hat x_C\ge 0x^C​≥0, builds the global KKT certificate using the restricted KKT subgradient, and then derives an explicit parameterized lower bound on the outside margin on Cˉ\bar CCˉ in terms of ϕ(C)\phi(C)ϕ(C), μC\mu_CμC​, and doutd_{\mathrm{out}}dout​. It also includes a corollary giving an Ω(ρ)\Omega(\sqrt{\rho})Ω(ρ​) regime.

% ============================================================
\section{A conductance--connectivity--degree lower bound on the margin}
\label{sec:gamma_conductance_connectivity_degree}

We specialize to the $\ell_1$-regularized PageRank objective~(3):
\begin{equation}\label{eq:ppr_obj_gamma_param}
\min_{x\in\mathbb{R}^n}
F(x)
:=
\underbrace{\frac12 x^\top Qx-\alpha x^\top D^{-1/2}s}_{=:f(x)}
\;+\;
\underbrace{\rho\alpha\|D^{1/2}x\|_1}_{=:g(x)}.
\end{equation}
Here $G=(V,E)$ is undirected and unweighted, $A$ is the adjacency matrix, $D$ is the diagonal degree matrix,
$s=e_v$ is a seed at node $v$, and
\begin{equation}\label{eq:Q_def_gamma_param}
Q=\alpha I+\frac{1-\alpha}{2}L
=
\frac{1+\alpha}{2}I-\frac{1-\alpha}{2}\,D^{-1/2}AD^{-1/2}.
\end{equation}
The weighted $\ell_1$ penalty has coordinate weights
\begin{equation}\label{eq:lambda_def_gamma_param}
g(x)=\sum_{i=1}^n \lambda_i |x_i|,
\qquad
\lambda_i:=\rho\alpha\sqrt{d_i}.
\end{equation}
Since $Q\succeq \alpha I$, the function $f$ is $\alpha$-strongly convex and hence $F=f+g$ has a unique minimizer $x^\star$.

\paragraph{Cluster notation.}
Fix a set (candidate cluster) $C\subseteq V$ with $v\in C$, and let $\bar C:=V\setminus C$.
Define the volume and cut:
\[
\mathrm{vol}(C):=\sum_{i\in C} d_i,
\qquad
\mathrm{cut}(C,\bar C):=|\{(i,j)\in E:\; i\in C,\ j\in\bar C\}|.
\]
We use the (one-sided) conductance
\[
\phi(C):=\frac{\mathrm{cut}(C,\bar C)}{\mathrm{vol}(C)}.
\]
Let
\[
d_{\mathrm{in}}:=\min_{j\in C} d_j,
\qquad
d_{\mathrm{out}}:=\min_{i\in\bar C} d_i,
\qquad
d_v:=d_{v}.
\]
We write $N(i)$ for the neighbor set of node $i$.

\paragraph{Internal connectivity parameter.}
Let
\begin{equation}\label{eq:muC_def_gamma_param}
\mu_C:=\lambda_{\min}(Q_{CC}),
\end{equation}
the smallest eigenvalue of the principal submatrix $Q_{CC}$.

\subsection{Restricted minimizer on $C$, its nonnegativity, and a norm bound}

Define $b:=\alpha D^{-1/2}s$. Since $s=e_v$ and $v\in C$, the vector $b_C$ has exactly one nonzero entry and
\begin{equation}\label{eq:bC_norm_gamma_param}
\|b_C\|_2=\frac{\alpha}{\sqrt{d_v}}.
\end{equation}

\begin{definition}[Restricted minimizer on $C$]\label{def:xhat_restricted_gamma_param}
Let $\hat x$ denote the unique minimizer of \eqref{eq:ppr_obj_gamma_param} subject to $x_{\bar C}=0$:
\begin{equation}\label{eq:xhat_def_gamma_param}
\hat x \in \operatorname*{argmin}_{x\in\mathbb{R}^n:\; x_{\bar C}=0}\; F(x).
\end{equation}
Equivalently, $\hat x_{\bar C}=0$ and $\hat x_C$ uniquely minimizes
\[
\min_{x_C\in\mathbb{R}^{|C|}}
\left\{
\frac12 x_C^\top Q_{CC}x_C - b_C^\top x_C + \sum_{i\in C}\lambda_i|x_i|
\right\}.
\]
\end{definition}

\begin{lemma}[Absolute-value improvement and nonnegativity]\label{lem:abs_improvement_nonneg_gamma_param}
Assume $v\in C$ so that $b_C\ge 0$ and $b_{\bar C}=0$.
Then for every $x$ with $x_{\bar C}=0$, the componentwise absolute value $|x|$ satisfies
\[
F(|x|)\le F(x).
\]
Consequently, the restricted minimizer satisfies $\hat x_C\ge 0$ componentwise.
\end{lemma}

\begin{proof}
Fix any $x$ with $x_{\bar C}=0$ and set $y:=|x|$ (so $y_{\bar C}=0$).
Write
\[
F(x)=\frac12 x^\top Qx - b^\top x + \sum_{i=1}^n \lambda_i|x_i|.
\]

\medskip\noindent\textbf{Quadratic term.}
Since $Q=\frac{1+\alpha}{2}I-\frac{1-\alpha}{2}D^{-1/2}AD^{-1/2}$, we have $Q_{ij}\le 0$ for $i\neq j$.
Expand
\[
x^\top Qx = \sum_i Q_{ii}x_i^2 + 2\sum_{i<j}Q_{ij}x_i x_j.
\]
For each $i<j$, we have $x_i x_j\le |x_i||x_j|=y_i y_j$, and since $Q_{ij}\le 0$ this implies
$Q_{ij}x_i x_j \ge Q_{ij}y_i y_j$.
Hence $x^\top Qx \ge y^\top Qy$.

\medskip\noindent\textbf{Linear term.}
Since $b\ge 0$ and $y_i=|x_i|\ge x_i$ for all $i$, we have $-b^\top y \le -b^\top x$.

\medskip\noindent\textbf{$\ell_1$ term.}
$\sum_i \lambda_i|y_i|=\sum_i\lambda_i|x_i|$ by definition.

\medskip\noindent
Combining the three comparisons gives $F(y)\le F(x)$.
Now let $\hat x$ be the unique restricted minimizer. Then $F(|\hat x|)\le F(\hat x)$, so $|\hat x|$ is also a minimizer.
By uniqueness, $|\hat x|=\hat x$, hence $\hat x_C\ge 0$.
\end{proof}

\begin{lemma}[Restricted KKT on $C$]\label{lem:restricted_KKT_on_C_gamma_param}
There exists a vector $z_C\in\mathbb{R}^{|C|}$ such that
\begin{equation}\label{eq:restricted_KKT_on_C_gamma_param}
Q_{CC}\hat x_C - b_C + z_C = 0,
\end{equation}
and for every $i\in C$,
\begin{equation}\label{eq:zC_in_subdiff_gamma_param}
z_{C,i}\in \lambda_i\,\partial|\hat x_i| \subseteq [-\lambda_i,\lambda_i].
\end{equation}
In particular, $\|z_C\|_2 \le \|\lambda_C\|_2 = \rho\alpha\sqrt{\mathrm{vol}(C)}$.
\end{lemma}

\begin{proof}
The restricted objective over $x_C$ is strongly convex because $Q_{CC}\succ 0$ and the $\ell_1$ term is convex,
so $\hat x_C$ is unique. First-order optimality yields
\[
0\in Q_{CC}\hat x_C - b_C + \partial\!\Big(\sum_{i\in C}\lambda_i|\hat x_i|\Big),
\]
which is exactly the existence of $z_C$ satisfying \eqref{eq:restricted_KKT_on_C_gamma_param}--\eqref{eq:zC_in_subdiff_gamma_param}.
The norm bound follows from $|z_{C,i}|\le \lambda_i$ and
$\|\lambda_C\|_2^2=\sum_{i\in C}(\rho\alpha\sqrt{d_i})^2=(\rho\alpha)^2\mathrm{vol}(C)$.
\end{proof}

\begin{lemma}[Bounding $\|\hat x_C\|_2$ by internal connectivity]\label{lem:xC_norm_gamma_param}
Let $\mu_C=\lambda_{\min}(Q_{CC})$. Then
\begin{equation}\label{eq:xC_norm_gamma_param}
\|\hat x_C\|_2
\;\le\;
\frac{1}{\mu_C}\left(\frac{\alpha}{\sqrt{d_v}}+\rho\alpha\sqrt{\mathrm{vol}(C)}\right).
\end{equation}
\end{lemma}

\begin{proof}
From Lemma~\ref{lem:restricted_KKT_on_C_gamma_param}, $Q_{CC}\hat x_C=b_C-z_C$, so
\[
\|\hat x_C\|_2 \le \|Q_{CC}^{-1}\|_2\,\|b_C-z_C\|_2
\le \|Q_{CC}^{-1}\|_2\bigl(\|b_C\|_2+\|z_C\|_2\bigr).
\]
Since $\lambda_{\min}(Q_{CC})=\mu_C$, we have $\|Q_{CC}^{-1}\|_2=1/\mu_C$.
Also, $\|b_C\|_2=\alpha/\sqrt{d_v}$ by \eqref{eq:bC_norm_gamma_param}, and
$\|z_C\|_2\le \rho\alpha\sqrt{\mathrm{vol}(C)}$ by Lemma~\ref{lem:restricted_KKT_on_C_gamma_param}.
Substituting yields \eqref{eq:xC_norm_gamma_param}.
\end{proof}

\subsection{A pointwise outside-gradient bound from conductance and degrees}

\begin{lemma}[Outside gradient bound from conductance and degrees]\label{lem:outside_grad_bound_gamma_param}
Assume $v\in C$ so that $b_{\bar C}=0$. Then for each $i\in\bar C$,
\begin{equation}\label{eq:outside_grad_pointwise_gamma_param}
|\nabla f(\hat x)_i|
\le
\frac{1-\alpha}{2}\,\|\hat x_C\|_2\,
\sqrt{\frac{\mathrm{cut}(C,\bar C)}{d_i\,d_{\mathrm{in}}}}.
\end{equation}
Consequently, using $d_i\ge d_{\mathrm{out}}$ and $\mathrm{cut}(C,\bar C)=\phi(C)\mathrm{vol}(C)$,
\begin{equation}\label{eq:outside_grad_uniform_gamma_param}
\max_{i\in\bar C}|\nabla f(\hat x)_i|
\le
\frac{1-\alpha}{2}\,\|\hat x_C\|_2\,
\sqrt{\frac{\phi(C)\mathrm{vol}(C)}{d_{\mathrm{out}}\,d_{\mathrm{in}}}}.
\end{equation}
\end{lemma}

\begin{proof}
Fix $i\in\bar C$. Since $\hat x_{\bar C}=0$ and $b_{\bar C}=0$,
\[
\nabla f(\hat x)_i = (Q\hat x - b)_i = (Q_{\bar C C}\hat x_C)_i = \sum_{j\in C}Q_{ij}\hat x_j.
\]
From \eqref{eq:Q_def_gamma_param}, for $i\neq j$ we have
\[
Q_{ij} = -\frac{1-\alpha}{2}(D^{-1/2}AD^{-1/2})_{ij}
=
\begin{cases}
-\dfrac{1-\alpha}{2}\dfrac{1}{\sqrt{d_i d_j}}, & (i,j)\in E,\\[2mm]
0, & (i,j)\notin E.
\end{cases}
\]
Thus
\[
|\nabla f(\hat x)_i|
=
\left|\sum_{j\in C\cap N(i)} -\frac{1-\alpha}{2}\frac{\hat x_j}{\sqrt{d_i d_j}}\right|
\le
\frac{1-\alpha}{2}\cdot \frac{1}{\sqrt{d_i}}
\sum_{j\in C\cap N(i)}\frac{|\hat x_j|}{\sqrt{d_j}}.
\]
Apply Cauchy--Schwarz:
\[
\sum_{j\in C\cap N(i)}\frac{|\hat x_j|}{\sqrt{d_j}}
\le
\left(\sum_{j\in C\cap N(i)} \hat x_j^2\right)^{1/2}
\left(\sum_{j\in C\cap N(i)} \frac{1}{d_j}\right)^{1/2}
\le
\|\hat x_C\|_2\left(\sum_{j\in C\cap N(i)} \frac{1}{d_j}\right)^{1/2}.
\]
Since $d_j\ge d_{\mathrm{in}}$ for all $j\in C$,
\[
\sum_{j\in C\cap N(i)} \frac{1}{d_j}
\le
\frac{|C\cap N(i)|}{d_{\mathrm{in}}}.
\]
Finally, $|C\cap N(i)|$ is the number of cut edges incident to $i$, hence $|C\cap N(i)|\le \mathrm{cut}(C,\bar C)$,
which gives \eqref{eq:outside_grad_pointwise_gamma_param}. The uniform bound
\eqref{eq:outside_grad_uniform_gamma_param} follows by using $d_i\ge d_{\mathrm{out}}$
and $\mathrm{cut}(C,\bar C)=\phi(C)\mathrm{vol}(C)$.
\end{proof}

\subsection{A parameterized outside-margin lower bound and a $\sqrt{\rho}$ regime}

Define the cluster-dependent quantity
\begin{equation}\label{eq:DeltaC_def_gamma_param}
\Delta(C)
:=
\frac{1-\alpha}{2\mu_C}\,
\sqrt{\frac{\phi(C)\mathrm{vol}(C)}{d_{\mathrm{out}}\,d_{\mathrm{in}}}}\,
\left(\frac{\alpha}{\sqrt{d_v}}+\rho\alpha\sqrt{\mathrm{vol}(C)}\right).
\end{equation}

\begin{theorem}[Margin bound parameterized by conductance, internal connectivity, and outside degree]
\label{thm:gamma_parameterized_by_cluster}
Assume $v\in C$ and let $\hat x$ be the restricted minimizer \eqref{eq:xhat_def_gamma_param}.
Define the \emph{outside margin} on the complement
\[
\gamma_{\bar C}
:=
\min_{i\in \bar C}\Bigl(\lambda_i-|\nabla f(x^\star)_i|\Bigr).
\]
Then:
\begin{enumerate}
\item[(i)] If $\Delta(C)<\min_{i\in\bar C}\lambda_i=\rho\alpha\sqrt{d_{\mathrm{out}}}$, then $\hat x$ is the \emph{unique}
global minimizer of \eqref{eq:ppr_obj_gamma_param}. In particular, $x^\star=\hat x$ and $x^\star_{\bar C}=0$.
\item[(ii)] Whenever $x^\star=\hat x$, we have the explicit bound
\begin{equation}\label{eq:gamma_param_bound_gamma_param}
\gamma_{\bar C}
\;\ge\;
\rho\alpha\sqrt{d_{\mathrm{out}}}-\Delta(C).
\end{equation}
Equivalently, if $\Delta(C)\le (1-\eta)\rho\alpha\sqrt{d_{\mathrm{out}}}$ for some $\eta\in(0,1)$, then
\begin{equation}\label{eq:gamma_eta_bound_gamma_param}
\gamma_{\bar C} \ge \eta\,\rho\alpha\sqrt{d_{\mathrm{out}}}.
\end{equation}
\end{enumerate}
\end{theorem}

\begin{proof}
We prove (i) and (ii) in order.

\medskip\noindent\textbf{Step 1: KKT conditions.}
Since $f$ is differentiable and $g(x)=\sum_i \lambda_i|x_i|$, a point $x$ is optimal if and only if there exists
$z\in\partial g(x)$ such that $\nabla f(x)+z=0$.
Moreover, $F$ is strongly convex (because $Q\succeq \alpha I$), hence the minimizer is unique.

\medskip\noindent\textbf{Step 2: Build a global KKT certificate from the restricted minimizer.}
From Lemma~\ref{lem:restricted_KKT_on_C_gamma_param}, there exists $z_C$ such that
\[
Q_{CC}\hat x_C-b_C+z_C=0
\quad\text{and}\quad
z_{C,i}\in[-\lambda_i,\lambda_i]\;\; \forall i\in C.
\]
Define $z\in\mathbb{R}^n$ by
\[
z_i :=
\begin{cases}
(z_C)_i, & i\in C,\\[1mm]
-\nabla f(\hat x)_i, & i\in\bar C.
\end{cases}
\]
We claim $z\in\partial g(\hat x)$ if $|\nabla f(\hat x)_i|\le \lambda_i$ for all $i\in\bar C$.
Indeed:
\begin{itemize}
\item If $i\in C$, then $z_{C,i}\in \lambda_i\,\partial|\hat x_i|$ by \eqref{eq:zC_in_subdiff_gamma_param}, hence it is valid.
\item If $i\in\bar C$, then $\hat x_i=0$ and $z_i\in[-\lambda_i,\lambda_i]$ holds if
$|z_i|=|\nabla f(\hat x)_i|\le \lambda_i$.
\end{itemize}

Now check $\nabla f(\hat x)+z=0$.
On $C$, since $\hat x_{\bar C}=0$,
\[
\nabla f(\hat x)_C=(Q\hat x-b)_C = Q_{CC}\hat x_C - b_C,
\]
so $\nabla f(\hat x)_C+z_C=0$ by $Q_{CC}\hat x_C-b_C+z_C=0$.
On $\bar C$, $\nabla f(\hat x)_{\bar C}+z_{\bar C}=0$ by definition.
Thus $\nabla f(\hat x)+z=0$.

\medskip\noindent\textbf{Step 3: Enforce the outside inequalities via $\Delta(C)$.}
By Lemma~\ref{lem:outside_grad_bound_gamma_param} and Lemma~\ref{lem:xC_norm_gamma_param},
\[
\max_{i\in\bar C}|\nabla f(\hat x)_i|
\le
\Delta(C).
\]
Therefore, if $\Delta(C)<\rho\alpha\sqrt{d_{\mathrm{out}}}\le \lambda_i$ for all $i\in\bar C$,
then $|\nabla f(\hat x)_i|<\lambda_i$ on $\bar C$ and hence $z\in\partial g(\hat x)$.
Thus $0\in\partial F(\hat x)$, so $\hat x$ is a minimizer.
By uniqueness, $\hat x=x^\star$, proving (i).

\medskip\noindent\textbf{Step 4: Lower bound the outside margin.}
Assume $x^\star=\hat x$. Then for every $i\in\bar C$,
\[
\lambda_i-|\nabla f(x^\star)_i|
=\lambda_i-|\nabla f(\hat x)_i|
\ge \lambda_i-\Delta(C)
\ge \rho\alpha\sqrt{d_{\mathrm{out}}}-\Delta(C),
\]
where we used $\lambda_i\ge \rho\alpha\sqrt{d_{\mathrm{out}}}$.
Taking the minimum over $i\in\bar C$ yields \eqref{eq:gamma_param_bound_gamma_param}, and \eqref{eq:gamma_eta_bound_gamma_param} follows.
\end{proof}

\begin{corollary}[A concrete $\gamma_{\bar C}=\Omega(\sqrt{\rho})$ regime]\label{cor:gamma_Omega_sqrt_rho_regime}
Assume the hypotheses of Theorem~\ref{thm:gamma_parameterized_by_cluster}.
Suppose, in addition, that there are constants $V_0,d_0,\mu_0,c_0>0$ such that
\[
\mathrm{vol}(C)\le \frac{V_0}{\rho},\qquad
d_{\mathrm{in}}\ge d_0,\qquad
d_v\ge d_0,\qquad
\mu_C\ge \mu_0,\qquad
d_{\mathrm{out}}\ge \frac{c_0}{\alpha^2\rho}.
\]
If furthermore the conductance satisfies
\begin{equation}\label{eq:phi_rho_condition_gamma_param}
\phi(C)\le c_1\rho
\quad\text{for a sufficiently small constant $c_1$ (depending only on $V_0,d_0,\mu_0,c_0,\alpha$),}
\end{equation}
then there exists a constant $\eta\in(0,1)$ (depending only on the same parameters) such that
\[
\gamma_{\bar C} \ge \eta\,\sqrt{\rho}.
\]
\end{corollary}

\begin{proof}
First, from $d_{\mathrm{out}}\ge c_0/(\alpha^2\rho)$ we get
\[
\rho\alpha\sqrt{d_{\mathrm{out}}}\ge \rho\alpha\sqrt{\frac{c_0}{\alpha^2\rho}}=\sqrt{c_0}\,\sqrt{\rho}.
\]
Next, using the assumed bounds in \eqref{eq:DeltaC_def_gamma_param},
\begin{align*}
\Delta(C)
&\le
\frac{1-\alpha}{2\mu_0}\,
\sqrt{\frac{\phi(C)\cdot (V_0/\rho)}{(c_0/(\alpha^2\rho))\cdot d_0}}\,
\left(\frac{\alpha}{\sqrt{d_0}}+\rho\alpha\sqrt{V_0/\rho}\right)\\
&=
\frac{1-\alpha}{2\mu_0}\,
\alpha\,\sqrt{\frac{V_0}{c_0 d_0}}\,
\sqrt{\phi(C)}\,
\left(\frac{\alpha}{\sqrt{d_0}}+\alpha\sqrt{V_0}\sqrt{\rho}\right).
\end{align*}
Under \eqref{eq:phi_rho_condition_gamma_param}, $\sqrt{\phi(C)}\le \sqrt{c_1}\sqrt{\rho}$, hence
$\Delta(C)\le K\sqrt{c_1}\sqrt{\rho}$ for a constant $K$ depending only on $(V_0,d_0,\mu_0,c_0,\alpha)$.
Choosing $c_1$ sufficiently small ensures $\Delta(C)\le (1-\eta)\sqrt{c_0}\sqrt{\rho}$ for some $\eta\in(0,1)$, i.e.
\[
\Delta(C)\le (1-\eta)\rho\alpha\sqrt{d_{\mathrm{out}}}.
\]
Apply \eqref{eq:gamma_eta_bound_gamma_param} to conclude $\gamma_{\bar C}\ge \eta\rho\alpha\sqrt{d_{\mathrm{out}}}
\ge \eta\sqrt{c_0}\sqrt{\rho}$, and absorb $\sqrt{c_0}$ into $\eta$.
\end{proof}

\begin{remark}[Outside margin vs.\ the global margin $\gamma$]
Theorem~\ref{thm:gamma_parameterized_by_cluster} lower bounds the margin on $\bar C$.
The \emph{global} strict-complementarity margin $\gamma=\min_{i\in I^\star}(\lambda_i-|\nabla f(x^\star)_i|)$
could be smaller if some coordinates inside $C$ are also zero at $x^\star$.
To deduce the same lower bound for the global $\gamma$, it suffices (but is not implied by conductance alone)
to additionally have $x^\star_i>0$ for all $i\in C$, i.e.\ $I^\star=\bar C$.
\end{remark}
% ============================================================

If you paste this in place of your current section, it should compile cleanly and removes the “mystery assumption” x^C>0\hat x_C>0x^C​>0 entirely.

User: Can you solve "\DeltaC\le 1-\eta\rho\alpha\sqrt{d{\mathrm{out}}}" as a function of φC?[edit | edit source]

Can you solve "\Delta(C)\le (1-\eta)\rho\alpha\sqrt{d_{\mathrm{out}}}" as a function of φ(C)? In other words, how small should conductance be?

Assistant: Starting from the definition of ΔC\DeltaCΔC in that section specialized to the ℓ1\ell1ℓ1​-regularized PPR objective 3 o…[edit | edit source]

Starting from the definition of Δ(C)\Delta(C)Δ(C) in that section (specialized to the ℓ1\ell_1ℓ1​-regularized PPR objective (3) open-problem-fountoulakis22a

),

Δ(C)=1−α2μC ϕ(C) vol(C)dout din (αdv+ραvol(C)),\Delta(C) = \frac{1-\alpha}{2\mu_C}\, \sqrt{\frac{\phi(C)\,\mathrm{vol}(C)}{d_{\mathrm{out}}\,d_{\mathrm{in}}}}\, \left(\frac{\alpha}{\sqrt{d_v}}+\rho\alpha\sqrt{\mathrm{vol}(C)}\right),Δ(C)=2μC​1−α​dout​din​ϕ(C)vol(C)​​(dv​​α​+ραvol(C)​), you want to solve the inequality

Δ(C)≤(1−η)ραdout(η∈(0,1)).\Delta(C)\le (1-\eta)\rho\alpha\sqrt{d_{\mathrm{out}}} \qquad(\eta\in(0,1)).Δ(C)≤(1−η)ραdout​​(η∈(0,1)).

Factor out α\alphaα inside the parentheses and cancel α>0\alpha>0α>0 on both sides:[edit | edit source]

1−α2μC ϕ(C) vol(C)dout din (1dv+ρvol(C))  ≤  (1−η)ρdout.\frac{1-\alpha}{2\mu_C}\, \sqrt{\frac{\phi(C)\,\mathrm{vol}(C)}{d_{\mathrm{out}}\,d_{\mathrm{in}}}}\, \left(\frac{1}{\sqrt{d_v}}+\rho\sqrt{\mathrm{vol}(C)}\right) \;\le\; (1-\eta)\rho\sqrt{d_{\mathrm{out}}}.2μC​1−α​dout​din​ϕ(C)vol(C)​​(dv​​1​+ρvol(C)​)≤(1−η)ρdout​​. Rearrange to isolate the square root:

ϕ(C) vol(C)dout din  ≤  2μC1−α (1−η)ρdout1dv+ρvol(C).\sqrt{\frac{\phi(C)\,\mathrm{vol}(C)}{d_{\mathrm{out}}\,d_{\mathrm{in}}}} \;\le\; \frac{2\mu_C}{1-\alpha}\, \frac{(1-\eta)\rho\sqrt{d_{\mathrm{out}}}}{\frac{1}{\sqrt{d_v}}+\rho\sqrt{\mathrm{vol}(C)}}.dout​din​ϕ(C)vol(C)​​≤1−α2μC​​dv​​1​+ρvol(C)​(1−η)ρdout​​​. Square both sides and solve for ϕ(C)\phi(C)ϕ(C):

ϕ(C)  ≤  ϕmax⁡(C):=4μC2(1−η)2ρ2 dout2 din(1−α)2 vol(C) (1dv+ρvol(C))2.\boxed{ \phi(C)\;\le\;\phi_{\max}(C)

=

\frac{4\mu_C^2(1-\eta)^2\rho^2\,d_{\mathrm{out}}^2\,d_{\mathrm{in}}} {(1-\alpha)^2\,\mathrm{vol}(C)\,\Bigl(\frac{1}{\sqrt{d_v}}+\rho\sqrt{\mathrm{vol}(C)}\Bigr)^2}. }ϕ(C)≤ϕmax​(C):=(1−α)2vol(C)(dv​​1​+ρvol(C)​)24μC2​(1−η)2ρ2dout2​din​​.​ That is the exact “how small should conductance be?” threshold implied by your condition.

The denominator contains (1dv+ρvol(C))2\Bigl(\frac{1}{\sqrt{d_v}}+\rho\sqrt{\mathrm{vol}(C)}\Bigr)^2(dv​​1​+ρvol(C)​)2. Depending on which term dominates, you get simpler scaling:[edit | edit source]

If ρvol(C)≪1/dv\rho\sqrt{\mathrm{vol}(C)} \ll 1/\sqrt{d_v}ρvol(C)​≪1/dv​​ (equivalently vol(C)≪1/(ρ2dv)\mathrm{vol}(C)\ll 1/(\rho^2 d_v)vol(C)≪1/(ρ2dv​)), then[edit | edit source]

(1dv+ρvol(C))2≈1dv,\Bigl(\tfrac{1}{\sqrt{d_v}}+\rho\sqrt{\mathrm{vol}(C)}\Bigr)^2 \approx \frac{1}{d_v},(dv​​1​+ρvol(C)​)2≈dv​1​, so

ϕmax⁡(C)  ≈  4μC2(1−η)2ρ2 dout2 din dv(1−α)2 vol(C).\phi_{\max}(C)\;\approx\; \frac{4\mu_C^2(1-\eta)^2\rho^2\,d_{\mathrm{out}}^2\,d_{\mathrm{in}}\,d_v}{(1-\alpha)^2\,\mathrm{vol}(C)}.ϕmax​(C)≈(1−α)2vol(C)4μC2​(1−η)2ρ2dout2​din​dv​​.

If ρvol(C)≫1/dv\rho\sqrt{\mathrm{vol}(C)} \gg 1/\sqrt{d_v}ρvol(C)​≫1/dv​​, then[edit | edit source]

(1dv+ρvol(C))2≈ρ2vol(C),\Bigl(\tfrac{1}{\sqrt{d_v}}+\rho\sqrt{\mathrm{vol}(C)}\Bigr)^2 \approx \rho^2\mathrm{vol}(C),(dv​​1​+ρvol(C)​)2≈ρ2vol(C), so

ϕmax⁡(C)  ≈  4μC2(1−η)2 dout2 din(1−α)2 vol(C)2.\phi_{\max}(C)\;\approx\; \frac{4\mu_C^2(1-\eta)^2\,d_{\mathrm{out}}^2\,d_{\mathrm{in}}}{(1-\alpha)^2\,\mathrm{vol}(C)^2}.ϕmax​(C)≈(1−α)2vol(C)24μC2​(1−η)2dout2​din​​.

In the corollary regime you were already using (typical local clustering assumptions):[edit | edit source]

  • vol(C)≲V0/ρ\mathrm{vol}(C)\lesssim V_0/\rhovol(C)≲V0​/ρ,
  • din,dv≳d0d_{\mathrm{in}},d_v\gtrsim d_0din​,dv​≳d0​ (bounded below),
  • μC≳μ0\mu_C\gtrsim \mu_0μC​≳μ0​ (internally well-conditioned),
  • dout≳c0/(α2ρ)d_{\mathrm{out}}\gtrsim c_0/(\alpha^2\rho)dout​≳c0​/(α2ρ) (outside degrees large enough so λi=ραdi\lambda_i=\rho\alpha\sqrt{d_i}λi​=ραdi​​ is Θ(ρ)\Theta(\sqrt{\rho})Θ(ρ​)),

then ϕmax⁡(C)\phi_{\max}(C)ϕmax​(C) simplifies to a constant multiple of ρ\rhoρ:

ϕmax⁡(C)  ≳  μ02(1−η)2 c02 d02α4(1−α)2 V0⏟constant depending on parameters⋅ρ,\phi_{\max}(C)\;\gtrsim\; \underbrace{ \frac{\mu_0^2(1-\eta)^2\,c_0^2\,d_0^2}{\alpha^4(1-\alpha)^2\,V_0} }_{\text{constant depending on parameters}} \cdot \rho,ϕmax​(C)≳constant depending on parametersα4(1−α)2V0​μ02​(1−η)2c02​d02​​​​⋅ρ, (up to the same mild “ρ\rhoρ small enough so the seed term dominates” caveat you used earlier).

So in that common regime, a clean sufficient statement is:

  • Conductance needs to be on the order of ρ\rhoρ (i.e. ϕ(C)≤c ρ\phi(C)\le c\,\rhoϕ(C)≤cρ for a small-enough constant ccc depending on α,μC,dout,din,vol(C),dv,η\alpha,\mu_C,d_{\mathrm{out}},d_{\mathrm{in}},\mathrm{vol}(C),d_v,\etaα,μC​,dout​,din​,vol(C),dv​,η).

If you want, I can also rewrite ϕmax⁡(C)\phi_{\max}(C)ϕmax​(C) in terms of the cut size cut(C,Cˉ)\mathrm{cut}(C,\bar C)cut(C,Cˉ) instead of ϕ(C)\phi(C)ϕ(C), which sometimes reads more directly as a “boundary must be at most ___ edges” condition.

User: Can you convert all this analysis and regimes into a single latex snippet?[edit | edit source]

Can you convert all this analysis and regimes into a single latex snippet?

Assistant: Here is a single LaTeX snippet drop-in that packages the full “solve for ϕC\phiCϕC” calculation, plus the two regimes a…[edit | edit source]

Here is a single LaTeX snippet (drop-in) that packages the full “solve for ϕ(C)\phi(C)ϕ(C)” calculation, plus the two regimes and the ϕ(C)=O(ρ)\phi(C)=O(\rho)ϕ(C)=O(ρ) headline under typical local-cluster scalings. It’s written to fit directly after your definition of Δ(C)\Delta(C)Δ(C) (or anywhere in that section). open-problem-fountoulakis22a

% ------------------------------------------------------------
% Solving the Delta(C)-condition for conductance, with regimes.
% ------------------------------------------------------------
\subsubsection*{How small must $\phi(C)$ be? Solving the $\Delta(C)$ condition}

Recall the sufficient slack condition used in the outside-margin bound:
\begin{equation}\label{eq:Delta_slack_condition}
\Delta(C)\le (1-\eta)\rho\alpha\sqrt{d_{\mathrm{out}}},
\qquad \eta\in(0,1),
\end{equation}
where
\[
\Delta(C)
=
\frac{1-\alpha}{2\mu_C}\,
\sqrt{\frac{\phi(C)\mathrm{vol}(C)}{d_{\mathrm{out}}\,d_{\mathrm{in}}}}\,
\left(\frac{\alpha}{\sqrt{d_v}}+\rho\alpha\sqrt{\mathrm{vol}(C)}\right).
\]

\begin{lemma}[Conductance threshold implied by \eqref{eq:Delta_slack_condition}]
\label{lem:phi_threshold_exact}
The condition \eqref{eq:Delta_slack_condition} is equivalent to
\begin{equation}\label{eq:phi_max_exact}
\phi(C)\;\le\;\phi_{\max}(C)
:=\frac{4\mu_C^2(1-\eta)^2\rho^2\,d_{\mathrm{out}}^2\,d_{\mathrm{in}}}
{(1-\alpha)^2\,\mathrm{vol}(C)\,\Bigl(\frac{1}{\sqrt{d_v}}+\rho\sqrt{\mathrm{vol}(C)}\Bigr)^2}.
\end{equation}
\end{lemma}

\begin{proof}
Start from \eqref{eq:Delta_slack_condition} and substitute the definition of $\Delta(C)$.
Cancel the positive factor $\alpha$ on both sides to get
\[
\frac{1-\alpha}{2\mu_C}\,
\sqrt{\frac{\phi(C)\mathrm{vol}(C)}{d_{\mathrm{out}}\,d_{\mathrm{in}}}}\,
\left(\frac{1}{\sqrt{d_v}}+\rho\sqrt{\mathrm{vol}(C)}\right)
\le
(1-\eta)\rho\sqrt{d_{\mathrm{out}}}.
\]
Rearrange to isolate the square root:
\[
\sqrt{\frac{\phi(C)\mathrm{vol}(C)}{d_{\mathrm{out}}\,d_{\mathrm{in}}}}
\le
\frac{2\mu_C}{1-\alpha}\cdot
\frac{(1-\eta)\rho\sqrt{d_{\mathrm{out}}}}{\frac{1}{\sqrt{d_v}}+\rho\sqrt{\mathrm{vol}(C)}}.
\]
Squaring both sides and solving for $\phi(C)$ gives \eqref{eq:phi_max_exact}.
\end{proof}

\begin{remark}[Two useful regimes]\label{rem:phi_regimes}
Let $a:=\frac{1}{\sqrt{d_v}}$ and $b:=\rho\sqrt{\mathrm{vol}(C)}$, so the denominator in
\eqref{eq:phi_max_exact} is $(a+b)^2$.
Two common regimes give simpler sufficient (but slightly stronger) conditions:
\begin{enumerate}
\item \textbf{Seed-dominated regime ($b\le a$).}
If $\rho\sqrt{\mathrm{vol}(C)}\le 1/\sqrt{d_v}$, then $(a+b)^2\le (2a)^2=4/d_v$.
Thus a sufficient condition for \eqref{eq:Delta_slack_condition} is
\begin{equation}\label{eq:phi_seed_sufficient}
\phi(C)\;\le\;
\frac{\mu_C^2(1-\eta)^2\rho^2\,d_{\mathrm{out}}^2\,d_{\mathrm{in}}\,d_v}
{(1-\alpha)^2\,\mathrm{vol}(C)}.
\end{equation}

\item \textbf{Penalty-dominated regime ($a\le b$).}
If $1/\sqrt{d_v}\le \rho\sqrt{\mathrm{vol}(C)}$, then $(a+b)^2\le (2b)^2=4\rho^2\mathrm{vol}(C)$.
Thus a sufficient condition for \eqref{eq:Delta_slack_condition} is
\begin{equation}\label{eq:phi_penalty_sufficient}
\phi(C)\;\le\;
\frac{\mu_C^2(1-\eta)^2\,d_{\mathrm{out}}^2\,d_{\mathrm{in}}}
{(1-\alpha)^2\,\mathrm{vol}(C)^2}.
\end{equation}
\end{enumerate}
\end{remark}

\begin{remark}[Why $\phi(C)=O(\rho)$ in the typical local-cluster scaling]\label{rem:phi_O_rho}
A common locality regime assumes
\[
\mathrm{vol}(C)\le \frac{V_0}{\rho},
\qquad
d_{\mathrm{in}}\ge d_0,
\qquad
d_v\ge d_0,
\qquad
\mu_C\ge \mu_0,
\qquad
d_{\mathrm{out}}\ge \frac{c_0}{\alpha^2\rho},
\]
for constants $V_0,d_0,\mu_0,c_0>0$.
For small $\rho$, one typically has $\rho\sqrt{\mathrm{vol}(C)}\le \rho\sqrt{V_0/\rho}=\sqrt{V_0\rho}\le 1/\sqrt{d_0}$,
so the seed-dominated regime in Remark~\ref{rem:phi_regimes} applies.
Then \eqref{eq:phi_seed_sufficient} yields the explicit scaling
\[
\phi(C)\;\le\;
\frac{\mu_0^2(1-\eta)^2\,c_0^2\,d_0^2}{\alpha^4(1-\alpha)^2\,V_0}\;\rho,
\]
which is of the form $\phi(C)\le c\,\rho$ with a constant $c$ depending only on
$(\mu_0,d_0,c_0,V_0,\alpha,\eta)$.
\end{remark}
% ------------------------------------------------------------

If you want, I can also add a one-line corollary right after this snippet that explicitly states: “If ϕ(C)≤ϕmax⁡(C)\phi(C)\le\phi_{\max}(C)ϕ(C)≤ϕmax​(C) then Δ(C)≤(1−η)ραdout\Delta(C)\le(1-\eta)\rho\alpha\sqrt{d_{\mathrm{out}}}Δ(C)≤(1−η)ραdout​​ and hence γCˉ≥ηραdout\gamma_{\bar C}\ge \eta\rho\alpha\sqrt{d_{\mathrm{out}}}γCˉ​≥ηραdout​​” (linking directly back to your theorem).

User: sure, do that[edit | edit source]

sure, do that

Assistant: Add the following corollary + proof right after Lemmaphithresholdexact or anywhere after you’ve defined ϕmax⁡C\phi{\max…[edit | edit source]

Add the following corollary + proof right after Lemma~phi_threshold_exact (or anywhere after you’ve defined ϕmax⁡(C)\phi_{\max}(C)ϕmax​(C)); it explicitly closes the loop back to Theorem~\ref{thm:gamma_parameterized_by_cluster} in the ℓ1\ell_1ℓ1​-regularized PPR setting. open-problem-fountoulakis22a

\begin{corollary}[Conductance condition implies an outside-margin bound]\label{cor:phi_implies_margin}
Fix $\eta\in(0,1)$ and define $\phi_{\max}(C)$ as in \eqref{eq:phi_max_exact}.
If
\[
\phi(C)\le \phi_{\max}(C),
\]
then $\Delta(C)\le (1-\eta)\rho\alpha\sqrt{d_{\mathrm{out}}}$.
Consequently, the hypotheses of Theorem~\ref{thm:gamma_parameterized_by_cluster} hold, so:
\begin{enumerate}
\item[(i)] the global minimizer satisfies $x^\star_{\bar C}=0$ (equivalently, $\supp(x^\star)\subseteq C$), and
\item[(ii)] the outside margin obeys
\[
\gamma_{\bar C}:=\min_{i\in\bar C}\Bigl(\lambda_i-|\nabla f(x^\star)_i|\Bigr)
\ge \eta\,\rho\alpha\sqrt{d_{\mathrm{out}}}.
\]
\end{enumerate}
\end{corollary}

\begin{proof}
By Lemma~\ref{lem:phi_threshold_exact}, the inequality $\phi(C)\le \phi_{\max}(C)$ is equivalent to
$\Delta(C)\le (1-\eta)\rho\alpha\sqrt{d_{\mathrm{out}}}$.
Since $\eta\in(0,1)$, this also implies $\Delta(C)<\rho\alpha\sqrt{d_{\mathrm{out}}}$.
Apply Theorem~\ref{thm:gamma_parameterized_by_cluster} to conclude $x^\star_{\bar C}=0$ and the stated bound on
$\gamma_{\bar C}$.
\end{proof}

References[edit | edit source]

  1. "Proving with accelerated algorithms". ChatGPT. Retrieved 2025-12-19.
  2. "Optimization Online" (PDF). Optimization Online. Retrieved 2025-12-19.
  3. "Proceedings of Machine Learning Research" (PDF). Proceedings of Machine Learning Research. Retrieved 2025-12-19.
  4. "Proceedings of Machine Learning Research" (PDF). Proceedings of Machine Learning Research. Retrieved 2025-12-19.
  5. "Optimization Online" (PDF). Optimization Online. Retrieved 2025-12-19.
  6. "Proceedings of Machine Learning Research" (PDF). Proceedings of Machine Learning Research. Retrieved 2025-12-19.
  7. "arXiv". arxiv.org. Retrieved 2025-12-19.
  8. "Proceedings of Machine Learning Research" (PDF). Proceedings of Machine Learning Research. Retrieved 2025-12-19.
  9. "arXiv". arxiv.org. Retrieved 2025-12-19.