
==== If you relax your goal slightly, there is a more “standard” line of argument you could potentially formalize: ====

===== Assume a margin at the optimum that separates active from inactive coordinates (a strict complementarity / nondegeneracy condition). In ℓ1 language this is the usual requirement that, for inactive coordinates, the optimality condition is strict (not tight). =====

Then the standard identification results imply that after finitely many iterations the algorithm identifies the correct manifold/support. From then on, the method behaves like an accelerated method on a fixed low-dimensional subspace of size <math>|S^*|</math>, which is <math>\approx O(1/\rho)</math> in your application regime. The catch is:
* the identification time bound typically depends on the margin parameter (often called something like <math>\delta_{\min}</math>), and in degenerate instances <math>\delta_{\min}</math> can be arbitrarily small or zero.<ref>{{cite web|title=Optimization Online|url=https://optimization-online.org/wp-content/uploads/2019/03/7109.pdf|publisher=Optimization Online|access-date=2025-12-19}}</ref>
* so you don’t get a clean worst-case <math>\tilde O\big((\sqrt{\alpha}\,\rho)^{-1}\big)</math> bound purely in terms of <math>\alpha</math> and <math>\rho</math>.
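To make the margin and the identification phase concrete, here is a minimal sketch on a hypothetical toy problem, <math>\min_x \tfrac12\|x-b\|^2 + \lambda\|x\|_1</math>, which is not the application problem discussed here. It uses plain proximal gradient (ISTA) rather than an accelerated method. It defines the margin as <math>\delta_{\min} = \min_{i \notin S^*} \big(\lambda - |\nabla_i f(x^*)|\big)</math>, which is one common way to formalize the strict-complementarity slack. It then counts how many iterations pass before the iterate's support settles on <math>S^*</math>. The function names (`soft_threshold`, `margin`, `identification_iteration`) are illustrative only.

```python
import numpy as np


def soft_threshold(z, tau):
    """Proximal operator of tau * ||.||_1 (componentwise shrinkage toward 0)."""
    return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)


def margin(b, lam):
    """Strict-complementarity margin for the toy problem
        min_x 0.5 * ||x - b||^2 + lam * ||x||_1.
    Its solution is x* = soft_threshold(b, lam) and grad f(x*) = x* - b, so
    delta_min = min over inactive i of (lam - |grad_i f(x*)|)."""
    x_star = soft_threshold(b, lam)
    inactive = x_star == 0
    if not inactive.any():
        return float("inf")  # no inactive coordinates, nothing to separate
    grad = x_star - b
    return float(np.min(lam - np.abs(grad[inactive])))


def identification_iteration(b, lam, x0, step=0.5, max_iter=1_000):
    """Run ISTA (proximal gradient, not the accelerated variant) from x0 and
    return the first iteration k such that supp(x_j) == supp(x*) for every
    j >= k up to max_iter. Returns 0 if x0 already has the right support;
    a value of max_iter + 1 means the support never settled within budget."""
    support_star = soft_threshold(b, lam) != 0
    x = np.asarray(x0, dtype=float).copy()
    last_mismatch = -1
    for k in range(max_iter + 1):
        if not np.array_equal(x != 0, support_star):
            last_mismatch = k
        # gradient step on 0.5*||x - b||^2, then the l1 prox with threshold step*lam
        x = soft_threshold(x - step * (x - b), step * lam)
    return last_mismatch + 1


if __name__ == "__main__":
    lam = 1.0
    x0 = np.array([1.0, 1.0])
    for delta in [1e-1, 1e-2, 1e-4, 1e-8]:
        # coordinate 0 is active (|b_0| > lam); coordinate 1 is inactive with slack delta
        b = np.array([2.0, lam - delta])
        print(f"delta={delta:.0e}  margin={margin(b, lam):.1e}  "
              f"identified at k={identification_iteration(b, lam, x0)}")
```

In this toy the identification iteration grows roughly like <math>\log_2(1/\delta)</math>, which fits the <math>\mathrm{polylog}(1/\delta_{\min})</math> shape. At <math>\delta = 0</math> the inactive coordinate only decays geometrically toward zero and never reaches it in exact arithmetic, so no finite identification time exists.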

In other words, you can likely get something like

:<math>\text{time} \;\lesssim\; \underbrace{\frac{1}{\rho}\,\mathrm{polylog}\!\Big(\frac{1}{\delta_{\min}}\Big)}_{\text{identification phase}} \;+\; \underbrace{\frac{1}{\rho\sqrt{\alpha}}\log\!\Big(\frac{1}{\varepsilon}\Big)}_{\text{accelerated phase on }S^*},</math>
but the <math>\delta_{\min}</math>-dependence is the part that prevents a clean worst-case theorem.
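The second term can be derived with a rough decomposition. This rests on assumptions not spelled out above. First, once <math>S^*</math> is fixed, each accelerated iteration touches only <math>O(|S^*|)</math> coordinates. Second, the restricted problem is <math>\alpha</math>-strongly convex with an <math>O(1)</math>-Lipschitz gradient, so its condition number is <math>O(1/\alpha)</math> and the usual accelerated iteration count applies:

:<math>\underbrace{O\!\Big(\frac{1}{\sqrt{\alpha}}\log\frac{1}{\varepsilon}\Big)}_{\text{iterations}} \times \underbrace{O\big(|S^*|\big)}_{\text{work per iteration}} \;=\; O\!\Big(\frac{1}{\sqrt{\alpha}}\log\frac{1}{\varepsilon}\Big) \times O\!\Big(\frac{1}{\rho}\Big) \;=\; O\!\Big(\frac{1}{\rho\sqrt{\alpha}}\log\frac{1}{\varepsilon}\Big).</math>

Under these assumptions the accelerated phase alone meets the <math>\tilde O\big((\sqrt{\alpha}\,\rho)^{-1}\big)</math> target. The margin enters only through the identification phase.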