Openai/693e3ce6-229c-8008-97dc-ab720cb1f95a
=== Assistant: There are two very different issues mixed into “can we assume γ ≳ √ρ?” ===
There are two very different issues mixed into “can we assume <math>\gamma \gtrsim \sqrt{\rho}</math>?”:
# A hard upper bound coming from the penalty weights in (3), and
# A nondegeneracy / “don’t sit at a breakpoint” condition (strict complementarity).
Let me separate them.

==== A hard upper bound coming from the penalty weights ====
For your objective (3), the <math>\ell_1</math> weights are
: <math>\lambda_i = \rho\alpha\sqrt{d_i},</math> &nbsp; (*)
and for every inactive coordinate <math>i\in I^\star</math>,
: <math>|\nabla f(x^\star)_i| \le \lambda_i, \qquad \gamma = \min_{i\in I^\star}\bigl(\lambda_i-|\nabla f(x^\star)_i|\bigr).</math> &nbsp; (**)
So automatically
: <math>0 \le \gamma \le \min_{i\in I^\star}\lambda_i = \rho\alpha\min_{i\in I^\star}\sqrt{d_i} = \rho\alpha\sqrt{d_{\min}},</math> &nbsp; (1)
where <math>d_{\min}</math> is the minimum degree among inactive nodes. (If the node of globally minimum degree is inactive, this is the global minimum degree.) This is immediate from the definition (see open-problem-fountoulakis22a).

If your graph has any inactive node with bounded degree (say degree 1 or 2), then
: <math>\gamma \le O(\rho).</math>
And when <math>\rho\to 0</math> with <math>\alpha</math> fixed, <math>O(\rho)=o(\sqrt{\rho})</math>. So on typical sparse graphs with leaves/whiskers, no structural assumption “around the seed” can force <math>\gamma \gtrsim \sqrt{\rho}</math>, because the bound (1) ranges over all inactive nodes and is purely degree-based.

This is why the star/path families are so effective as counterexamples: they contain low-degree inactive nodes, forcing <math>\gamma</math> to be at most <math>O(\rho)</math>, and (by tuning <math>\rho</math> near breakpoints) even smaller.

==== A degree lower bound on inactive nodes ====
To even make <math>\gamma\sim \sqrt{\rho}</math> possible when <math>\rho</math> is small, you need <math>\lambda_i</math> itself to be <math>\Omega(\sqrt{\rho})</math> for the relevant inactive coordinates. Using (*), that means:
: <math>\rho\alpha\sqrt{d_i} \gtrsim \sqrt{\rho} \quad\Longrightarrow\quad d_i \gtrsim \frac{1}{\alpha^2\rho}.</math> &nbsp; (2)
So a necessary (and essentially unavoidable) type of assumption is a degree lower bound; call it Assumption A. A clean version is:
* either the graph has minimum degree <math>d_{\min} \gtrsim 1/(\alpha^2\rho)</math>,
* or (more weakly) every node that ends up inactive has degree at least <math>c/(\alpha^2\rho)</math>.
This rules out leaves, paths, and stars with leaves outside the support, and it matches the algebra: then
: <math>\min_{i\in I^\star}\lambda_i \ge \rho\alpha\sqrt{c/(\alpha^2\rho)} = \sqrt{c\rho}.</math>
Is this “reasonable”?
* On dense graphs, yes.
* On many sparse real-world networks, no, unless you pre-process (e.g., work on a core where the minimum degree is large).

==== A nondegeneracy / “don’t sit at a breakpoint” condition ====
Even if <math>\lambda_i</math> is large, <math>\gamma</math> can still be tiny if some inactive coordinate is nearly tight:
: <math>|\nabla f(x^\star)_i|\approx \lambda_i.</math>
That’s exactly what happens at (or near) breakpoints of the regularization path. So you need a second assumption, Assumption B. A natural version is a relative slack condition:
: <math>|\nabla f(x^\star)_i| \le (1-\eta)\lambda_i \quad\text{for all } i\in I^\star,</math>
for some fixed <math>\eta\in(0,1)</math>. This is stronger than just <math>\gamma>0</math>; it forces a uniform fraction of slack. Then
: <math>\gamma = \min_{i\in I^\star}\bigl(\lambda_i-|\nabla f(x^\star)_i|\bigr) \ge \eta \min_{i\in I^\star}\lambda_i.</math> &nbsp; (3)
Combine (3) with the degree condition (2) and you get
: <math>\gamma \gtrsim \eta \sqrt{\rho} \quad (\text{up to } \alpha\text{-constants}).</math> &nbsp; (4)
Is Assumption B “reasonable”?
* Deterministically in the worst case: no (an adversarial <math>\rho</math> can always sit near a breakpoint; your star/path examples show this).
* Practically / statistically: often yes, because exact tightness is a codimension-1 event. If <math>\rho</math> is chosen from a coarse grid or with small random jitter, hitting a breakpoint “too closely” becomes unlikely.
A very concrete “algorithmic” way to justify B is:
* choose <math>\rho</math> on a multiplicative grid (e.g., <math>\rho=2^{-j}</math>) and/or
* add a tiny random perturbation to <math>\rho</math> (or to the edge weights).
Then with high probability you are not extremely close to a breakpoint, hence <math>\gamma</math> is not extremely small.

==== So is γ ≳ √ρ a reasonable assumption? ====
If inactive nodes can have bounded degree (typical sparse graphs with leaves/whiskers), then <math>\gamma \gtrsim \sqrt{\rho}</math> is not a reasonable target in general, because <math>\gamma\le O(\rho)</math> already follows from (1). The best you can hope for is <math>\gamma=\Theta(\rho)</math> (and even that needs a nondegeneracy buffer).

If you assume something like:
* all nodes outside the seed’s local region / outside the optimal support have degree <math>\gtrsim 1/\rho</math>, and
* <math>\rho</math> is not tuned near a breakpoint (relative slack <math>\eta</math>),
then <math>\gamma</math> can be of order <math>\sqrt{\rho}</math> (more precisely <math>\Theta(\alpha\sqrt{\rho})</math> up to constants), by the simple chain (3)+(2).

If you tell me which regime you care about (typical local clustering on sparse graphs, a dense-core model, or a probabilistic “<math>\rho</math> not adversarial” model), I can write down a crisp assumption set and show the exact inequality chain that yields <math>\gamma \ge c\,\alpha\sqrt{\rho}</math> (or explain why it’s impossible in that regime).
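
As a quick numerical sanity check of the chain (1), (2), (3), here is a minimal Python sketch. Everything in it is an illustrative assumption rather than part of your setup: the helper names (<code>l1_weights</code>, <code>margin</code>, <code>degree_threshold</code>), the parameter values, the degree lists, and especially the gradient magnitudes, which are simply set to <math>(1-\eta)\lambda_i</math> instead of being computed from a real solution <math>x^\star</math>.

```python
import math

def l1_weights(degrees, rho, alpha):
    """lambda_i = rho * alpha * sqrt(d_i), as in (*)."""
    return [rho * alpha * math.sqrt(d) for d in degrees]

def margin(weights, grad_abs):
    """gamma = min over inactive i of (lambda_i - |grad_i|), as in (**)."""
    return min(w - g for w, g in zip(weights, grad_abs))

def degree_threshold(rho, alpha, c=1.0):
    """Degree needed so that lambda_i >= sqrt(c * rho), from (2)."""
    return c / (alpha ** 2 * rho)

# Hypothetical parameters, chosen only for illustration.
rho, alpha, eta = 1e-4, 0.5, 0.25

# Case 1: the inactive set contains a leaf (degree 1).
sparse_degrees = [1, 40000, 90000]
sparse_weights = l1_weights(sparse_degrees, rho, alpha)
# Assumed gradients with relative slack eta: |grad_i| = (1 - eta) * lambda_i.
sparse_grad = [(1 - eta) * w for w in sparse_weights]
gamma_sparse = margin(sparse_weights, sparse_grad)

# Case 2: every inactive node meets the degree threshold from (2).
d_star = degree_threshold(rho, alpha)
dense_degrees = [40000, 90000, 160000]
dense_weights = l1_weights(dense_degrees, rho, alpha)
dense_grad = [(1 - eta) * w for w in dense_weights]
gamma_dense = margin(dense_weights, dense_grad)

print("degree threshold:", d_star)
print("gamma with a leaf:", gamma_sparse, "vs sqrt(rho):", math.sqrt(rho))
print("gamma with large degrees:", gamma_dense)
```

In the first case the leaf caps <math>\gamma</math> at <math>\rho\alpha</math>, far below <math>\sqrt{\rho}</math>, no matter how much slack the other coordinates have. In the second case the relative-slack assumption gives <math>\gamma = \eta\min_i\lambda_i</math>, which lands at the <math>\sqrt{\rho}</math> scale.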