===== Data for each training LoRA <math>t</math>: =====
* Prompt batch <math>P_t</math> (like DnD)
* True LoRA weights <math>\Delta W_t</math>
* True Delta Activation <math>\delta_t</math>

Model:
# Condition encoder: <math>e_t = E(P_t)</math> (e.g., text encoder + pooling)
# LoRA generator: <math>\hat{\Delta W}_t = G_\theta(e_t)</math>
# Delta head, one of:
#* Option 1 (cheaper): a small learned approximator <math>\hat\delta_t = H_\phi(\hat{\Delta W}_t)</math>.
#* Option 2 (more faithful): actually run base+LoRA on a fixed probe set to compute <math>\hat\delta_t</math>.
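
A minimal sketch of Option 2 in a toy setting: a single linear layer with base weight W, a LoRA update scale · B A, and the delta activation taken as the mean difference between adapted and base outputs over the probe inputs. The single-layer setup, this definition of the delta, and the names <code>matvec</code>, <code>lora_forward</code> and <code>delta_activation</code> are illustrative assumptions; a real model would run full forward passes and read a chosen hidden state.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def lora_forward(w, a, b, x, scale=1.0):
    """Output of a linear layer with a LoRA update: W x + scale * B (A x)."""
    base = matvec(w, x)
    update = matvec(b, matvec(a, x))
    return [y + scale * u for y, u in zip(base, update)]


def delta_activation(w, a, b, probes, scale=1.0):
    """Mean (adapted - base) activation over a fixed probe set.

    Assumption: the delta is the average activation shift; the
    original notes do not pin down the exact definition.
    """
    if not probes:
        raise ValueError("probe set must not be empty")
    total = [0.0] * len(w)
    for x in probes:
        h_base = matvec(w, x)
        h_lora = lora_forward(w, a, b, x, scale)
        for i, (hl, hb) in enumerate(zip(h_lora, h_base)):
            total[i] += hl - hb
    return [t / len(probes) for t in total]
```

For example, with a 2×2 identity base weight, a rank-1 update (A of shape 1×2, B of shape 2×1) and two one-hot probes, <code>delta_activation</code> returns one vector of length 2. The cost noted under Option 2 is visible here: every probe needs both a base and an adapted forward pass.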

Loss:

<math>\mathcal{L} = \lambda_{\text{weights}} \underbrace{\|\hat{\Delta W}_t - \Delta W_t\|^2}_{\text{classical DnD-style loss}} + \lambda_\delta \underbrace{\|\hat\delta_t - \delta_t\|^2}_{\text{auxiliary behavioral loss}}</math>

What does the δ-loss do?
* It pushes the generated LoRA to not only be close in parameter space, but also to induce the correct global behavior on probe prompts.
* It gives an extra gradient signal that is much lower-dimensional and semantically aligned, which can stabilize training, especially when the weight loss is noisy or multi-modal.
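
The two-term loss can be sketched as follows, assuming the LoRA weights and the delta activations have already been flattened into plain vectors. The function names and the argument names <code>lam_weights</code> and <code>lam_delta</code> are hypothetical; in practice this would be written in an autodiff framework so that gradients flow back to the generator.

```python
def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length flat vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b))


def combined_loss(dw_hat, dw_true, delta_hat, delta_true, *, lam_weights, lam_delta):
    """Weighted sum of the weight-space term and the delta (behavioral) term."""
    weight_term = squared_l2(dw_hat, dw_true)      # classical DnD-style loss
    delta_term = squared_l2(delta_hat, delta_true)  # auxiliary behavioral loss
    return lam_weights * weight_term + lam_delta * delta_term
```

Setting <code>lam_delta</code> to 0 recovers the weights-only loss, which makes it easy to ablate the auxiliary term.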

Weak points:
* If you compute <math>\hat\delta_t</math> via true forward passes on probes, this adds heavy cost per step.
* If you approximate it via <math>H_\phi</math> (LoRA → δ), you’ve added another network that might be lossy; but maybe that’s okay because δ itself is coarse.