==== 1. Background ====

===== 1.1 Parameter-efficient finetuning and data-to-LoRA generation =====

Large language models (LLMs) are commonly adapted via LoRA: low-rank adapters $\Delta W = BA$ added to a frozen base model, trained on task data $D$. 2506.16406v1
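As a minimal sketch of the low-rank update ΔW = BA described above, the following NumPy snippet applies an adapter to a single frozen weight matrix. The shapes, the random values, and the helper name `lora_forward` are illustrative assumptions, not taken from the cited work.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes (assumed for illustration); rank is much smaller than d_out, d_in.
d_out, d_in, rank = 8, 6, 2

W = rng.normal(size=(d_out, d_in))   # frozen base weight
B = rng.normal(size=(d_out, rank))   # trainable LoRA factor
A = rng.normal(size=(rank, d_in))    # trainable LoRA factor


def lora_forward(x, W, B, A):
    """Compute (W + B A) x without materializing the full update matrix."""
    return W @ x + B @ (A @ x)


x = rng.normal(size=d_in)
delta_W = B @ A                      # the low-rank update, rank <= `rank`

full_params = d_out * d_in           # parameters in a full-rank update
lora_params = rank * (d_out + d_in)  # parameters in the two LoRA factors
print(full_params, lora_params)
```

Only `B` and `A` would be trained; `W` stays fixed, which is why the adapter is much cheaper to store and to predict than a full weight update.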

Recent work like Drag-and-Drop LLMs (DnD) treats LoRA updates themselves as generative targets: a hyper-network takes a batch of unlabeled prompts from a dataset and directly outputs the LoRA weights for that dataset. This collapses the classical data→gradients→weights loop into a single forward pass and removes per-task finetuning.

DnD trains the hyper-network with an MSE loss between the tokenized LoRA weights of the generated adapter and those of a reference adapter. 2506.16406v1

Limitations of current data-to-LoRA training:
* The target space (LoRA weights) is huge, so pure MSE is noisy and high-variance.
* Weight space is multi-modal: many different LoRAs can yield similar task behavior, but MSE to a single reference penalizes all but one.
* The supervision is purely parametric, not directly aligned with model behavior.

This makes training the generator harder, and probably limits generalization to unseen tasks.
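One concrete source of the multi-modality noted above is that the factorization of a LoRA update is not unique: for any invertible r×r matrix M, the pairs (B, A) and (BM, M⁻¹A) produce the same update BA. The sketch below checks this numerically. The toy shapes and the plain MSE over concatenated factors are assumptions standing in for DnD's loss on tokenized weights, not its actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, rank = 8, 6, 2          # toy sizes, assumed for illustration

B_ref = rng.normal(size=(d_out, rank))
A_ref = rng.normal(size=(rank, d_in))

# Any invertible rank x rank matrix re-parameterizes the same update.
M = np.array([[2.0, 1.0],
              [0.0, 0.5]])
B_alt = B_ref @ M
A_alt = np.linalg.inv(M) @ A_ref


def flatten_adapter(B, A):
    """Concatenate both factors into one weight vector."""
    return np.concatenate([B.ravel(), A.ravel()])


# Weight-space distance between the two parameterizations (nonzero).
weight_mse = np.mean((flatten_adapter(B_alt, A_alt) - flatten_adapter(B_ref, A_ref)) ** 2)

# Largest entrywise difference between the two updates (zero up to rounding).
update_gap = np.max(np.abs(B_alt @ A_alt - B_ref @ A_ref))

print(f"weight-space MSE: {weight_mse:.4f}, max update difference: {update_gap:.2e}")
```

A generator that outputs `(B_alt, A_alt)` changes the base model exactly as the reference does, yet a weight-space MSE against `(B_ref, A_ref)` still penalizes it.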

===== 1.2 Delta Activations as behavioral model embeddings =====

Delta Activations propose a compact, behavior-based representation for finetuned models. For a base model $f_\text{base}$ and a finetuned model $f$, they define:
* For each probe input $x$ from a small generic probe set $D_\text{probe}$, compute the last-layer hidden state at the final token: $h_f(x)$ for the finetuned model and $h_\text{base}(x)$ for the base model.
* The delta activation for that input is: $\Delta_f(x) = h_f(x) - h_\text{base}(x)$
* Aggregate over the probe set: $v_f = \frac{1}{|D_\text{probe}|} \sum_{x \in D_\text{probe}} \Delta_f(x)$, giving a single vector $v_f \in \mathbb{R}^d$.
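The three steps above can be sketched in a few lines of NumPy. Here the "models" are toy functions that map an input vector to a hidden state, standing in for the last-layer final-token hidden state of a real LLM. The function names, shapes, and the tanh toy model are all assumptions made for illustration.

```python
import numpy as np


def delta_activation_embedding(h_finetuned, h_base, probe_inputs):
    """Average h_f(x) - h_base(x) over the probe set, giving one vector in R^d."""
    deltas = np.stack([h_finetuned(x) - h_base(x) for x in probe_inputs])
    return deltas.mean(axis=0)


rng = np.random.default_rng(0)
d_in, d, rank = 6, 4, 2              # toy sizes, assumed for illustration

W = rng.normal(size=(d, d_in))       # toy "base model" weight
B = rng.normal(size=(d, rank)) * 0.1
A = rng.normal(size=(rank, d_in)) * 0.1


def h_base(x):
    # Toy stand-in for the base model's final hidden state.
    return np.tanh(W @ x)


def h_f(x):
    # Same toy model with a low-rank update applied (the "finetuned" model).
    return np.tanh((W + B @ A) @ x)


probe_set = [rng.normal(size=d_in) for _ in range(5)]

v_f = delta_activation_embedding(h_f, h_base, probe_set)
v_base = delta_activation_embedding(h_base, h_base, probe_set)
print(v_f.shape)
```

By construction the base model embeds to the zero vector, so the embedding measures only what finetuning changed.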

They show that:
* $v_f$ clusters finetuned models by domain (math, coding, legal, etc.).
* The embedding is approximately additive: combining finetuning datasets corresponds to vector addition in the delta space. 2509.04442v1
* A few-shot finetuned model can serve as a task embedding via its Delta Activations.

So Delta Activations give a low-dimensional, behavior-aligned representation of how a LoRA modifies a model.

===== 1.3 Gap and opportunity =====

DnD learns a mapping:

: prompts from dataset → LoRA weights,
: supervised only in weight space.

Delta Activations gives:

: finetuned model (or LoRA) → behavioral embedding δ (the vector $v_f$ defined in 1.2).

These two lines of work almost meet, but not quite:
* DnD knows how to go from data to weights, but doesn’t explicitly use any behavioral summary.
* Delta Activations knows how to go from weights to behavior, but doesn’t feed back into adapter generation.

Idea: use Delta Activations as auxiliary behavioral supervision for the data→LoRA generator, to make training easier and improve generalization.

δ does not replace weight-level supervision; it is added as an extra, low-dimensional, behavior-aligned target.
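One way such an auxiliary target could enter training is as a weighted sum of a weight-space term and a behavior-space term. The sketch below is only an illustration: the MSE form of the behavioral term, the weighting factor `lam`, and the function name `combined_loss` are assumptions, since the text above does not fix a loss.

```python
import numpy as np


def combined_loss(w_gen, w_ref, delta_gen, delta_ref, lam=0.1):
    """Weight-space MSE plus lam times a behavior-space MSE on δ (assumed form)."""
    weight_loss = float(np.mean((w_gen - w_ref) ** 2))
    behavior_loss = float(np.mean((delta_gen - delta_ref) ** 2))
    return weight_loss + lam * behavior_loss, weight_loss, behavior_loss


rng = np.random.default_rng(0)

# Toy stand-ins: flattened LoRA weights and δ vectors for one dataset.
w_ref = rng.normal(size=100)
w_gen = w_ref + 0.1 * rng.normal(size=100)
delta_ref = rng.normal(size=8)
delta_gen = delta_ref + 0.1 * rng.normal(size=8)

total, weight_term, behavior_term = combined_loss(w_gen, w_ref, delta_gen, delta_ref)
print(total, weight_term, behavior_term)
```

In an actual training loop, `delta_gen` would presumably be obtained by running the base model with the generated adapter over the probe set, and the whole computation would need to be differentiable with respect to the generator's parameters. Setting `lam=0` recovers pure weight-space supervision.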