Openai/693c0f4f-255c-8008-92e9-0cd44c6d6226
==== 1. Background ====

===== 1.1 Parameter-efficient finetuning and data-to-LoRA generation =====

Large language models (LLMs) are commonly adapted via LoRA: low-rank adapters <math>\Delta W = BA</math> added to a frozen base model, trained on task data <math>D</math>. (2506.16406v1)

Recent work like Drag-and-Drop LLMs (DnD) treats LoRA updates themselves as generative targets: a hyper-network takes a batch of unlabeled prompts from a dataset and directly outputs the LoRA weights for that dataset. This collapses the classical data→gradients→weights loop into a single forward pass and removes per-task finetuning. DnD trains the hyper-network with MSE on tokenized LoRA weights between generated and reference adapters. (2506.16406v1)

Limitations of current data-to-LoRA training:
* The target space (LoRA weights) is huge, so pure MSE is noisy and high-variance.
* Weight space is multi-modal: many different LoRAs can yield similar task behavior, but MSE to a single reference penalizes all but one.
* The supervision is purely parametric, not directly aligned with model behavior.

This makes training the generator harder, and probably limits generalization to unseen tasks.

===== 1.2 Delta Activations as behavioral model embeddings =====

Delta Activations propose a compact, behavior-based representation for finetuned models. For a base model <math>f_\text{base}</math> and a finetuned model <math>f</math>, they define:
* For each probe input <math>x</math> from a small generic probe set <math>D_\text{probe}</math>, compute the last-layer hidden state for the final token:
** <math>h_f(x)</math> for the finetuned model,
** <math>h_\text{base}(x)</math> for the base model.
* The delta activation for that input is: <math>\Delta_f(x) = h_f(x) - h_\text{base}(x)</math>
* Aggregate over the probe set: <math>v_f = \frac{1}{|D_\text{probe}|} \sum_{x \in D_\text{probe}} \Delta_f(x)</math>, giving a single vector <math>v_f \in \mathbb{R}^d</math>.
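The definition of the delta activation can be sketched on a toy model. This is only an illustration under stated assumptions, not the procedure from either paper: the "model" is a single weight matrix followed by tanh (a hypothetical stand-in for the last-layer hidden state of the final token), the finetuned model is the base matrix plus a rank-1 LoRA update <math>\Delta W = BA</math>, and the probe inputs are random vectors rather than real prompts. All function names are made up for this sketch.

```python
import math
import random


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def matmul(B, A):
    """Multiply B (d x r) by A (r x k), giving the d x k LoRA update BA."""
    return [[sum(B[i][t] * A[t][j] for t in range(len(A)))
             for j in range(len(A[0]))]
            for i in range(len(B))]


def hidden(W, x):
    """ASSUMPTION: toy stand-in for the last-layer hidden state h(x)."""
    return [math.tanh(z) for z in matvec(W, x)]


def delta_activation(W_base, W_ft, probes):
    """Average h_f(x) - h_base(x) over the probe set, giving v_f."""
    d = len(W_base)
    total = [0.0] * d
    for x in probes:
        h_f = hidden(W_ft, x)
        h_b = hidden(W_base, x)
        for i in range(d):
            total[i] += h_f[i] - h_b[i]
    return [t / len(probes) for t in total]


rng = random.Random(0)
d, k, r = 4, 3, 1  # hidden size, input size, LoRA rank (all toy values)

W_base = [[rng.uniform(-1, 1) for _ in range(k)] for _ in range(d)]
B = [[rng.uniform(-0.5, 0.5) for _ in range(r)] for _ in range(d)]
A = [[rng.uniform(-0.5, 0.5) for _ in range(k)] for _ in range(r)]

# Finetuned weights = frozen base weights + low-rank update BA.
delta_W = matmul(B, A)
W_ft = [[W_base[i][j] + delta_W[i][j] for j in range(k)] for i in range(d)]

# ASSUMPTION: random vectors play the role of the generic probe set.
probes = [[rng.uniform(-1, 1) for _ in range(k)] for _ in range(5)]

v_f = delta_activation(W_base, W_ft, probes)       # embedding of the adapter
v_zero = delta_activation(W_base, W_base, probes)  # no finetuning: all zeros
print(v_f)
```

In this sketch a model that is identical to the base maps to the zero vector, and the embedding has the dimension of the hidden state regardless of how many LoRA parameters there are.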

They show that:
* <math>v_f</math> clusters finetuned models by domain (math, coding, legal, etc.).
* The embedding is approximately additive: combining finetuning datasets corresponds to vector addition in the delta space. (2509.04442v1)
* A few-shot finetuned model can serve as a task embedding via its Delta Activations.

So Delta Activations give a low-dimensional, behavior-aligned representation of how a LoRA modifies a model.

===== 1.3 Gap and opportunity =====

DnD learns a mapping:
: prompts from dataset → LoRA weights,
: supervised only in weight space.

Delta Activations gives:
: finetuned model (or LoRA) → behavioral embedding δ (the vector <math>v_f</math> of Section 1.2).

These two lines almost meet, but not quite:
* DnD knows how to go from data to weights, but doesn’t explicitly use any behavioral summary.
* Delta Activations knows how to go from weights to behavior, but doesn’t feed back into adapter generation.

Idea: use Delta Activations as auxiliary behavioral supervision for the data→LoRA generator, to make training easier and improve generalization.

You won’t replace weight-level supervision with δ (good instinct); you’ll add δ as an extra, low-dimensional, behavior-aligned target.
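One way the "extra target" idea could look as a training objective is sketched here. The text does not specify the form of the loss, so everything in this block is an assumption: the weighted sum, the use of MSE for the δ term, the weight <code>lam</code>, and the function names. In a real setup the predicted δ would come from running the base model with the generated LoRA on the probe set; here it is passed in as a plain vector.

```python
def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    assert len(a) == len(b)
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def combined_loss(w_pred, w_ref, delta_pred, delta_ref, lam=0.1):
    """ASSUMED objective: weight-space MSE plus lam * behavior-space MSE."""
    weight_term = mse(w_pred, w_ref)          # DnD-style parametric term
    behavior_term = mse(delta_pred, delta_ref)  # auxiliary delta term
    return weight_term + lam * behavior_term


# Two hypothetical generated adapters, equally far from the reference in
# weight space, but only candidate "a" matches the reference behavior.
w_ref = [1.0, 0.0]
delta_ref = [1.0, 1.0]

loss_a = combined_loss([0.0, 0.0], w_ref, [1.0, 1.0], delta_ref)
loss_b = combined_loss([1.0, 1.0], w_ref, [0.0, 0.0], delta_ref)
print(loss_a, loss_b)
```

Under this assumed objective, the weight term alone cannot separate the two candidates, while the δ term prefers the one whose behavior matches the reference.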