Openai/693c0f4f-255c-8008-92e9-0cd44c6d6226
==== From Sec. 2: ====
* There is a ground-truth generator <math>f: Z \to X, \quad x = f(z)</math>, where:
** <math>Z = \mathbb{R}^{d_z}</math> is latent space, decomposed into slots: <math>z = (z_1, \dots, z_K),\quad z_k \in \mathbb{R}^m</math>, e.g. “animal 1 slot”, “animal 2 slot”, “background slot”.
** <math>X \subset \mathbb{R}^{d_x}</math> is image space (the data manifold).
* A representation <math>\hat z = \varphi(x)</math> is “good perception” if it inverts <math>f</math> up to slot-wise reparametrization and permutation: <math>\forall z \in Z_S: \quad \varphi(f(z)) = h_\pi(z)</math> (Eq. 2.1). Here <math>h_\pi</math> just re-labels and reparametrizes slots.

===== They define two regimes: =====
* Generative approach:
** Learn a decoder <math>\hat f : Z \to \mathbb{R}^{d_x}</math>.
** Use its inverse to represent images: <math>\varphi(x) = \hat f^{-1}(x)</math>.
** For this to match truth, <math>\hat f</math> must identify <math>f</math>: <math>\hat f(h_\pi(z)) = f(z)</math> (Eq. 2.2).
* Non-generative approach:
** Learn an encoder directly: <math>\varphi(x) = \hat g(x)</math>.
** For this to match truth, <math>\hat g</math> must approximate the inverse: <math>\hat g(x) = h_\pi(g(x)),\quad g := f^{-1}</math> (Eq. 2.3).

The difference is not “does the architecture have an encoder or decoder”, but:
* Are you constraining a decoder <math>\hat f</math> and then inverting it (generative)?
* Or are you only constraining an encoder <math>\hat g</math> (non-generative)?

The latent space is then split into two regions:
* ID region: <math>X_{\text{ID}} = f(Z_{\text{ID}})</math>, generated from some subset <math>Z_{\text{ID}} \subset Z</math> of concept combinations.
* OOD region: <math>Z_{\text{OOD}}</math> is all other combinations of slot values; <math>X_{\text{OOD}} = f(Z_{\text{OOD}})</math>.

The goal: if Eq. (2.1) holds on <math>Z_{\text{ID}}</math>, does it generalize to all combinations in <math>Z_{\text{OOD}}</math>?
That’s compositional generalization.

* They define a generator class <math>F_{\text{int}}</math> where concepts interact in a structured way (polynomial interactions across slots).
* The inverse class <math>G_{\text{int}} = \{ f^{-1} \mid f \in F_{\text{int}} \}</math> has strong constraints on Jacobian and Hessian (Eq. 3.3).

Key asymmetry:
* It’s feasible to constrain a decoder to lie in <math>F_{\text{int}}</math> (by architecture/regularization).
* It’s basically infeasible to constrain an encoder to lie in <math>G_{\text{int}}</math> when images live on a low-dim manifold in a high-dim ambient space (<math>d_x \gg d_z</math>). The Hessian/Jacobian structure “disappears”; almost any encoder is compatible with some <math>f \in F_{\text{int}}</math>.

So: you can enforce the right inductive bias on <math>\hat f</math>, but not on <math>\hat g</math>. This is why they say: to guarantee compositional generalization, you need a generative approach.
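
A toy illustration of the ID/OOD question may help here. The sketch below is not the paper's construction: the generator <code>f</code>, the map <code>h_pi</code>, both encoders, and the choice of ID region (all combinations where at least one slot is zero) are hypothetical and chosen only to keep the arithmetic exact. It checks Eq. (2.1) for two encoders that agree on every ID sample, and shows that only one of them still satisfies it on OOD combinations.

```python
from itertools import product

# Hypothetical toy setup: K = 2 scalar slots (m = 1), so d_z = 2 and d_x = 3.

def f(z):
    """Toy generator: copies both slots and adds one cross-slot (polynomial) term."""
    z1, z2 = z
    return (z1, z2, z1 * z2)

def h_pi(z):
    """Slot permutation (swap) followed by a slot-wise reparametrization."""
    z1, z2 = z
    return (2 * z2, z1 + 1)

def encoder_good(x):
    """Inverts f up to h_pi for every latent combination."""
    return (2 * x[1], x[0] + 1)

def encoder_shortcut(x):
    """Also uses the interaction coordinate x[2], which is 0 on all ID samples."""
    return (2 * x[1] + x[2], x[0] + 1)

# Assumed ID region: combinations where at least one slot is at its reference value 0.
values = [-2, -1, 0, 1, 2]
Z = list(product(values, repeat=2))
Z_ID = [z for z in Z if z[0] == 0 or z[1] == 0]
Z_OOD = [z for z in Z if z not in Z_ID]

def satisfies_eq_2_1(encoder, region):
    """Check phi(f(z)) == h_pi(z) for every z in the region (Eq. 2.1)."""
    return all(encoder(f(z)) == h_pi(z) for z in region)

results = {
    "good_ID": satisfies_eq_2_1(encoder_good, Z_ID),
    "good_OOD": satisfies_eq_2_1(encoder_good, Z_OOD),
    "shortcut_ID": satisfies_eq_2_1(encoder_shortcut, Z_ID),
    "shortcut_OOD": satisfies_eq_2_1(encoder_shortcut, Z_OOD),
}
print(results)
```

Both encoders are indistinguishable on the ID data, because the cross-slot coordinate vanishes there, yet only <code>encoder_good</code> satisfies Eq. (2.1) on the unseen combinations. This is only meant to make concrete why fitting an encoder on <math>X_{\text{ID}}</math> alone does not pin down its behavior on <math>X_{\text{OOD}}</math>; it says nothing about the paper's actual class <math>F_{\text{int}}</math> or its proof.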