Openai/692c91e4-ba20-800d-8366-e5c7ed15aa7b

From freem
Revision as of 05:23, 10 December 2025 by Lukegao (talk | contribs) (by freem.ai...)


This conversation was exported from ChatGPT[1]

gpt-5-t-mini[edit | edit source]

User: how much data is required for Solomonoff induction to yield Newtonian equations[edit | edit source]

how much data is required for Solomonoff induction to yield Newtonian equations

Assistant: Short answer: there’s no single fixed number.[edit | edit source]

Short answer: there’s no single fixed number. The amount of data Solomonoff induction needs is roughly the algorithmic (Kolmogorov) complexity of the true Newtonian model (in bits) plus the bits needed to specify the continuous parameters to the precision you want — scaled by how small you require the predictor’s error to be. And: Solomonoff induction is uncomputable, so these bounds are theoretical.

Why (brief derivation & usable formula)

  • Let $K(\mu)$ be the prefix Kolmogorov complexity (in bits) of the true data-generating measure $\mu$ (this encodes the form of Newton’s laws as a program).
  • Solomonoff’s universal prior $M$ satisfies a convergence / cumulative-loss bound of the form $\sum_{t=1}^{T} \mathbb{E}\big[-\log M(x_t \mid x_{<t})\big] \le (K(\mu)+O(1))\ln 2$. From this you can read off a rough sample-size estimate: to make the average per-step log-loss $\le \varepsilon$, you need $T \gtrsim \frac{K(\mu)\ln 2}{\varepsilon}$. Interpreting $\varepsilon$ as the tolerated per-step KL divergence / log-loss gives a concrete (though coarse) relation between desired accuracy and data.
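The sample-size relation in the second bullet can be packaged as a back-of-the-envelope calculator (a sketch only: `K_bits` and `eps` are estimates the user must supply, since $K(\mu)$ itself is uncomputable):

```python
import math

def sample_size(K_bits: float, eps: float) -> float:
    """Rough number of observations T so that the average per-step
    log-loss is <= eps, from the cumulative bound
    sum_t E[-log M(x_t | x_<t)] <= (K(mu) + O(1)) ln 2."""
    return K_bits * math.log(2) / eps

# e.g. a 1000-bit model with tolerated per-step log-loss 1e-3:
print(round(sample_size(1000, 1e-3)))  # → 693147
```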

Concrete interpretation for Newtonian laws

  • K(μ)K(\mu)K(μ): the program length for “Newton’s laws + solver/measurement model” is small (maybe a few hundred bits) relative to many real-world models.
  • But you must also encode continuous parameters (masses, gravitational constant, initial conditions) to the required precision. Specifying a real-valued parameter to $d$ decimal digits needs about $d \cdot \log_2(10) \approx 3.322\,d$ bits. If you need $p$ parameters to $d$ decimal digits each, add $\approx 3.322\,pd$ bits to the complexity budget.
  • Noise, model mismatch, and unmodelled forces increase effective complexity and thus required data.

Quick numeric example

  • Suppose the compact Newtonian form has $K(\mu) = 1000$ bits (including the measurement/noise model) and you want average per-step log-loss $\varepsilon = 10^{-3}$. Then $T \gtrsim \frac{1000 \cdot \ln 2}{10^{-3}} \approx 693{,}147$ observations (order-of-magnitude). If you also need to pin down 5 parameters to 10 decimal digits each, that adds $\approx 5 \times 33.2 \approx 166$ bits to $K(\mu)$ (minor compared with 1000 in this toy example).
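The toy numbers above can be checked directly (all inputs are the hypothetical values from the example, not measured quantities):

```python
import math

# Assumed values from the toy example above:
K_theory = 1000                      # bits for "Newton's laws + noise model"
eps = 1e-3                           # tolerated average per-step log-loss
p, d = 5, 10                         # parameters, decimal digits of precision each

param_bits = p * d * math.log2(10)   # bits to encode the continuous parameters
T = K_theory * math.log(2) / eps     # sample-size estimate from the bound

print(round(param_bits))  # → 166  (minor next to K_theory)
print(round(T))           # → 693147 observations
```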

Important caveats

  • Solomonoff induction is uncomputable — the bounds are information-theoretic, not an algorithm you can run. Practical learning uses computable approximations (MDL, Bayesian model selection, algorithmic approximations, Occam-penalty priors).
  • Identifiability matters: some dynamics are only learnable up to transformations unless you observe the right variables and timescales.
  • The bound above is high-level (log-loss). Translating into, say, parameter-estimation error requires additional statistical analysis and depends on noise distribution, experiment design, and model class.

Takeaway

  • The minimal data scales with the bit-size of the description of Newtonian mechanics plus the bits needed for parameter precision, and inversely with the tolerated per-step log-loss. For any concrete experimental precision you can plug numbers into $T \approx \frac{(K(\text{theory}) + \text{bits for parameters})\ln 2}{\varepsilon}$ to get an order-of-magnitude sample-size estimate.

User: who has tried to use formalized occam's razor to re-derive physics from data without human bias thus actually addressin…[edit | edit source]

who has tried to use formalized occam's razor to re-derive physics from data without human bias thus actually addressing the underdetermination of a theory by its data as proven in philosophy of science?

Assistant: Short answer: several people and groups have tried—at two levels.[edit | edit source]

Short answer: several people and groups have tried—at two levels.

  1. Foundational, formal attempts (Solomonoff/Hutter/Chaitin/MDL) — aim to formalize Occam’s razor as algorithmic probability or description length and show (theoretically) that simpler computable theories get higher posterior weight. Key references: Ray Solomonoff’s formal theory of inductive inference (1964) and Marcus Hutter’s Universal AI / AIXI development (book/papers) make this explicit. Ray Solomonoff[2]
  2. Practical rediscovery systems — algorithms that actually take raw numeric/experimental data and output concise physical laws (thus operationalizing a formal simplicity bias). Representative, highly-cited projects:
  • Bongard & Lipson (2007) — automated reverse-engineering of nonlinear dynamical systems (genetic programming → symbolic equations). They show low-dimensional laws can be discovered from trajectories. Meclab[3]
  • Schmidt & Lipson (Science 2009) — “Distilling free-form natural laws from experimental data” (symbolic regression with a parsimony/accuracy tradeoff; famously rediscovered Newtonian pendulum law). Science[4]
  • Brunton, Proctor & Kutz (SINDy, 2016) — sparse identification of governing equations: use sparsity (an Occam-like prior) to pick few terms from a large candidate library. Widely used and scalable. PNAS[5]
  • Rudy et al. (PDE-FIND, 2017) — extends sparse discovery to partial differential equations (spatial systems). Science[6]
  • Udrescu & Tegmark (AI-Feynman, 2019/2020) — physics-inspired symbolic regression that rediscovers many textbook laws by combining neural nets with simplification heuristics (very high success rate on Feynman-equation benchmarks). arXiv[7]
  3. Many follow-ups and hybrids — sparse + physics priors, physics-informed neural nets, deep symbolic regression, PySINDy / DeepMoD, and packages that automate the pipeline from raw measurements to compact models (literature surveys document this growth). PMC[8]
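The sparsity-as-Occam idea behind SINDy can be illustrated in a few lines of numpy. This is a minimal sketch, not the authors' implementation: the toy system $\dot{x} = -2x$, the candidate library, and the threshold value are all chosen here for illustration.

```python
import numpy as np

# Toy data: noisy derivative observations of dx/dt = -2x (assumed system).
rng = np.random.default_rng(0)
x = rng.uniform(-2, 2, 200)
dx = -2.0 * x + 0.01 * rng.standard_normal(200)

# Candidate library of terms: [1, x, x^2, x^3].
Theta = np.column_stack([np.ones_like(x), x, x**2, x**3])

# Sequential thresholded least squares: the Occam-like sparsity step
# repeatedly zeroes small coefficients and refits on the survivors.
xi = np.linalg.lstsq(Theta, dx, rcond=None)[0]
for _ in range(10):
    small = np.abs(xi) < 0.1          # sparsity threshold (hyperparameter)
    xi[small] = 0.0
    big = ~small
    if big.any():
        xi[big] = np.linalg.lstsq(Theta[:, big], dx, rcond=None)[0]

print(xi)  # expect roughly [0, -2, 0, 0]: the sparse model dx/dt ≈ -2x
```

The threshold plays the role of the parsimony/accuracy tradeoff discussed above: raise it and simpler (sparser) models win; lower it and the fit absorbs noise terms.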

Do these solve the philosophical underdetermination problem? No, not completely.

  • Practically: these methods make underdetermination tractable — they often single out compact, predictive models from noisy data and scale better than exhaustively human-searching hypothesis space (Schmidt/Lipson, SINDy, AI-Feynman are good examples). Science[9]
  • Philosophically: underdetermination persists because any algorithm needs representational choices (language of functions, basis terms, noise model, numerical tolerances, search primitives, priors). Those choices reintroduce “human” bias or background assumptions. Formal Occam priors (Solomonoff/MDL) prove attractive objects exist in theory, but they’re uncomputable (Solomonoff) or sensitive to the chosen universal machine / code / modelling language in practice — so you haven’t eliminated underdetermination, you’ve moved it into design choices. For the general philosophical discussion, see treatments of underdetermination and critiques of algorithmic and Bayesian simplicity. Ray Solomonoff[10]

Useful short takeaway

  • Who’s tried it: Bongard & Lipson; Schmidt & Lipson; Brunton/Proctor/Kutz (SINDy); Rudy et al. (PDE-FIND); Udrescu & Tegmark (AI-Feynman); plus the theoretical lineage (Solomonoff, Hutter, Chaitin, Rissanen/MDL, Grünwald). Ray Solomonoff[11]
  • Status: strong practical progress (equation discovery works on many real systems), but no definitive philosophical resolution of underdetermination — methods reduce the space of plausible rivals by preferring simplicity, but they don’t make that preference theoretically neutral or eliminate all rival explanations.

If you want, I can:

  • give a short annotated bibliography (2–3 lines each) of the most relevant papers above, or
  • compare how each method formalizes Occam’s razor (e.g., symbolic-regression parsimony penalties vs. MDL vs. Solomonoff priors) with pros/cons.

Which would you prefer?

References[edit | edit source]

  1. "Solomonoff induction data requirements". ChatGPT. Retrieved 2025-12-10.
  2. "Ray Solomonoff" (PDF). Ray Solomonoff. Retrieved 2025-12-10.
  3. "Meclab" (PDF). meclab.w3.uvm.edu. Retrieved 2025-12-10.
  4. "Science". science.org. Retrieved 2025-12-10.
  5. "PNAS". pnas.org. Retrieved 2025-12-10.
  6. "Science". science.org. Retrieved 2025-12-10.
  7. "arXiv". arxiv.org. Retrieved 2025-12-10.
  8. "PMC". pmc.ncbi.nlm.nih.gov. Retrieved 2025-12-10.
  9. "Science". science.org. Retrieved 2025-12-10.
  10. "Ray Solomonoff" (PDF). Ray Solomonoff. Retrieved 2025-12-10.
  11. "Ray Solomonoff" (PDF). Ray Solomonoff. Retrieved 2025-12-10.