It does not use LLMs for discovery — only sparse regression, power-law fitting, and statistical validation.
Results so far:
- Kepler's Third Law (P² = a³ / M) from 3,519 NASA exoplanets: R² = 0.998
- Sun’s ~27-day rotation period from solar wind plasma data: 93% accuracy
- Power law T ~ v^3.40 in solar wind (NOAA/NASA spacecraft data)
- 5/5 General Relativity predictions from simulated black hole observables: all R² = 1.000
- Chirp mass relationship from 219 LIGO gravitational wave events: R² = 0.998
It also detects when no meaningful law exists — Bitcoin daily prices returned R² = 0.00.
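To make the power-law fitting and the "no law" check concrete, here is a minimal sketch, not the repository's actual code. It fits y = c * x^k by least squares in log-log space and reports R². The data is synthetic: a Kepler-like relation P = a^1.5 with 2% noise, plus a null case where y is drawn independently of x. The name `fit_power_law`, the noise level, and the sample sizes are assumptions made for illustration.

```python
import math
import random


def fit_power_law(x, y):
    """Fit y = c * x**k by ordinary least squares on (log x, log y).

    Returns (k, c, r2), with r2 computed in log space.
    """
    lx = [math.log(v) for v in x]
    ly = [math.log(v) for v in y]
    n = len(lx)
    mx = sum(lx) / n
    my = sum(ly) / n
    sxx = sum((a - mx) ** 2 for a in lx)
    sxy = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    k = sxy / sxx                      # slope in log space = exponent
    log_c = my - k * mx                # intercept in log space = log of prefactor
    ss_res = sum((b - (log_c + k * a)) ** 2 for a, b in zip(lx, ly))
    ss_tot = sum((b - my) ** 2 for b in ly)
    return k, math.exp(log_c), 1.0 - ss_res / ss_tot


rng = random.Random(0)

# Synthetic "planets" around a one-solar-mass star (assumed, not NASA data):
# P = a**1.5 in years and AU, with 2% multiplicative noise.
a = [10 ** rng.uniform(-1.5, 1.0) for _ in range(500)]
P = [s ** 1.5 * math.exp(rng.gauss(0.0, 0.02)) for s in a]
k_kepler, c_kepler, r2_kepler = fit_power_law(a, P)

# Null case: y is independent of x, so no law should be reported.
x_null = [rng.uniform(1.0, 10.0) for _ in range(500)]
y_null = [rng.uniform(1.0, 10.0) for _ in range(500)]
k_null, c_null, r2_null = fit_power_law(x_null, y_null)

print(f"Kepler-like: exponent={k_kepler:.3f}, R^2={r2_kepler:.4f}")
print(f"Null:        exponent={k_null:.3f}, R^2={r2_null:.4f}")
```

On the synthetic Kepler-like data the fitted exponent should land close to 1.5 with R² near 1, while the null case should give R² near 0.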
Pipeline:
raw data → feature extraction → candidate law generation → fitting → verification
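One way these stages could fit together is sketched below. This is an illustration under stated assumptions, not the project's implementation. It uses sequentially thresholded least squares (a common sparse regression scheme) on log-transformed features, so the surviving coefficients are the exponents of a power law. The data is synthetic, the stellar radius is a deliberately irrelevant nuisance variable, and every function name, threshold, and noise level here is hypothetical.

```python
import math
import random


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def least_squares(X, y, active):
    """OLS restricted to the columns in `active`, via the normal equations."""
    A = [[sum(row[i] * row[j] for row in X) for j in active] for i in active]
    b = [sum(row[i] * t for row, t in zip(X, y)) for i in active]
    return solve(A, b)


def stlsq(X, y, threshold, protected=(0,), max_iter=10):
    """Sequentially thresholded least squares: fit, drop small terms, refit."""
    active = list(range(len(X[0])))
    for _ in range(max_iter):
        beta = least_squares(X, y, active)
        kept = [i for i, b in zip(active, beta)
                if i in protected or abs(b) >= threshold]
        if kept == active:
            break
        active = kept
    coef = [0.0] * len(X[0])
    for i, b in zip(active, least_squares(X, y, active)):
        coef[i] = b
    return coef


def r_squared(y_true, y_pred):
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Raw data (synthetic): P = sqrt(a**3 / M) with 2% multiplicative noise.
rng = random.Random(1)
X, y = [], []
for _ in range(600):
    a = 10 ** rng.uniform(-1.5, 1.0)   # semi-major axis, AU
    m = rng.uniform(0.5, 2.0)          # stellar mass, solar masses
    r = rng.uniform(0.5, 2.0)          # stellar radius, nuisance variable
    period = math.sqrt(a ** 3 / m) * math.exp(rng.gauss(0.0, 0.02))
    # Feature extraction: constant term plus the log of each variable.
    X.append([1.0, math.log(a), math.log(m), math.log(r)])
    y.append(math.log(period))

# Fitting on the first 400 rows, verification on the held-out 200.
X_train, y_train, X_test, y_test = X[:400], y[:400], X[400:], y[400:]
coef = stlsq(X_train, y_train, threshold=0.1)
pred = [sum(c * v for c, v in zip(coef, row)) for row in X_test]
r2_test = r_squared(y_test, pred)

print("exponents (a, M, R):", [round(c, 3) for c in coef[1:]])
print(f"held-out R^2 (log space): {r2_test:.4f}")
```

With this setup the exponents on a and M should come out near 1.5 and -0.5, which is P² = a³ / M, and the radius term should be dropped. The held-out R² plays the role of the verification stage.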
An LLM (Claude) is used only at the end, to interpret results in natural language; it is never involved in the discovery step.
All experiments are fully reproducible.
Code: https://github.com/SaulVanCode/protoscience-nasa-experiments