ProtoScience is a deterministic pipeline that takes raw numerical data and autonomously discovers governing equations.

It does not use LLMs for discovery — only sparse regression, power-law fitting, and statistical validation.

Results so far:

- Kepler's Third Law (P² = a³ / M) from 3,519 NASA exoplanets: R² = 0.998
- Sun’s ~27-day rotation period from solar wind plasma data: 93% accuracy
- Power law T ~ v^3.40 in solar wind (NOAA/NASA spacecraft data)
- 5/5 General Relativity predictions from simulated black hole observables: all R² = 1.000
- Chirp mass relationship from 219 LIGO gravitational wave events: R² = 0.998
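
To make the power-law results above concrete, here is a minimal sketch of how an exponent such as the 3/2 in Kepler's Third Law can be recovered by ordinary least squares in log-log space. It is not taken from the repository: the function name `fit_power_law` is hypothetical, and the data are synthetic (generated with M = 1 and a small amount of multiplicative noise), not the NASA exoplanet catalogue.

```python
import numpy as np

def fit_power_law(x, y):
    """Fit y = c * x**k by linear least squares on (log x, log y).

    Returns the exponent k, the prefactor c, and R^2 measured in log space.
    """
    log_x, log_y = np.log(x), np.log(y)
    k, log_c = np.polyfit(log_x, log_y, 1)  # slope = exponent, intercept = log(c)
    pred = k * log_x + log_c
    ss_res = np.sum((log_y - pred) ** 2)
    ss_tot = np.sum((log_y - log_y.mean()) ** 2)
    return k, np.exp(log_c), 1.0 - ss_res / ss_tot

# Synthetic stand-in data (an assumption, not real observations):
# semi-major axis a in AU, period P in years, stellar mass fixed at 1 solar mass.
rng = np.random.default_rng(0)
a = rng.uniform(0.05, 5.0, size=500)
period = a ** 1.5 * np.exp(rng.normal(0.0, 0.02, size=500))

k, c, r2 = fit_power_law(a, period)
print(f"P ~ {c:.3f} * a^{k:.3f}, R^2 = {r2:.4f}")
```

Squaring both sides of the fitted P ~ a^1.5 gives the P² ~ a³ form. A real run would also need to account for stellar mass, which this sketch holds fixed.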

It also detects when no meaningful law exists — Bitcoin daily prices returned R² = 0.00.

Pipeline:

raw data → feature extraction → candidate law generation → fitting → verification
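
The middle stages of this pipeline (candidate law generation, fitting, verification) can be sketched roughly as follows. This is an illustrative assumption about how such stages might look, not the project's actual code: the candidate library, the threshold value, the function names, and the synthetic data are all hypothetical, and the sparse fit shown is sequentially thresholded least squares, one common form of sparse regression.

```python
import numpy as np

def candidate_library(x):
    """Candidate law generation: evaluate a small set of candidate terms.

    The choice of terms here is hypothetical.
    """
    names = ["1", "x", "x^2", "x^3", "sin(x)"]
    cols = np.column_stack([np.ones_like(x), x, x**2, x**3, np.sin(x)])
    return names, cols

def sparse_fit(theta, y, threshold=0.1, n_iter=10):
    """Fitting: least squares, then repeatedly zero small coefficients and refit."""
    coef = np.linalg.lstsq(theta, y, rcond=None)[0]
    for _ in range(n_iter):
        small = np.abs(coef) < threshold
        coef[small] = 0.0
        keep = ~small
        if not keep.any():
            break  # no term survived: no law found in this library
        coef[keep] = np.linalg.lstsq(theta[:, keep], y, rcond=None)[0]
    return coef

def r_squared(y, pred):
    """Verification metric: coefficient of determination."""
    ss_res = np.sum((y - pred) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

# Synthetic data with a known law y = 2 x^2 - 1.5 x plus small noise.
rng = np.random.default_rng(1)
x = rng.uniform(-3.0, 3.0, size=400)
y = 2.0 * x**2 - 1.5 * x + rng.normal(0.0, 0.01, size=400)

names, theta = candidate_library(x)
train, test = slice(0, 300), slice(300, 400)

coef = sparse_fit(theta[train], y[train])         # fit on the training split
r2_test = r_squared(y[test], theta[test] @ coef)  # verify on held-out points
law = {n: round(float(c), 2) for n, c in zip(names, coef) if c != 0.0}
print(law, f"held-out R^2 = {r2_test:.4f}")
```

Scoring on held-out points rather than on the training data is what lets a verification step reject laws that only fit noise. A data set with no underlying structure in the library would yield a low held-out R².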

An LLM (Claude) is used only at the end, to interpret results in natural language. It is never involved in the discovery step.

All experiments are fully reproducible.

Code: https://github.com/SaulVanCode/protoscience-nasa-experiments