Phase 14 · Coming soon
Public leaderboard
Models with verified RI Scores will appear here. Verification requires a re-run on a private hidden prompt subset, provider verification, and anomaly checks on score deltas.
Verified runs only
A run is verified only when its public and private subset scores agree within tolerance. Score deltas beyond 3σ are flagged for admin review.
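The two checks described above could look roughly like the sketch below. It is illustrative only: the function names, the default tolerance of 1.0 point, and the idea of measuring 3σ against a list of previously observed public-minus-private deltas are all assumptions, not the leaderboard's actual rules.

```python
from statistics import mean, stdev


def within_tolerance(public_score, private_score, tolerance=1.0):
    """True if the public and private subset scores agree closely enough.

    The 1.0-point default is a placeholder, not a published threshold.
    """
    return abs(public_score - private_score) <= tolerance


def is_anomalous(delta, reference_deltas, sigma=3.0):
    """True if `delta` lies more than `sigma` standard deviations from the
    mean of `reference_deltas` (assumed: deltas from earlier verified runs).
    """
    mu = mean(reference_deltas)
    sd = stdev(reference_deltas)  # sample standard deviation, needs >= 2 values
    if sd == 0:
        # All reference deltas are identical, so any deviation stands out.
        return delta != mu
    return abs(delta - mu) > sigma * sd


def review_status(public_score, private_score, reference_deltas):
    """Combine both checks into a single hypothetical status label."""
    delta = public_score - private_score
    if is_anomalous(delta, reference_deltas):
        return "flagged_for_admin_review"
    if within_tolerance(public_score, private_score):
        return "verified"
    return "unverified"
```

In this sketch an anomalous delta takes priority over the tolerance check, so a flagged run never shows as verified until an admin has looked at it.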
Categories
Overall RI, best open-source, best small, best large, best per-domain, best hidden-world, best anti-reductionism, best evidence-separation.
Anti-gaming
Private hidden test sets, randomised prompt subsets, provider verification, prompt-leakage scanner.
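As one way to picture the randomised prompt subsets, the sketch below draws a reproducible sample from a prompt pool using a seed derived from a run identifier. The names `pick_prompt_subset` and `run_id`, and the seeding scheme itself, are hypothetical; the real selection mechanism is not described here.

```python
import hashlib
import random


def pick_prompt_subset(prompt_ids, run_id, k):
    """Return k prompt IDs chosen pseudo-randomly but reproducibly per run.

    Hashing `run_id` (a hypothetical identifier) gives each run its own
    subset while letting an auditor regenerate the same subset later.
    """
    if k > len(prompt_ids):
        raise ValueError("k cannot exceed the number of available prompts")
    digest = hashlib.sha256(run_id.encode("utf-8")).hexdigest()
    rng = random.Random(int(digest, 16))  # dedicated RNG, global state untouched
    # Sort first so the result does not depend on the input ordering.
    return rng.sample(sorted(prompt_ids), k)
```

Because the subset depends on the run identifier, a model tuned against one run's prompts gains little on the next run, which is the point of randomising.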
Versioned scores
Every score is paired with a `suite_version_hash`. Scores from different suite versions are compared only through calibration tables, never directly.
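A minimal sketch of this rule, under stated assumptions: only the `suite_version_hash` field comes from the text above. The `VersionedScore` class, the `compare` function, and the choice of a linear (slope, offset) calibration entry are made up for illustration, and real calibration tables may take a different form.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class VersionedScore:
    value: float
    suite_version_hash: str


# Hypothetical table: (from_hash, to_hash) -> (slope, offset) of a linear map.
CalibrationTable = Dict[Tuple[str, str], Tuple[float, float]]


def compare(a: VersionedScore, b: VersionedScore,
            calibration: CalibrationTable) -> float:
    """Return a's score minus b's score, expressed on a's suite version.

    Scores from different suite versions are never subtracted directly:
    b is first mapped onto a's version, and a missing table entry is an error.
    """
    if a.suite_version_hash == b.suite_version_hash:
        return a.value - b.value
    key = (b.suite_version_hash, a.suite_version_hash)
    if key not in calibration:
        raise ValueError(
            "no calibration from %s to %s" % key
        )
    slope, offset = calibration[key]
    return a.value - (slope * b.value + offset)
```

Raising on a missing entry, rather than falling back to a raw difference, keeps the "never direct" rule from being bypassed by accident.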
Want your model on the leaderboard? Run a benchmark first.
Connect a model