Phase 14 · Coming soon

Public leaderboard

Models with verified RI Scores will appear here. Verification requires a re-run on a private hidden prompt subset, provider verification, and anomaly checks on score deltas.

Verified runs only

A run counts as verified only when its public score and its private-subset score agree within tolerance. Score deltas more than 3σ from the norm are flagged for admin review.
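
As a rough illustration of the check described above, the following sketch compares a public score with a private-subset score and flags large deltas. The function name, the `tolerance` default, and the use of a list of historical deltas as the baseline for σ are all assumptions made for this example, not the platform's actual parameters.

```python
from statistics import mean, stdev


def verify_run(public_score, private_score, historical_deltas,
               tolerance=2.0, sigma_limit=3.0):
    """Hypothetical verification check (all thresholds are assumed values).

    historical_deltas: past (public - private) deltas used as the baseline;
    needs at least two values, otherwise statistics.stdev raises an error.
    """
    delta = public_score - private_score
    within_tolerance = abs(delta) <= tolerance

    baseline_mean = mean(historical_deltas)
    baseline_sd = stdev(historical_deltas)

    if baseline_sd == 0:
        # Degenerate baseline: any deviation from it is treated as anomalous.
        flagged = delta != baseline_mean
    else:
        z_score = (delta - baseline_mean) / baseline_sd
        flagged = abs(z_score) > sigma_limit

    return {
        "delta": delta,
        "verified": within_tolerance and not flagged,
        "flagged_for_review": flagged,
    }


baseline = [0.5, -0.5, 1.0, -1.0, 0.0]
ok_run = verify_run(80.0, 79.5, baseline)
suspicious_run = verify_run(90.0, 80.0, baseline)
```

Here `ok_run` passes both checks, while `suspicious_run` has a 10-point gap between public and private scores, so it fails the tolerance check and is flagged for review.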

Categories

Overall RI, best open-source, best small, best large, best per-domain, best hidden-world, best anti-reductionism, best evidence-separation.

Anti-gaming

Private hidden test sets, randomised prompt subsets, provider verification, prompt-leakage scanner.
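
One way a randomised prompt subset might be drawn is sketched below. It assumes a private pool of prompt IDs and a per-run identifier; `draw_prompt_subset`, `run_id`, and the pool contents are hypothetical names chosen for illustration. Seeding from the run ID keeps the subset reproducible for auditing while differing between runs.

```python
import hashlib
import random


def draw_prompt_subset(prompt_ids, run_id, subset_size):
    """Pick a reproducible pseudo-random subset of prompts for one run.

    Hypothetical sketch: the seed is derived from run_id, so the same run
    always sees the same subset and different runs usually see different ones.
    """
    seed = int.from_bytes(hashlib.sha256(run_id.encode("utf-8")).digest(), "big")
    rng = random.Random(seed)
    # Sort first so the result does not depend on the input ordering.
    return rng.sample(sorted(prompt_ids), subset_size)


pool = [f"prompt-{i:03d}" for i in range(50)]
subset_a = draw_prompt_subset(pool, "run-A", 10)
subset_a_again = draw_prompt_subset(pool, "run-A", 10)
```

Because a model cannot know in advance which prompts it will be scored on, memorising any fixed public subset gives less benefit.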

Versioned scores

Every score is paired with a `suite_version_hash`. Scores from different suite versions are never compared directly; cross-version comparison always goes through calibration tables.
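
The rule above can be sketched as a small data structure that refuses direct cross-version comparison. Only `suite_version_hash` comes from the text; the `VersionedScore` class, the function names, and the linear `(slope, intercept)` form of a calibration entry are assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class VersionedScore:
    value: float
    suite_version_hash: str


# Hypothetical table shape: (from_hash, to_hash) -> (slope, intercept).
CalibrationTable = Dict[Tuple[str, str], Tuple[float, float]]


def calibrate(score, target_hash, table):
    """Map a score onto another suite version using the calibration table."""
    if score.suite_version_hash == target_hash:
        return score
    key = (score.suite_version_hash, target_hash)
    if key not in table:
        raise ValueError(f"no calibration entry for {key}; refusing direct comparison")
    slope, intercept = table[key]
    return VersionedScore(slope * score.value + intercept, target_hash)


def score_difference(a, b, table):
    """Return a.value minus b.value, with b first calibrated to a's suite version."""
    b_calibrated = calibrate(b, a.suite_version_hash, table)
    return a.value - b_calibrated.value


table = {("v2hash", "v1hash"): (1.0, -2.0)}  # illustrative values only
old = VersionedScore(70.0, "v1hash")
new = VersionedScore(75.0, "v2hash")
diff = score_difference(old, new, table)
```

With an empty table, the same call raises `ValueError` instead of silently comparing scores from different suite versions.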

Want your model on the leaderboard? Run a benchmark first.
