Phase 14 · Coming soon

Public leaderboard

Models with verified RI Scores will appear here. Verification requires a re-run on a private hidden prompt subset, provider verification, and anomaly checks on score deltas.

Verified runs only

A run is verified only when its public and private subset scores agree within tolerance. Score deltas beyond 3σ are flagged for admin review.
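As a rough illustration of the two checks described above, the sketch below compares a public and a private subset score against a tolerance and flags a score delta that sits more than 3σ from a history of deltas. The function names, the default tolerance of 2.0 points, and the choice of historical deltas as the reference distribution are all assumptions made for this example, not the leaderboard's published rules.

```python
from statistics import mean, stdev


def within_tolerance(public_score, private_score, tolerance=2.0):
    """True if the two subset scores agree within `tolerance` points.

    The default tolerance is a placeholder, not a documented value.
    """
    return abs(public_score - private_score) <= tolerance


def is_anomalous(delta, historical_deltas, sigmas=3.0):
    """True if `delta` lies more than `sigmas` standard deviations
    from the mean of `historical_deltas` (assumed reference data)."""
    if len(historical_deltas) < 2:
        return False  # not enough history to estimate a spread
    mu = mean(historical_deltas)
    sd = stdev(historical_deltas)
    if sd == 0:
        return delta != mu
    return abs(delta - mu) > sigmas * sd


if __name__ == "__main__":
    history = [0.5, -0.3, 0.1, -0.2, 0.4]
    print(within_tolerance(80.0, 81.5))
    print(is_anomalous(10.0, history))
```

A flagged delta would then go to admin review rather than being rejected automatically, in line with the description above.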

Anti-gaming

Private hidden test sets, randomised prompt subsets, provider verification, prompt-leakage scanner.
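One way a randomised prompt subset could be drawn is sketched below: each run gets a reproducible sample from the hidden pool, seeded by a server-side secret plus the run identifier, so a submitter cannot predict which prompts will be used. The names `draw_subset`, `run_id`, and `secret` are hypothetical, and the real selection scheme may differ.

```python
import random


def draw_subset(hidden_pool, k, run_id, secret):
    """Return k prompts sampled from hidden_pool for this run.

    Seeding with secret + run_id makes the draw reproducible for
    auditors while staying unpredictable without the secret.
    """
    rng = random.Random(f"{secret}:{run_id}")
    return rng.sample(list(hidden_pool), k)


if __name__ == "__main__":
    pool = [f"prompt-{i}" for i in range(100)]
    subset = draw_subset(pool, 10, run_id="run-001", secret="server-side-secret")
    print(len(subset))
```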

Versioned scores

Every score is paired with a `suite_version_hash`. Scores from different suite versions are compared through calibration tables, never directly.
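The sketch below shows one way this rule could be enforced in code: a score always carries its `suite_version_hash`, and a comparison across versions fails unless a calibration entry exists. The `VersionedScore` type, the table layout, the example hashes, and the linear (slope, offset) mapping are assumptions for illustration; the actual calibration method is not specified here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VersionedScore:
    value: float
    suite_version_hash: str


# Hypothetical calibration table: (from_hash, to_hash) -> (slope, offset)
CALIBRATION = {("abc123", "def456"): (1.0, -1.5)}


def to_version(score, target_hash, table):
    """Map a score onto another suite version via the calibration table."""
    if score.suite_version_hash == target_hash:
        return score
    key = (score.suite_version_hash, target_hash)
    if key not in table:
        raise LookupError(f"no calibration from {key[0]} to {key[1]}")
    slope, offset = table[key]
    return VersionedScore(slope * score.value + offset, target_hash)


def compare(a, b, table):
    """Return a.value minus b.value, with b calibrated to a's suite version."""
    b_on_a = to_version(b, a.suite_version_hash, table)
    return a.value - b_on_a.value


if __name__ == "__main__":
    new = VersionedScore(70.0, "def456")
    old = VersionedScore(72.0, "abc123")
    print(compare(new, old, CALIBRATION))
```

Raising an error when no calibration entry exists keeps an accidental direct comparison from slipping through silently.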

Want your model on the leaderboard? Run a benchmark first.

Connect a model
