Mech-Interp Atlas Sprint
Submit your best interpretability atlas for a held-out 7B model. Entries are judged on reproduction quality, downstream usability (does someone else's tool, run on your atlas, get the right answer?), and coverage.
- Prize: $12,000 · top-3 split 60/25/15
- Status: Closing soon
- Deadline: 15 Jul 2026
- Entries: 73 across 21 countries
The task
You're given a held-out 7B base model and a 50,000-token text corpus. Submit an interpretability atlas in our open format (described in the starter pack). The atlas must cover at least 500 features and include per-feature activation traces and the learned dictionaries; human-readable labels are optional.
What we score
- Reproduction: can the grader recompute your dictionaries from the supplied seed + config?
- Downstream usability: an independent tool runs an editing task using your atlas, and we score its pass rate.
- Coverage: how much of the residual-stream variance does your atlas explain on a held-out probe set?
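For a rough sense of the coverage criterion, here is a minimal sketch of one common way to measure explained variance: one minus the ratio of residual to total sum of squares. The grader's exact metric is not stated here, so treat this as an assumption. The function name and the idea that your atlas yields one reconstructed vector per probe activation are hypothetical, for illustration only.

```python
from typing import Sequence

Vector = Sequence[float]


def fraction_variance_explained(
    activations: Sequence[Vector],
    reconstructions: Sequence[Vector],
) -> float:
    """Return 1 - (residual sum of squares / total sum of squares).

    Assumption: `reconstructions[i]` is the atlas's reconstruction of the
    residual-stream vector `activations[i]`. This is an illustrative
    metric, not necessarily the one the held-out grader uses.
    """
    if not activations or len(activations) != len(reconstructions):
        raise ValueError("need two non-empty sequences of equal length")

    n = len(activations)
    dim = len(activations[0])

    # Per-dimension mean of the true activations.
    mean = [sum(row[j] for row in activations) / n for j in range(dim)]

    # Total variance (as a sum of squares) around that mean.
    total = sum(
        (row[j] - mean[j]) ** 2 for row in activations for j in range(dim)
    )
    # Squared error left over after reconstruction.
    residual = sum(
        (a[j] - r[j]) ** 2
        for a, r in zip(activations, reconstructions)
        for j in range(dim)
    )

    if total == 0.0:
        raise ValueError("activations have zero variance")
    return 1.0 - residual / total
```

A perfect reconstruction scores 1.0, and a reconstruction that always outputs the mean activation scores 0.0, so higher is better under this definition.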
Why it exists
Mech-interp is hard to compare across labs because everyone uses a different format. We picked a single open schema (atlas-spec) and set one challenge: produce the best atlas in it, and we'll judge it by what other people can do with it.
$12,000 prize pool
Top-3 split: $7,200 / $3,000 / $1,800. Held-out grader, open submission code.
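The dollar amounts follow directly from the 60/25/15 split of the pool. A small sketch of that arithmetic (the constant and function names are ours, for illustration only):

```python
PRIZE_POOL_USD = 12_000
SPLIT_PERCENT = (60, 25, 15)  # 1st, 2nd, 3rd place


def payouts(pool: int, percents: tuple[int, ...]) -> list[int]:
    """Split a whole-dollar pool by integer percentages."""
    if sum(percents) != 100:
        raise ValueError("percentages must sum to 100")
    # Integer arithmetic avoids floating-point rounding on money.
    return [pool * p // 100 for p in percents]


print(payouts(PRIZE_POOL_USD, SPLIT_PERCENT))  # [7200, 3000, 1800]
```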