Mech-Interp Atlas Sprint

Submit your best interpretability atlas for a held-out 7B model. Entries are judged on reproduction quality, coverage, and downstream usability: does someone else's tool, run on your atlas, get the right answer?

Prize: $12,000 · top-3 60/25/15
Status: Closing soon
Deadline: 15 Jul 2026
Entries: 73 across 21 countries

The task

You're given a held-out 7B base model and a 50,000-token text corpus. Submit an interpretability atlas in our open format (described in the starter pack). The atlas must cover at least 500 features and include per-feature activation traces and the learned dictionaries. Human-readable labels are optional.

What we score

  • Reproduction: can the grader recompute your dictionaries from the supplied seed + config?
  • Downstream usability: an independent tool runs an editing task using your atlas, and we score the pass rate.
  • Coverage: how much of the residual-stream variance does your atlas explain on a held-out probe set?
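
To make the coverage criterion concrete, here is a minimal sketch of one common definition of "fraction of variance explained": one minus the ratio of squared reconstruction error to total squared deviation from the per-dimension mean. This is an assumption for illustration, not the grader's confirmed formula; the function name `fraction_variance_explained` and the plain list-of-rows input layout are hypothetical and not part of atlas-spec.

```python
from typing import Sequence

Matrix = Sequence[Sequence[float]]


def fraction_variance_explained(actual: Matrix, reconstructed: Matrix) -> float:
    """Return 1 - (residual sum of squares / total sum of squares).

    `actual` holds residual-stream vectors (one row per token position);
    `reconstructed` holds the atlas's reconstruction of the same rows.
    Both the name and the input layout are illustrative assumptions.
    """
    if not actual or len(actual) != len(reconstructed):
        raise ValueError("inputs must be non-empty with the same number of rows")

    n_rows, n_dims = len(actual), len(actual[0])
    # Per-dimension mean of the actual activations.
    means = [sum(row[d] for row in actual) / n_rows for d in range(n_dims)]

    residual = 0.0  # squared error of the reconstruction
    total = 0.0     # squared deviation of the activations from their mean
    for row, rec in zip(actual, reconstructed):
        if len(row) != n_dims or len(rec) != n_dims:
            raise ValueError("all rows must have the same dimensionality")
        for d in range(n_dims):
            residual += (row[d] - rec[d]) ** 2
            total += (row[d] - means[d]) ** 2

    if total == 0.0:
        raise ValueError("actual activations have zero variance")
    return 1.0 - residual / total


if __name__ == "__main__":
    probe = [[1.0, 2.0], [3.0, 6.0]]
    # A perfect reconstruction explains all of the variance.
    print(fraction_variance_explained(probe, probe))  # 1.0
```

Under this definition, a reconstruction that only predicts the mean activation scores 0, and a reconstruction worse than the mean scores below 0, so the value is not strictly bounded to [0, 1].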

Why it exists

Mech-interp is hard to compare across labs because everyone uses a different format. We picked a single open schema (atlas-spec) and asked: produce the best one, and we'll judge it by what other people can do with it.

$12,000 prize pool

Top-3 split: $7,200 / $3,000 / $1,800. Held-out grader, open submission code.
