Collaborative AI research lab · est. 2017
AI research by brilliant people around the world.
Alphabell is a collaborative multinational research lab. We bet that the next AI breakthroughs will come from sharper, more elegant ideas — not from ingesting more of the internet or building another data center. So we coordinate hundreds of independent researchers, data scientists, and hackers who actually think about this work for a living.
The bet
Many minds, not more compute.
The dominant paradigm in AI today is straightforward: more data, more parameters, more GPUs. It has carried the field a long way. It also hides a quieter truth: "we don't yet have a good idea" keeps getting mistaken for "we don't yet have enough compute".
We think the next round of real breakthroughs will come from people with sharper hypotheses, careful theory, and the kind of focused thinking that doesn't get easier just because you spent another billion dollars. The bottleneck is the idea, not the cluster.
So we don't try to build a frontier lab. Instead, we coordinate as many great thinkers as we can find: independent researchers, data scientists, and hackers across dozens of countries.
The next AI breakthroughs will come from sharper ideas, not bigger clusters. Many minds beat more compute.
The work we like best is the kind that doesn't get cheaper at scale: a clean hypothesis, a careful replication, an interpretability result that finally explains something, a benchmark that exposes a real failure mode. None of these get unblocked by 10× the GPU budget. They get unblocked when you find the right person and give them runway to think.
That is the entire business model. Find the people. Give them runway. Publish what they find.
Research agenda
The kind of problems where ideas matter more than scale
We pick problems where the bottleneck is the hypothesis, not the cluster: open questions where the cost of being wrong is low, the payoff of being right is high, and the work benefits from being done in public. The areas below are where most of our current funding goes.
Architectures & training
New model architectures, training methods, and scaling experiments at the edge of what small teams can run.
Mechanistic interpretability
Sparse-feature atlases, causal scrubbing, refusal direction studies. Open dictionaries for open models.
Evaluation methods
Held-out benchmarks, agent-trajectory evals, multilingual prompt sets, blind-spot detection for VLMs.
Agent systems
Long-horizon agent training, schema-aware tool use, replay buffers, and harnesses that don't fall apart on day five.
Dataset experiments
New training data, new ways to study existing data, and underserved-language eval sets built with native speakers.
Reproducibility & audits
End-to-end replications of published work, bug bounties on shipping code, and shared reference numbers.
Recent work
Shipped in the open
A sample of what members have released in the last few months — datasets, audits, atlases, tools. Every release is reproducible by design: code, seeds, configs, and the held-out splits all live next to the paper.
Sparse-feature atlas v2 for the open 7B stack
An open atlas of 18,000+ dictionary features across four popular open-weight 7B checkpoints. CC-BY release.
Replication audit of nine router-distillation papers
End-to-end replications of nine recent router-distillation papers. Three reproduce cleanly, two partially, one fails with the published code. All notebooks public.
Causal scrubbing for feature lifetimes
Tracking the lifetime of dictionary features across pretraining checkpoints. Code, mid-training snapshots, and intermediate analyses released monthly.
Cantonese tool-use evaluation set v1.0
18,000 labelled adversarial tool-use prompts in Cantonese — the first public dataset of its kind. Dual-released to Hugging Face and OpenReview.
Long-trajectory replay buffers for open agents
Training a 7B agent on a replay buffer of 14-day, partially observable trajectories. The eval harness and seed trajectories are already public; checkpoints are due in Q3.
African-language code-switching prompt set
Yoruba, Igbo, Hausa, Swahili, and Amharic code-switching prompts for agent tool use. Built from real customer-service transcripts and verified by native-speaker reviewers.
Open science as default
Reproducible by design
We treat reproducibility as a design constraint, not a badge added at the end. Every project releases enough material for an independent team to verify the result, and we fund the people who try.
Open by default
Code, weights, datasets, and write-ups all release under permissive licenses. Closed-source output is the exception, and we want a reason on the record.
Seeds & configs shipped
No "and we did some hyperparameter tuning". Every release ships the seed, the config, the data split, and the version of the eval harness used to score it.
Replications get paid
We pay members to replicate published work — ours and others'. Negative results are published the same way positive ones are. Outcomes go in the public log either way.
Four overlapping ways in
How researchers participate
Membership has no formal threshold. A high-school student writing a clean implementation, a PhD on sabbatical, and a senior engineer chipping in evenings all participate on equal footing — judged by the work.
Time and runway for one idea.
Short, low-paperwork support for a focused project — a benchmark, a tool, a replication, a paper. Single-page application, decisions in weeks rather than months.
Funded time to commit deeper.
Three to twelve months of supported research time, with mentorship and compute. For people who want to dig into an Alphabell project or pursue their own agenda.
Head-to-head, in the open.
Recurring sprints with public leaderboards and held-out evals. Strong runs become a credential — and the submissions themselves become reusable artefacts that other researchers can build on.
The biggest barrier, lifted.
GPU clusters, evaluation harnesses, datasets, and engineering support, available to members with active projects. Access to compute is the single largest practical barrier facing independent AI researchers.
Working on something worth doing in the open?
If it's smaller than a paper but bigger than a tweet, we probably want to host it. Tell us what you're thinking — the form is three short questions and we read everything.