Collaborative AI research lab · est. 2017

AI research by brilliant people around the world.

Alphabell is a collaborative multinational research lab. We bet that the next AI breakthroughs will come from sharper, more elegant ideas — not from ingesting more of the internet or building another data center. So we coordinate hundreds of independent researchers, data scientists, and hackers who actually think about this work for a living.

See what we're working on · How the lab works

614

Active members

across 41 countries

Public releases

papers, datasets, tools · 2025–26

Replications shipped

of published AI research

Active projects

member-led, open by default

The bet

Many minds, not more compute.

The dominant paradigm in AI today is straightforward: more data, more parameters, more GPUs. It has carried the field a long way. It is also hiding a quieter truth — that "we don't yet have a good idea" keeps getting mistaken for "we don't yet have enough compute".

We think the next round of real breakthroughs will come from people with sharper hypotheses, careful theory, and the kind of focused thinking that doesn't get easier just because you spent another billion dollars. The bottleneck is the idea, not the cluster.

So we don't try to build a frontier lab. We coordinate as many great thinkers as we can find — independent researchers, data scientists, and hackers across dozens of countries — instead of ingesting more of the internet or pouring billions into another data center.

The next AI breakthroughs will come from sharper ideas, not bigger clusters. Many minds beat more compute.

The work we like best is the kind that doesn't get cheaper at scale: a clean hypothesis, a careful replication, an interpretability result that finally explains something, a benchmark that exposes a real failure mode. None of these get unblocked by 10× the GPU budget. They get unblocked when you find the right person and give them runway to think.

That is the entire business model. Find the people. Give them runway. Publish what they find.

Research agenda

The kind of problems where ideas matter more than scale

We pick problems where the bottleneck is the hypothesis, not the cluster: open questions where the cost of being wrong is low, the cost of being right is high, and the work benefits from being done in public. The areas below currently receive most of our funding.

Architectures & training

New model architectures, training methods, and scaling experiments at the edge of what small teams can run.

Mechanistic interpretability

Sparse-feature atlases, causal scrubbing, refusal direction studies. Open dictionaries for open models.

Evaluation methods

Held-out benchmarks, agent-trajectory evals, multilingual prompt sets, blind-spot detection for VLMs.

Agent systems

Long-horizon agent training, schema-aware tool use, replay buffers, and harnesses that don't fall apart on day five.

Dataset experiments

New training data, new ways to study existing data, and underserved-language eval sets built with native speakers.

Reproducibility & audits

End-to-end replications of published work, bug bounties on shipping code, and shared reference numbers.

Recent work

Shipped in the open

A sample of what members have released in the last few months — datasets, audits, atlases, tools. Every release is reproducible by design: code, seeds, configs, and the held-out splits all live next to the paper.

Interpretability

Sparse-feature atlas v2 for the open 7B stack

An open atlas of 18,000+ dictionary features across four popular open-weight 7B checkpoints. CC-BY release.

minou + 41 community contributors p-118 →

Reproducibility

Replication audit of nine router-distillation papers

End-to-end replications of nine recent router-distillation papers. Three reproduce cleanly, two partially, one fails with the published code. All notebooks public.

tomek_w p-120 →

Interpretability

Causal scrubbing for feature lifetimes

Tracking the lifetime of dictionary features across pretraining checkpoints. Code, mid-training snapshots, and intermediate analyses released monthly.

henrik.l p-123 →

Multilingual evaluation

Cantonese tool-use evaluation set v1.0

18,000 labelled adversarial tool-use prompts in Cantonese — the first public dataset of its kind. Dual-released to Hugging Face and OpenReview.

wai.lin p-119 →

Agent systems

Long-trajectory replay buffers for open agents

Training a 7B agent on a 14-day partial-observability trajectory replay buffer. Eval harness and seed trajectories already public; checkpoints due Q3.

sofia_k + dineth.k p-117 →

Multilingual evaluation

African-language code-switching prompt set

Yoruba, Igbo, Hausa, Swahili, and Amharic code-switching prompts for agent tool-use. Built from real customer-service transcripts with verified native-speaker review.

adaeze.o + amal.f p-121 →

All active projects →

Open science as default

Reproducible by design

We treat reproducibility as a design constraint, not a virtue badge to be added at the end. Every project releases enough material for an independent team to verify the result, and we fund the people who try.

Open by default

Code, weights, datasets, and write-ups all release under permissive licenses. Closed-source output is the exception, and we want a reason on the record.

Seeds & configs shipped

No "and we did some hyperparameter tuning". Every release ships the seed, the config, the data split, and the version of the eval harness used to score it.
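One way to picture that bundle is as a small manifest that travels with each release. The sketch below is illustrative only: `ReleaseManifest`, its field names, and `missing_fields` are hypothetical names chosen for this example, not a description of the lab's actual tooling.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ReleaseManifest:
    """Hypothetical record of what a release ships; field names are illustrative."""
    seed: int
    config: dict
    data_split: str
    eval_harness_version: str

    def fingerprint(self) -> str:
        # Serialize with sorted keys so identical contents always hash the same.
        payload = json.dumps(asdict(self), sort_keys=True).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()


def missing_fields(manifest: dict) -> list[str]:
    """Return the required fields that are absent or empty in a raw manifest dict."""
    required = ("seed", "config", "data_split", "eval_harness_version")
    # A seed of 0 is valid, so only None, "", and {} count as empty.
    return [name for name in required if manifest.get(name) in (None, "", {})]
```

A replicator could run `missing_fields` on a release before starting work, and compare `fingerprint()` values to confirm two runs were scored against the same seed, config, split, and harness version.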

Replications get paid

We pay members to replicate published work — ours and others'. Negative results are published the same way positive ones are. Outcomes go in the public log either way.

Four overlapping ways in

How researchers participate

Membership has no formal threshold. A high-school student writing a clean implementation, a PhD on sabbatical, and a senior engineer chipping in evenings all participate on equal footing — judged by the work.

Mini-grants

Time and runway for one idea.

Short, low-paperwork support for a focused project — a benchmark, a tool, a replication, a paper. Single-page application, decisions in weeks rather than months.

Rolling cohort · How they work →

Fellowships

Funded time to commit deeper.

Three to twelve months of supported research time, with mentorship and compute. For people who want to dig in on an Alphabell project or pursue their own agenda.

3–12 months · Read more →

Competitions & hackathons

Head-to-head, in the open.

Recurring sprints with public leaderboards and held-out evals. Strong runs become a credential — and the submissions themselves become reusable artefacts that other researchers can build on.

Rolling · auto-graded · Active boards →

Compute & infra

The biggest barrier, lifted.

GPU clusters, evaluation harnesses, datasets, and engineering support, available to members with active projects. Lack of access to this infrastructure is the single largest practical barrier facing independent AI researchers.

8× H100 · shared pool · How it works →

Working on something worth doing in the open?

If it's smaller than a paper but bigger than a tweet, we probably want to host it. Tell us what you're thinking — the form is three short questions and we read everything.

Pitch a project