AI for Automation
2026-03-28 · Meta · neuroscience · brain-computer interface · open source AI · AI research · TRIBE v2

Meta just open-sourced AI that maps brain reactions

Meta released TRIBE v2, a free, open-source AI model that predicts brain responses to video, audio, and text — 70x more detailed than v1, and trained on data from 720 subjects.


Meta TRIBE v2 brain encoding model visualization

On March 26, 2026, Meta released TRIBE v2 (TRImodal Brain Encoder version 2) as a fully free, open-source AI model that does something unprecedented: it predicts how your brain responds to anything you see, hear, or read — and does so more accurately than a single brain scan of your own.

In practical terms: show TRIBE v2 a YouTube video, play it a piece of music, or feed it a paragraph of text, and it will predict which of 70,000 locations in the brain will activate and by how much — before a single person watches, hears, or reads it. This capability previously required expensive fMRI sessions (using a machine that maps brain activity by detecting blood flow) with individual human subjects. TRIBE v2 virtualizes those experiments entirely.

What TRIBE v2 Actually Does in Plain Terms

Imagine you're a pharmaceutical company testing whether a new drug affects how people process language. Traditionally, you'd recruit dozens of volunteers, put them in an MRI scanner (a medical imaging machine costing $3–7 million), and run expensive studies over months. TRIBE v2 lets you simulate those brain responses computationally in minutes, at essentially zero marginal cost.

Or imagine you're a neuroscientist trying to understand how the brain's visual system differs from its language system. TRIBE v2 can map both simultaneously — across 70,000 brain regions — because it processes video, audio, and text all at once. The model's internal representations spontaneously organized into 5 known functional brain networks during training: primary auditory, language, motion, default mode (the brain at rest), and visual — without being explicitly told to.

The Numbers: From 4 Subjects to 720, From 1,000 to 70,000 Voxels

The original TRIBE v1 covered approximately 1,000 brain voxels (three-dimensional pixels of brain activity, each representing a small cube of neural tissue) and was trained on data from just 4 subjects. TRIBE v2 is a fundamentally different scale of ambition:

📊 TRIBE v1 vs. TRIBE v2 — What Changed

| Metric | TRIBE v1 | TRIBE v2 |
| --- | --- | --- |
| Brain voxels mapped | ~1,000 | 70,000 (70× more) |
| Training subjects | 4 | 720 |
| fMRI training data | Limited | 1,000+ hours |
| Scanner resolution | Standard | 7 Tesla MRI |
| Accuracy vs. individual scan | Below average | 2× higher correlation |

The accuracy figure is striking: in high-quality scan conditions, TRIBE v2's predictions correlate with measured brain activity twice as strongly as a typical single individual's scan does. In other words, the model's prediction of "what your brain does" is statistically more reliable than a one-off measurement of your own brain. When all three input modalities are combined (video + audio + text together), the model achieves a 50% accuracy improvement in multisensory brain regions compared to single-modality approaches.
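To make the correlation metric concrete, here is a minimal sketch — not Meta's evaluation code — of how voxelwise prediction accuracy is commonly scored: compute the Pearson correlation between predicted and measured time courses, independently for each voxel. The data below is synthetic.

```python
import numpy as np

def voxelwise_correlation(predicted, measured):
    """Pearson correlation between predicted and measured
    responses, computed independently per voxel.

    predicted, measured: arrays of shape (timepoints, voxels).
    Returns an array of shape (voxels,)."""
    p = predicted - predicted.mean(axis=0)
    m = measured - measured.mean(axis=0)
    num = (p * m).sum(axis=0)
    den = np.sqrt((p ** 2).sum(axis=0) * (m ** 2).sum(axis=0))
    return num / den

# Toy data: 2 voxels — one predicted well (with noise),
# one predicted with the wrong sign.
rng = np.random.default_rng(0)
t = np.linspace(0, 10, 200)
measured = np.stack([np.sin(t), np.cos(t)], axis=1)
predicted = np.stack(
    [np.sin(t) + 0.1 * rng.normal(size=200), -np.cos(t)], axis=1
)

r = voxelwise_correlation(predicted, measured)
print(r.round(2))  # first voxel near +1, second near -1
```

A "2× higher correlation" claim is then a statement about these per-voxel r values, averaged over the brain.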

How It Works Under the Hood — Without the Jargon

TRIBE v2 uses three specialized AI models under the hood, each expert in one type of content:

  • 🎬 Video-JEPA-2 — Meta's video understanding model (processes visual motion and scenes)
  • 🎵 Wav2Vec-BERT-2.0 — Meta's audio model (processes speech and sound)
  • 📝 Llama 3.2 — Meta's language model (processes text and meaning)

A fourth layer — a transformer (a type of neural network that weighs how different pieces of information relate to each other, the same core technology behind ChatGPT) — fuses all three streams together and predicts brain activation across 70,000 voxels. The model can predict responses for entirely new people it has never seen, and for languages it was never specifically trained on — a capability called zero-shot generalization (performing a task on completely new inputs).
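The fusion step can be caricatured in a few lines. Everything below is illustrative: the embedding sizes are made up, the voxel count is scaled down from 70,000 to keep the toy example small, and the trained transformer is replaced by a single random linear map — this is a sketch of the data flow, not TRIBE v2's actual code.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical per-modality feature vectors for one stimulus
# timepoint (sizes are illustrative, not the real models').
video_emb = rng.normal(size=1024)  # video-model features
audio_emb = rng.normal(size=768)   # audio-model features
text_emb = rng.normal(size=2048)   # language-model features

# Toy voxel count; the real model predicts 70,000 voxels.
N_VOXELS = 700

# Fuse by concatenation, then project to per-voxel activations.
# In TRIBE v2 this role is played by a trained transformer.
fused = np.concatenate([video_emb, audio_emb, text_emb])
W = rng.normal(size=(N_VOXELS, fused.size)) / np.sqrt(fused.size)
voxel_pred = W @ fused

print(voxel_pred.shape)  # (700,)
```

The key design point survives the simplification: each modality is encoded separately by a specialist model, and only the fusion layer sees all three streams at once.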

TRIBE v2 architecture diagram showing multimodal brain encoding

What Researchers and Developers Can Do With It Right Now

TRIBE v2 is fully open source — model weights (the trained parameters that make the AI work), code, and an interactive demo are all freely available. This isn't a waitlist or a limited research preview. You can clone it and run experiments today:

git clone https://github.com/facebookresearch/tribev2
cd tribev2
pip install -r requirements.txt
# Model weights available on Hugging Face:
# https://huggingface.co/facebook/tribe-v2

If you work in neuroscience: TRIBE v2 enables "in-silico neuroscience" (virtual experiments inside a computer) — running thousands of brain response experiments without recruiting subjects or booking MRI time. It can dramatically accelerate hypothesis testing.
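One way such a virtual experiment could look in code, using entirely synthetic stand-in data — the predicted responses, region-of-interest mask, and stimulus count below are all made up for illustration, not outputs of the real model:

```python
import numpy as np

rng = np.random.default_rng(7)

# Stand-in for model output: predicted responses of 5 candidate
# stimuli across 1,000 voxels.
n_stimuli, n_voxels = 5, 1000
predicted = rng.normal(size=(n_stimuli, n_voxels))

# Suppose voxels 100–199 form a region of interest (e.g. a
# language area); this mask is purely illustrative.
roi_mask = np.zeros(n_voxels, dtype=bool)
roi_mask[100:200] = True

# Rank stimuli by mean predicted activation inside the ROI,
# so only the top candidates go on to a real fMRI study.
roi_scores = predicted[:, roi_mask].mean(axis=1)
ranking = np.argsort(roi_scores)[::-1]
print(ranking)
```

The same screening loop could run over thousands of candidate stimuli in minutes — the "in-silico" speedup the article describes.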

If you work in pharmaceutical research: Testing whether a drug changes how patients process language or visual stimuli traditionally requires expensive clinical trials. TRIBE v2 could serve as a pre-screening layer — identifying which stimuli show the most interesting brain response patterns before costly human studies begin.

If you build brain-computer interfaces (BCIs — devices that let computers read signals from or send signals to the brain, like cochlear implants or experimental prosthetics): TRIBE v2's detailed map of how the brain responds to multimodal inputs could improve the design of BCI systems that decode speech, visual perception, or intent.

Important limitation: TRIBE v2 is an encoding model only — it predicts what the brain responds to when exposed to stimuli. It cannot decode private thoughts, intentions, or mental imagery. It does not "read minds." This is a tool for understanding the brain's sensory responses, not inner experience.

