
Science
Using brain scans to increase parity
Language models are built to outgrow us. Synthetic Users is built to stay exactly as human as you are — and a new result that reads whole sentences out of a brain scanner is why we keep walking toward the source instead of away from it.
The direction problem
Every frontier lab is pointed at the same horizon. The word changes — superintelligence, AGI, capability — but the vector is identical: build a mind that is more than a human one. Faster, broader, less wrong. That is a defensible goal for a lab. It is the wrong goal for us.
A synthetic user is not supposed to be brilliant. It is supposed to be a person — specific, distractible, a little impatient, confused by your onboarding flow, anchored to a price it saw three years ago. The thing we are trying to copy is not intelligence. It is behaviour, and most human behaviour is not very intelligent at all. It is habitual, emotional, context-bound, and gloriously inconsistent.
This puts us at right angles to the labs we depend on. The smarter a base model gets, the further its default voice drifts from the ordinary person we need it to be. So we have spent the company's life walking the other way: not toward a cleverer model, but toward the source of the behaviour we are trying to reproduce. In “So where does the data come from?” we laid that source out as a stack: census microdata, the big social surveys, decades of panel studies, and the world model a frontier LLM has already built from internet-scale text. And we said we were adding a new layer: brain scans. This post is about a paper that made the case for it better than we could have.
Two examples, before this gets abstract
This is easier to see than to argue. So before the neuroscience, two cases any researcher will recognise.
The £2 price rise
A streaming service raises its price, and you want to know who churns.
The smart synthetic answer: “I’d assess whether the library still justifies the cost versus competitors, and cancel if it doesn’t.”
Sensible — and wrong about almost everyone. The real subscriber grumbles, means to cancel, and doesn’t: mid-season on one show, can’t face redoing the watchlist. What you need to predict is the gap between what people say and what they do.
The vendor renewal
An IT director chooses between the incumbent and a cheaper, better-reviewed challenger.
The smart synthetic answer: “I’d score both on cost, security and integration, and pick the higher score.”
The real director renews the incumbent — switching is career risk, the challenger left one box blank on the security questionnaire, and nobody gets fired for renewing. What you’re modelling is the human reason the optimal choice doesn’t get made.
In both, the failure is identical: a model pointed at intelligence answers like a brilliant consultant, when you needed it to answer like a tired person on a sofa or a nervous manager covering himself. Closing that gap means anchoring the simulation to where the behaviour actually comes from. Which is the long way of explaining why we keep walking toward the brain — and why the result below matters.
What Meta just did
On 25 June, a team from Meta AI, the École Normale Supérieure, the Basque Center on Cognition, Brain and Language, and Inria released Brain2Qwerty v2 — a model that reads natural sentences directly out of a non-invasive brain scan.
The setup is almost mundane. Nine healthy volunteers sat under a magnetoencephalography (MEG) scanner — a machine that picks up the faint magnetic fields thrown off by firing neurons — and typed sentences they had just heard. No surgery, no electrodes in the skull. Each was recorded for about ten hours, typing some twenty-two thousand sentences between them. The model’s only input is the raw brain signal; its output is the sentence.
The headline number is a 39% word error rate on average: roughly two words in five come out wrong. That sounds modest until you see the distribution. For the best participant, the model decoded half of all sentences with at most one word error, and more than a quarter of them perfectly, word for word. For two decades, non-invasive brain reading was stuck on toy tasks because the signal was thought too noisy to carry language. Brain2Qwerty v2 is the first result to pull fluent, natural sentences out of a scanner that sits outside the skull.
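For readers who have not met the metric, here is a minimal sketch of how a word error rate is usually computed: the word-level edit distance between what was typed and what was decoded, divided by the number of words typed. The function name and the two example sentences are ours, made up for illustration. This is the standard textbook definition, not the paper's evaluation code.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # reference word missing from the output
                curr[j - 1] + 1,     # extra word in the output
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# Made-up sentences, not data from the paper: two of five words are wrong.
typed = "we raised the price today"
decoded = "we raise the prize today"
example_wer = word_error_rate(typed, decoded)  # 2 errors / 5 words = 0.4
```

An average of 0.39 across sentences can hide a wide spread, which is why the per-participant distribution says more than the headline.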
How it reads, and the proof it isn’t bluffing
The architecture is the part UX researchers should care about, because it is the same shape we use to instantiate a respondent: characters, then words, then meaning, stacked.
[Diagram: the decoding pipeline as four blocks, labelled “activity”, “guess”, “alignment” and “LLM writes it”.]
That last block raises the obvious suspicion. Put a language model on the end of anything and it produces fluent English — so is it reading the brain, or just autocompleting over a noisy guess? The authors ran the clean experiment: they cut the brain-derived signal out of the language model’s input and let it work from the rough text alone. Accuracy dropped. The model is genuinely leaning on the neural signal to choose its words — it is a language model that has learned to read.
A third detail is close to home. To tune the pipeline, the team turned loose autonomous coding agents — Cursor running on Claude — to improve the model by rewriting its own code. Inside a tight search space they beat classical optimisation and found tricks the humans kept. Handed an open-ended brief, they fell apart. A force multiplier, not a replacement; the humans stayed in the loop.
Why this is a source, not a stunt
One result, however good, is a demo. What makes Brain2Qwerty v2 a source is its shape. First, it scales. Retrained on growing amounts of brain data, accuracy improves log-linearly with recording hours — with no sign of flattening at the ninety-hour ceiling the team could afford. Pour in more brain, get more signal, predictably.
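To make “log-linear” concrete, here is a small sketch, assuming the relationship has the form score = intercept + slope × ln(hours). The function name, the intercept and slope, and the list of hours are all hypothetical values we picked for illustration. The scores are generated from the formula itself, not taken from the paper.

```python
import math


def fit_log_linear(hours, scores):
    """Ordinary least-squares fit of score = intercept + slope * ln(hours)."""
    xs = [math.log(h) for h in hours]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(scores) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, scores))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical curve, NOT measurements from the paper.
ASSUMED_INTERCEPT, ASSUMED_SLOPE = 0.20, 0.10
hours = [5, 10, 20, 40, 90]
scores = [ASSUMED_INTERCEPT + ASSUMED_SLOPE * math.log(h) for h in hours]

intercept, slope = fit_log_linear(hours, scores)

# On a log-linear curve, every doubling of data buys the same fixed gain.
gain_per_doubling = slope * math.log(2)
```

The practical reading: if the curve holds, going from 90 to 180 hours should buy about as much as going from 45 to 90 did. Whether it holds past the hours actually recorded is an extrapolation, not a measurement.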
Second — and this maps directly onto how we think — diversity matters as much as volume. Holding the total amount of data fixed but widening the variety of sentences made the model dramatically more accurate. Quantity and variety are separate axes of quality. Anyone who has watched a synthetic panel collapse toward a single bland voice knows why that matters: variety is not a nice-to-have, it is the signal. A source behaves like this — it rewards more data, it rewards more varied data, and it does not saturate. That is the property we are buying when we reach for the brain instead of for a bigger prompt.
Where it still breaks
It would be dishonest to drop our habit of listing the limits. The variation between people is large: the model that decodes half of the best subject's sentences almost cleanly is shaky for the worst. These were healthy volunteers typing, not patients trying to communicate. The signal read is, to a real degree, motor (the brain driving the hands), cleaned up by a model that already knows English. This is not telepathy, and it is not reading a silent thought off a resting mind. And the hardware is a three-hundred-sensor machine cooled with liquid helium in a shielded room. Nobody is wearing that to a usability test.
Two facts point forward. The model stays robust when you throw away half the sensors — the information is not hiding in a few special channels — and wearable, room-temperature MEG sensors are arriving. The lab proof and the deployable device are converging. But the honest line is the same one we drew before: the gap to surgical implants, which type at under 2% error, is still real, and the bridge gets crossed at the speed of the weaker side.
What this means if you run user research
Here is the part to take back to your team, and it is not “Synthetic Users is going to scan your customers.” The promise is quieter and more structural. When you ask a synthetic panel what your users would do, the answer has to come from somewhere. Today it comes from text — surveys, transcripts, a language model’s compressed account of millions of humans it read about. That is a strong foundation and a leaky one: the model’s instinct is to be smarter, calmer, and more coherent than the person you are modelling. The drift toward the bland average is the central engineering problem of this entire field.
You fight that drift at the source. Every layer anchored to measured human signal — rather than a model’s best guess at being human — pulls the simulation back toward the messy, specific person you need. Brain2Qwerty v2 is one more brick in that wall: it shows, with numbers, that brain activity is rich enough to read, scales like a real data source, and rewards variety the way good data should. It tells us the layer we bet on is load-bearing.
We do not need this exact model in your next study. We need the world it points to — one where “what a human would do” stops being purely inferred from text about humans and starts being read, in part, from humans themselves. The labs are building a mind that leaves the human behind. We are using the same tools to build one that stays. The whole difference is which way you point the data. And the data, more and more, runs through the brain.
The fMRI-versus-LLM analogy was a metaphor in 2024. In 2026 it is a measurement. The same shift is now happening to the oldest assumption in our field — that you can only learn what a person will do by asking them. You can also, it turns out, read it off the source.
Cited
- Zhang, Lévy, Rommel, Rapin, Bel, Bonnaire, Nieto, Bourdillon, Pinet, d’Ascoli, Moreau, King. Accurate Decoding of Natural Sentences from Non-Invasive Brain Recordings. Meta AI, ENS-PSL, BCBL, Inria. June 2026. (Brain2Qwerty v2.)
- Lévy, Zhang, Pinet, Rapin, Banville, d’Ascoli, King. Brain-to-text decoding: a non-invasive approach via typing. Nature Neuroscience, 2025. (Brain2Qwerty v1.)
- d’Ascoli, Bel, Rapin, Banville, Benchetrit, Pallier, King. Towards decoding individual words from non-invasive brain recordings. Nature Communications, 2025.
- Défossez, Caucheteux, Rapin, Kabeli, King. Decoding speech perception from non-invasive brain recordings. Nature Machine Intelligence, 2023.
- Jude, Levi-Aharoni, et al. Restoring rapid natural bimanual typing with a neuroprosthesis after paralysis. Nature Neuroscience, 2026. (Invasive benchmark, <2% WER.)
- Boto, Holmes, et al. Moving magnetoencephalography towards real-world applications with a wearable system. Nature, 2018. (Optically-pumped magnetometers.)