
So where does the data come from?

When the shapes inside language models started predicting the shapes inside human brains — and what that means for predicting people.

§ 0 So where does the data come from?

It is the question customers ask us most often, so it deserves a direct answer. The substrate behind a Synthetic Users panel is layered. There are census microdata from the US, UK and other national statistical agencies. There are general social surveys and their international cousins: the GSS, the British Social Attitudes survey, the European Social Survey, the World Values Survey. There are decades of longitudinal panel studies, public opinion polling, ethnographic transcripts and the open consumer-research literature. And underneath all of it sits the world model that frontier language models have already built from internet-scale text: the forum threads, transcripts, reviews and books that a model the size of Llama or Claude has ingested. Each layer contributes something the others can't. Census data sets the demographic frequencies. Social surveys carry the attitudes. The LLMs supply the conditional structure that lets one belief predict the next.

What we are now adding to that stack is brain scans.

This is a category change, not a quantitative one. Until 2026 there was no defensible way to ask whether a synthetic respondent's internal state — the activations a language model produces when conditioned on a persona — corresponded to anything a real human's brain actually does when shown the same stimulus. That question is now answerable, because a new category of foundation model can predict whole-brain fMRI directly from the kinds of language-model embeddings we already use to instantiate respondents. We are not running our own scanners; we are ingesting the public fMRI corpora that made these models possible, and the models trained on top of them. The promise is to push synthetic-to-organic parity from a purely behavioural claim — "the simulated respondent answers the survey the way a real one would" — to a representational one: the simulated respondent's internal state, projected through a validated brain-encoding model, matches a real participant's cortical response to the same prompt. Where the second claim holds, the first one is on much firmer ground than it has ever been. The rest of this post is how that became possible, what it now lets us predict, and what it still doesn't.

Three pictures. The first is a cortical surface from an fMRI scanner: a small patch on the ventral temporal cortex glows red when the participant looks at a face. The second is a bar chart of hidden activations from a language model: a single direction in activation space spikes when the model is steered to be sycophantic. The third is a cortical surface again — but this one wasn't measured. It was predicted from a Llama 3.2 embedding by a transformer that has never seen the participant's brain before, and it correctly reproduces the fusiform face area.

The first two pictures are old. They have lived next to each other in talks and tweets for years, joined only by a metaphor: look, brains and LLMs both light up. The third picture is new, and it does something the metaphor never could. It draws a measurable line between the other two. This post is about that line, what it now lets us claim about predicting individual human behaviour, and what it still doesn't.

[Figure 1 panels. Measured (fMRI): fusiform face area. Direction (LLM): sycophancy persona-vector activation. Predicted (brain encoder, from LM embedding): FFA, recovered in silico.]
Figure 1. The same neural region, three ways. The third panel is not a measurement — it is a prediction from a language-model embedding, recovered by a brain-encoding foundation model without ever seeing the participant. The bridge between the first two is what this post is about.

§ 1 What fMRI can already predict about an individual

Before we declare a new bridge, it pays to be honest about how strong the old one is. Functional MRI has been making individual-level predictions for two decades. The Yale connectome-fingerprinting literature (Finn et al., 2015 and successors) shows that resting-state functional connectivity predicts fluid intelligence and sustained attention in held-out participants at correlations of roughly r = 0.3 to 0.5. Haynes and colleagues decode the content of upcoming voluntary decisions from prefrontal–parietal patterns several seconds before the participant reports awareness of choosing. Clinical neuroimaging predicts antidepressant response and the transition from clinical-high-risk states to psychosis at accuracies useful enough to matter, even when they are far from deterministic.

It also pays to be honest about how brittle this literature has been. Marek and colleagues (Nature, 2022) reanalysed brain–behaviour association studies and showed that effect sizes routinely collapse when sample sizes climb past a few hundred — many influential individual-prediction results were powered to find correlations that don't survive at n > 1000. The picture that emerges is real but modest: brain shape predicts behaviour, in a defensible way, with effect sizes that almost never exceed r ≈ 0.5 outside of very constrained perceptual tasks.

For the generalist: Brain imaging can predict individual differences in attention, intelligence, decisions, and clinical outcomes, but the effects are real-and-small, not real-and-large. Most strong claims you've read about reading personality out of a scanner do not replicate at the sample sizes the field now uses.

§ 2 What the shapes inside a language model can already predict

Now the other side. The persona vectors paper from Chen, Arditi, Sleight, Evans and Lindsey (arXiv:2507.21509, 2025) does something that has no direct analogue in human neuroimaging. Given only a natural-language description of a trait — "evil," "sycophancy," "propensity to hallucinate" — their automated pipeline returns a single direction in the residual stream that causally controls how strongly the model expresses that trait. Adding the vector during decoding makes the model agree with everything you say; subtracting it cleans up sycophancy after a botched finetune.
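The idea of a trait direction that can be added or subtracted can be sketched in a few lines. This is a minimal illustration, assuming the direction is taken as a difference of mean activations between trait-expressing and neutral responses; the function names `persona_vector` and `steer` are hypothetical, and random arrays stand in for real residual-stream activations, since no model is loaded here.

```python
import numpy as np

def persona_vector(trait_acts: np.ndarray, neutral_acts: np.ndarray) -> np.ndarray:
    """Unit-norm difference of mean activations (trait minus neutral).

    Both inputs have shape (n_responses, d_model): one activation vector
    per response, taken at a single chosen layer.
    """
    direction = trait_acts.mean(axis=0) - neutral_acts.mean(axis=0)
    return direction / np.linalg.norm(direction)

def steer(hidden: np.ndarray, vector: np.ndarray, alpha: float) -> np.ndarray:
    """Shift an activation along the trait direction; negative alpha suppresses it."""
    return hidden + alpha * vector

# Synthetic stand-in data (assumption): the trait is planted on coordinate 3.
rng = np.random.default_rng(0)
d_model = 64
true_direction = np.zeros(d_model)
true_direction[3] = 1.0
neutral = rng.normal(size=(200, d_model))
trait = rng.normal(size=(200, d_model)) + 4.0 * true_direction

v = persona_vector(trait, neutral)
h = rng.normal(size=d_model)
h_plus = steer(h, v, alpha=5.0)    # amplify the trait
h_minus = steer(h, v, alpha=-5.0)  # suppress the trait
```

In a real setting the activations would come from hooks on a chosen layer of the model, and the steering coefficient would be tuned per trait.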

The result that matters most for a predictive science is the one further down the paper. They project candidate training datasets onto a trait's persona vector before any finetuning happens, and that projection predicts post-finetuning trait expression with correlations of r = 0.76 to 0.97 across traits and base models. Internal geometry forecasting future behaviour, with effect sizes that would be career-defining in human neuroscience.
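A rough sketch of the projection step, under stated assumptions: each candidate dataset is summarised by how far its activations sit along the trait direction relative to the base model's own responses. The helper name `projection_difference` and the planted shifts are illustrative, and the activations are synthetic, so this shows the computation only, not the published result.

```python
import numpy as np

def projection_difference(dataset_acts, base_acts, vector):
    """Mean projection of dataset activations minus that of base-model activations."""
    unit = vector / np.linalg.norm(vector)
    return float((dataset_acts @ unit).mean() - (base_acts @ unit).mean())

rng = np.random.default_rng(1)
d_model = 32
vector = rng.normal(size=d_model)          # stand-in persona vector
unit = vector / np.linalg.norm(vector)
base = rng.normal(size=(500, d_model))     # stand-in base-model activations

# Four synthetic "datasets", each shifted a known amount along the trait direction.
planted_shifts = [0.0, 1.0, 2.0, 3.0]
scores = [
    projection_difference(rng.normal(size=(500, d_model)) + s * unit, base, vector)
    for s in planted_shifts
]
```

The scores recover the planted ordering of the datasets. Checking them against measured post-finetuning trait scores would then be an ordinary correlation, for example with `np.corrcoef`.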

It works partly because the system is closed. There is no hemodynamic lag, no motion artefact, no privacy committee, no n = 30. Every activation is fully observable to floating-point precision, and you can re-run the same model on the same prompt as many times as you like. Goodfire's neural geometry programme (Geiger et al., May 2026) makes the broader point: behaviours in these models live on curved manifolds — days of the week form a literal circle, years form a curve, rhymes form a slant — and where a model is on its manifold predicts what it will generate next. Linear steering through "voids" in these shapes produces garbled outputs; following the geometry produces predictable, controllable behaviour.

For the generalist: Inside a language model, there are specific directions in its activations that correspond to traits like "evil" or "agreeable." We can read those directions before training a model on new data and predict, with surprising accuracy, what the model will do afterwards. Brain prediction works at modest correlations; this works at large ones, because the system is fully observable in a way a brain isn't.

§ 3 The bridge: a new category of brain encoder connects the two geometries

The category is best understood at the level of its three blocks: input → embedding → brain map. The input is whatever the participant would see, hear, or read — a sentence, a sound, an image, a clip of film. The embedding is what a multimodal foundation model — covering text, audio, video and images — already represents about that stimulus, the result of years of pretraining on internet-scale data. The brain map is a learned projection from that embedding onto cortical voxels, fitted against hundreds of hours of fMRI from hundreds of participants.

[Figure 2 diagram. Stimulus ("This man is blind") → LM embedding → brain encoder → predicted cortex, with FFA, VWFA, Broca and PPA labelled.]
Figure 2. The architecture in three blocks. A stimulus passes through a frozen multimodal foundation model (the text path shown; audio, video and image paths run in parallel). The resulting embedding is mapped to predicted whole-brain fMRI by a transformer with per-subject adapters. The cortical map on the right is recovered, not measured.
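The embedding-to-brain-map block can be illustrated with the classic linear baseline rather than the transformer with per-subject adapters described in the caption. The sketch below swaps in closed-form ridge regression for brevity; the function names, dimensions and synthetic data are all assumptions, and the data are zero-mean so no intercept is fitted.

```python
import numpy as np

def fit_ridge_encoder(embeddings, voxels, alpha=1.0):
    """Closed-form ridge weights: W = (X^T X + alpha I)^-1 X^T Y."""
    n_features = embeddings.shape[1]
    gram = embeddings.T @ embeddings + alpha * np.eye(n_features)
    return np.linalg.solve(gram, embeddings.T @ voxels)

def predict_brain_map(embeddings, weights):
    """Project stimulus embeddings onto predicted voxel responses."""
    return embeddings @ weights

# Synthetic stand-in data (assumption): 300 stimuli, 16-d embeddings, 50 voxels.
rng = np.random.default_rng(2)
n_stimuli, n_features, n_voxels = 300, 16, 50
X = rng.normal(size=(n_stimuli, n_features))
true_W = rng.normal(size=(n_features, n_voxels))
Y = X @ true_W + rng.normal(size=(n_stimuli, n_voxels))  # noisy "fMRI"

fit_rows, held_out = slice(0, 200), slice(200, 300)
W = fit_ridge_encoder(X[fit_rows], Y[fit_rows], alpha=1.0)
Y_pred = predict_brain_map(X[held_out], W)  # predicted maps for unseen stimuli
```

The foundation model that produces the embeddings stays frozen in this picture; only the mapper is fitted against brain data.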

What makes this architecturally different from the brain-behaviour models of § 1 (Level 2 in the stack laid out in § 4) is the middle block. A Level-2 brain–behaviour model is end-to-end: scan goes in, behavioural prediction comes out, and the dependency between them is learned across however many participants the study could afford. There is no explicit, inspectable structure between the two ends. Every nuisance source (motion, hemodynamics, individual variation, the limits of what thirty minutes of recording can capture) lands in the same black box, and that is a large part of why the field caps near r ≈ 0.5.

A brain encoder is not end-to-end. Its middle layer is a foundation-model embedding, a representation we already know how to read and intervene on. The geometry of meaning is borrowed from a model that has already generalised across people and topics; the mapper is the small piece that has to be learned on top.

That borrowed middle layer gives the category three properties Level 2 doesn't have. Counterfactual access: you can change the input — paraphrase a prompt, swap a face, edit a clip — and watch the predicted brain response change. You cannot rerun a brain; you can rerun an encoder a million times. Compositional conditioning: you can intervene on the embedding itself — add a persona vector, steer along a trait, ablate a feature — and ask what the predicted brain does. The middle is open in a way the brain is not. Population-decoupled generalisation: Level-2 models overfit to the participants they were trained on, and new subjects need new data; the brain mapper rides on top of an embedding that already generalises, so transporting predictions across people becomes mostly the foundation model's job, not the mapper's.
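Compositional conditioning can be sketched as a contrast between two predicted maps: one with the embedding pushed along a trait direction, one with it pushed the opposite way. Everything here is a stand-in: `steering_contrast` is a hypothetical helper, and the encoder is a random linear map rather than a trained brain-encoding model.

```python
import numpy as np

def steering_contrast(encoder, embedding, direction, alpha):
    """Predicted map at +alpha minus predicted map at -alpha along `direction`."""
    unit = direction / np.linalg.norm(direction)
    return encoder(embedding + alpha * unit) - encoder(embedding - alpha * unit)

rng = np.random.default_rng(3)
n_features, n_voxels = 16, 50
W = rng.normal(size=(n_features, n_voxels))  # hypothetical fitted mapper weights

def encoder(embedding):
    """Stand-in brain encoder: a fixed linear projection to voxels."""
    return embedding @ W

embedding = rng.normal(size=n_features)        # stand-in stimulus embedding
trait_direction = rng.normal(size=n_features)  # stand-in persona vector
contrast = steering_contrast(encoder, embedding, trait_direction, alpha=2.0)
```

Because the helper only needs a callable, the same contrast could be taken through any encoder, and it can be rerun for as many inputs and coefficients as needed.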

Empirically the category reaches r ≈ 0.3 in association cortex and r ≈ 0.6 in primary sensory cortex on held-out subjects watching held-out stimuli — better than the linear baselines neuroscience has used for two decades, and improving log-linearly with data. But the numbers are not the point. The point is that the model's input layer is the same family of embedding that contains a persona vector. The architecture that predicts brains and the architecture that defines a synthetic respondent meet in the middle block.
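Encoding scores of this kind are typically reported as a Pearson correlation per voxel between predicted and measured time courses. A minimal sketch, assuming both arrays are shaped (timepoints, voxels); the function name and the toy data are illustrative.

```python
import numpy as np

def voxelwise_pearson(predicted, measured):
    """Pearson r for each voxel; inputs have shape (n_timepoints, n_voxels)."""
    p = predicted - predicted.mean(axis=0)
    m = measured - measured.mean(axis=0)
    numerator = (p * m).sum(axis=0)
    denominator = np.sqrt((p ** 2).sum(axis=0) * (m ** 2).sum(axis=0))
    return numerator / denominator

# Toy data: voxel 0 is perfectly predicted, voxel 1 is sign-flipped,
# voxel 2 is unrelated noise.
rng = np.random.default_rng(4)
predicted = rng.normal(size=(100, 3))
measured = np.column_stack([
    2.0 * predicted[:, 0] + 1.0,
    -predicted[:, 1],
    rng.normal(size=100),
])
r = voxelwise_pearson(predicted, measured)
```

Averaging `r` within a region of interest gives the kind of per-region figure quoted above.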

For the generalist: Level-2 brain prediction is a black box: brain in, behaviour out, learn the mapping. The new category splits the mapping into three explicit blocks: a stimulus passes through a foundation-model embedding before being projected to predicted brain activity. The middle block is the whole point, because it is a representation we already know how to read and intervene on, and it is the same representation Synthetic Users already uses to instantiate respondents.

§ 4 The four-level prediction stack

Stacking these results gives you something a Synthetic Users practitioner should care about: a layered pipeline where each layer is now an established empirical result rather than a conjecture.

[Figure 3 diagram.
Level 1. Model geometry → model behaviour. Persona vectors (Chen et al. 2025). r = 0.76 to 0.97.
Level 2. Brain geometry → human behaviour. CPM, Haynes, clinical prediction. r ≈ 0.3 to 0.5.
Level 3. LLM representations → brain responses. Brain encoders (d'Ascoli et al. 2026). r ≈ 0.3 association cortex, 0.6 sensory cortex.
Level 4 (ongoing). Persona-conditioned LLM → predicted brain → predicted behaviour. Synthetic Users validation underway. Compose Levels 1 and 3; validate against held-out humans at Level 2.]
Figure 3. The four-level prediction stack. Levels 1–3 are published results; Level 4 is the composition that becomes available in 2026.

Level 1 is internal to a language model: read its geometry, predict its own behaviour. Persona vectors give us r = 0.76 to 0.97, depending on trait and base model. Level 2 is internal to a human: read brain geometry, predict behaviour. The Yale, Berlin and clinical-imaging literatures give us r in the 0.3 to 0.5 range, with the Marek caveat firmly attached. Level 3 is the new one: read LLM geometry, predict brain geometry. The new brain-encoder category gets us there at r ≈ 0.3 in association cortex and ≈ 0.6 in primary sensory regions, with another two- to fourfold improvement available from one hour of per-subject finetuning.

Level 4 is the composition. Take a natural-language description of a person, the same kind of description Synthetic Users already uses to instantiate a synthetic respondent. Extract a persona vector from a base LLM. Push that representation through a brain-encoding foundation model to get a predicted whole-brain map. Compare that map to fMRI from real participants who fit the description. Where the maps agree, you have a defensible Level-4 inference: the persona-conditioned LLM is producing internal states that correspond, at the representational level, to a real human's brain response. Where they disagree, you have a precise boundary on where the simulation is hallucinating about people.

For the generalist: Three of the four steps from "describe a person in language" to "predict what their brain and behaviour will do" are now empirically supported. The fourth step, composing them into a single pipeline, is the new opportunity, and it is testable rather than speculative.

§ 5 From OCEAN to persona vectors: what this looks like inside Synthetic Users

None of this is abstract for us. Inside Synthetic Users, we started injecting personality profiles into respondents early on, as the cleanest way to recover the variance a generative panel needs. Each synthetic respondent is scaffolded with a Big Five (OCEAN) score — Openness, Conscientiousness, Extraversion, Agreeableness, Neuroticism — sampled to match the joint distribution of the population the panel is meant to represent. The reason is mundane and well-known: without explicit personality conditioning, generative samples collapse toward the bland modal voice that base models default to. The OCEAN scaffold restores the variance that makes a respondent feel like a different person from the one before it.
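Sampling OCEAN scores from a joint distribution can be sketched as drawing correlated z-scores from a multivariate normal. This is only an illustration of the idea, not the production sampler: the correlation matrix below is made up for the example and should be replaced by estimates from the target population, the Gaussian form is an assumption, and conditioning on demographics is omitted.

```python
import numpy as np

TRAITS = ("openness", "conscientiousness", "extraversion",
          "agreeableness", "neuroticism")

# Illustrative values only (assumption), not empirical estimates.
TRAIT_CORR = np.array([
    [ 1.0,  0.1,  0.3,  0.1, -0.1],
    [ 0.1,  1.0,  0.1,  0.2, -0.3],
    [ 0.3,  0.1,  1.0,  0.2, -0.3],
    [ 0.1,  0.2,  0.2,  1.0, -0.2],
    [-0.1, -0.3, -0.3, -0.2,  1.0],
])

def sample_ocean_profiles(n, corr, seed=0):
    """Draw n correlated OCEAN z-score profiles, shape (n, 5)."""
    rng = np.random.default_rng(seed)
    return rng.multivariate_normal(np.zeros(corr.shape[0]), corr, size=n)

profiles = sample_ocean_profiles(20_000, TRAIT_CORR)
# One respondent's scaffold as a trait-to-z-score mapping.
first_respondent = dict(zip(TRAITS, profiles[0].round(2)))
```

Each row is one synthetic respondent's personality scaffold, and the panel as a whole reproduces the correlations between traits rather than treating them as independent.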

The interesting question, in light of the rest of this post, is what that scaffold does mechanistically. Chen et al.'s persona-vector pipeline takes a natural-language trait description as input and returns a single causal direction in the residual stream. Their examples are "evil" and "sycophancy"; nothing in the method prevents you from running it on "high in Openness to experience" or "low in Conscientiousness." Each of the five OCEAN dimensions becomes a vector. The respondent's score along each dimension can then be enforced not only by the conditioning prompt but by direct activation steering along the corresponding axis, at any layer, with arbitrary coefficient. That alone is a meaningful upgrade to the scaffold, because prompt-conditioned personality is brittle — a sufficiently leading question can pull a "prompted extravert" back to the modal voice. A vector-steered respondent stays on-trait by construction.

The deeper consequence is that the Big Five becomes the obvious first target for the Level-4 validation experiment laid out above. The five trait dimensions already have published fMRI correlates — DeYoung and colleagues' work on cortical structure, Adelstein et al. (2011) on resting-state functional connectivity, Riccelli et al. (2017) on grey-matter morphometry. Run an OCEAN-conditioned respondent's representations through a brain-encoding foundation model, generate a predicted whole-brain map for each pole of each trait, and ask whether the prediction agrees with the neuroimaging literature for the same trait. Where it agrees, we have substrate-level validation of a scaffold the panel has used as a behavioural prior for years. Where it disagrees, we know exactly which dimension the simulation is pretending about.

Two honest caveats sit on top. First, trait-level neuroimaging is some of the messiest published work in the field; the Marek 2022 caveat from § 1 bites hardest here, because effect sizes are small and historical sample sizes were tiny. Any encoder-based validity test of OCEAN should be powered against the post-Marek consensus on what counts as a real correlate, not against pre-2015 papers. Second, even if every dimension passes a representational validity test, that only shows the synthetic respondent's internal state lines up with the human's; it does not automatically validate every downstream behavioural prediction the respondent produces. The chain has to be walked one link at a time. But the chain finally exists, and OCEAN — precisely because it has been our scaffold from the start — is where we should walk it first.

For the generalist: Synthetic Users already conditions every synthetic respondent with a Big Five personality profile, because without it the model's outputs collapse to a generic average voice. Each of those five traits can now be turned into a specific direction in the model's activations, and that direction can be checked against the published brain-imaging literature for the same trait. The personality scaffold we've used as a behavioural prior is becoming testable as a neural one.

§ 6 Where the parallel still breaks

It is important to say what this category of model does not prove. It demonstrates representational alignment, not mechanistic identity. The geometry of a language-model embedding can be linearly mapped to predict cortical responses; this does not entail that the brain computes the way the language model does, any more than the fact that the brain's tonotopic map is linear in log-frequency entails that the cochlea is a Fourier transform.

Three honest caveats survive into 2026. First, tool fragility. Goodfire's critique of sparse autoencoders — that they shatter coherent manifolds into many spurious "features" — applies equally to any method that decomposes a brain into "regions." When the encoder recovers an FFA, is it discovering the brain's structure, or projecting the model's? The fact that it reproduces neuroscientific consensus is reassuring; it is also, in part, exactly what we tuned the loss to do. Second, substrate independence cuts both ways. A persona vector lives in a coordinate system you chose; a brain region is a piece of physical tissue with anatomical identity. Cross-prediction does not collapse this difference, it operationalises it. Third, origin gap. A persona-conditioned LLM was trained on text about humans, not on being human. Anything it predicts about a person is one inference step removed from the person, and that step has unknown loss until validated end-to-end.

There is also a sobering meta-observation. Encoding correlations of r ≈ 0.3 in association cortex are large by neuroscience standards and small by language-model standards. The closed system of an LLM lets persona vectors reach r ≈ 0.9; the open, biological system of a brain still tops out near 0.5. Composing them inherits the lower of the two ceilings, not the higher. The bridge is real, but the traffic on it moves at the speed of the weaker partner.
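One way to make the ceiling argument concrete, under an idealisation this post does not itself state: suppose a persona representation X, a brain response Y and a behaviour Z are standardised, jointly Gaussian, and Z depends on X only through Y. The symbols are introduced here for illustration. Correlations then multiply along the chain, so the composed link can be no stronger than the weaker of the two:

```latex
\rho_{XZ} = \rho_{XY}\,\rho_{YZ}
\quad\Longrightarrow\quad
|\rho_{XZ}| \le \min\bigl(|\rho_{XY}|,\ |\rho_{YZ}|\bigr)
```

Plugging in the round figures quoted above purely as an illustration, 0.9 × 0.5 = 0.45. Real pipelines will not satisfy these assumptions exactly, so this is a reference point rather than a forecast.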

For the generalist: The geometry of language models and the geometry of human brains can be aligned, but they are not the same thing. Sharing a shape doesn't mean sharing a mechanism. And when you chain predictions together, the noisiest link sets the ceiling, and the noisy link, for now, is the brain.

§ 7 What to actually build

[Figure 4 chart. Axis: Pearson r, 0.0 to 1.0. CPM fluid intelligence: 0.4. Haynes decision decoding: 0.6. Encoder, association cortex: 0.3. Encoder, primary sensory cortex: 0.6. Persona vectors: 0.92. Level 4: validation ongoing (?).]
Figure 4. Where prediction strength actually sits in 2026. Biological-system predictions sit between roughly 0.3 and 0.6; closed-system LLM predictions reach 0.9+. A composed Level-4 pipeline will inherit the weaker ceiling, and that is the empirical question worth running. Mixed units: biological prediction is open-system, LLM prediction is closed-system; the chart is honest, not commensurate.

For a Synthetic Users practitioner, the concrete next move is a Level-4 validation experiment. Pick an attitudinal trait with a clear behavioural correlate — political orientation, risk tolerance, sycophantic acquiescence. Extract a persona vector from a base LLM using the Chen et al. pipeline. Pass conditioned outputs from that vector through a brain-encoding foundation model's text path to generate a predicted whole-brain map. Recruit participants segmented on the same trait, run them through the same naturalistic stimuli the encoder was trained on, and compare predicted to measured cortical contrasts at the parcel level. Where agreement is significant at q(FDR) < 0.05, that persona vector is a defensible behavioural proxy. Where it isn't, you have learned exactly which traits the simulation cannot transport.
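The q(FDR) < 0.05 criterion can be applied with the Benjamini-Hochberg procedure over parcel-level p-values. A minimal sketch, assuming one p-value per parcel has already been computed from the predicted-versus-measured comparison; the function name and the example p-values are illustrative.

```python
import numpy as np

def benjamini_hochberg(p_values, q=0.05):
    """Boolean mask of tests rejected at FDR level q (Benjamini-Hochberg)."""
    p = np.asarray(p_values, dtype=float)
    n = p.size
    order = np.argsort(p)                       # ranks from smallest p upward
    thresholds = q * np.arange(1, n + 1) / n    # rank-scaled cutoffs
    passed = p[order] <= thresholds
    rejected = np.zeros(n, dtype=bool)
    if passed.any():
        cutoff = np.max(np.nonzero(passed)[0])  # largest rank meeting its cutoff
        rejected[order[: cutoff + 1]] = True    # reject everything up to that rank
    return rejected

# Made-up parcel p-values, in parcel order rather than sorted order.
parcel_p = [0.041, 0.001, 0.216, 0.008, 0.039, 0.06, 0.074, 0.205, 0.212, 0.042]
significant = benjamini_hochberg(parcel_p, q=0.05)
```

Parcels flagged in `significant` are the ones where agreement between predicted and measured contrasts survives correction; the rest mark where the persona vector has not been shown to transport.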

The point isn't that LLMs are people. The point is that, for the first time, we can quantify the conditions under which their internal geometry behaves enough like ours to substitute for a human in a prediction. The shapes are real on both sides. The bridge between them is now built. What remains is to walk it carefully, with our prior on the brain's noisiness intact, and to publish the points at which it carries weight and the points at which it doesn't.

The fMRI-vs-LLM analogy was a metaphor in 2024; in 2026 it is a foundation model. The metaphor was always going to be wrong in detail. The foundation model is precise enough to be useful, and precise enough to be falsified.

Cited

  1. d'Ascoli, Rapin, Benchetrit, Brookes, Begany, Raugel, Banville, King. A foundation model of vision, audition, and language for in-silico neuroscience. FAIR Meta + ENS-PSL, March 2026.
  2. Chen, Arditi, Sleight, Evans, Lindsey. Persona Vectors: Monitoring and Controlling Character Traits in Language Models. arXiv:2507.21509, 2025.
  3. Geiger, Lubana, Fel, Merullo, Byun, Lewis, McGrath. The World Inside Neural Networks. Goodfire, May 2026.
  4. Finn, Shen, Scheinost, Rosenberg, Huang, Chun, Papademetris, Constable. Functional connectome fingerprinting. Nature Neuroscience 18, 2015.
  5. Soon, Brass, Heinze, Haynes. Unconscious determinants of free decisions in the human brain. Nature Neuroscience 11, 2008.
  6. Marek, Tervo-Clemmens, Calabro, et al. Reproducible brain-wide association studies require thousands of individuals. Nature 603, 2022.
  7. Schrimpf, Kubilius, Hong, et al. Brain-Score. bioRxiv 407007, 2018. — for the encoding-model lineage the new category extends.
  8. Caucheteux, King. Brains and algorithms partially converge in natural language processing. Communications Biology 5, 2022.