Science · Synthetic Users

Synthetic Users system architecture (the brain version).

Everyone else is wiring models together to chase superintelligence. We wire them together to rebuild something rarer: a human intelligence. And like a real nervous system, it doesn't start at the cortex. It starts at the core.

A walk through the updated architecture, from the inside out

This is the question we get more than any other: when you run a Synthetic Users panel, where does the realism actually come from? What is underneath it? We answered the data version of that question elsewhere. This post answers the architecture version — the shape of the system that turns a research goal into a realistic interview.

The short answer is that we do not send your task to a single model and read back whatever it says. Going straight to "a GPT" — Claude, Gemini, GPT, any of them — gives you increasingly hyper-rational answers that do not read like real, organic customers. People are smart, but they take shortcuts, contradict themselves, and are pulled around by subconscious drivers they could not articulate if you asked. A lone frontier model, asked to roleplay a person, smooths all of that away and hands you the bland modal voice it defaults to.

So the architecture exists to put the texture back. And the order in which it does that matters more than anything else in this post. We do not begin with the models and bolt a personality on afterwards. We begin at the core — the dispositional prior that decides how a respondent feels before a single model is asked to speak for them. The layers above answer the next two questions in turn — what they know, then how they reason — and every one of them is conditioned by the core beneath it.

The longer the architecture matured, the more that ordering started to look like a nervous system. Not by design — we did not set out to build a brain. But a brain doesn't run top-down from the cortex either. The deep, old structures come first; they set the priors that the cortex then elaborates. Our system runs the same way, and the analogy turned out to be useful enough to organise this whole post around.

For the generalist: Don't ask one AI to pretend to be a person; it gives you a smart-sounding average. Synthetic Users instead starts by deciding how the respondent feels (a calibrated disposition), grounds them in what they know via your data, and only then lets a shuffled ensemble of models reason and speak for them. Like a brain, it builds from the core outward, not the cortex down.

§ 0 The agenda: human intelligence, not superintelligence

It is worth being explicit about why the system is built this way, because it diverges from where most of the field is pointed. Almost everyone assembling large models into bigger systems right now is aiming at superintelligence — a mind more capable than ours, that reasons further and faster. That is a coherent goal. It is just not our goal.

We are pointed somewhere stranger and, for understanding people, more useful: human intelligence. Not a smarter respondent — a more human one. One that hesitates, satisfices, holds two incompatible preferences at once, and answers a product question through the lens of a mood it would never name. To rebuild that faithfully you cannot start from the ceiling of capability. You have to start from the shape of a real mind — the priors underneath the reasoning — and build outward.

That reframing changes what every component is for. The model ensemble is not there to be cleverer than one model; it is there to recover the variance a single model erases. The personality core is not decoration; it is the prior that decides who is in the room before anyone speaks. And it is the layer we train first, because it is the layer everything else inherits.

Most of the field is rebuilding the cortex and calling it intelligence. We start with everything underneath it — the parts that decide how a person feels before they ever reason their way to a sentence.

§ 1 The architecture, as a brain

Here is the system in one screen. The three layers answer three questions in order — how a respondent feels, what they know, then how they reason. Read it from the top, the way the system actually runs: a task comes in and first instantiates the core — the dispositional prior, how this respondent feels. That core conditions the limbic layer, which loads the context and memory the respondent reasons inside. Only then does the cortical ensemble of models — driven by agents — do the reasoning that produces the interview. The brain labels are not loose metaphor; each one names the function that layer performs, and the arrows run from the deep, old layers outward, exactly as they do in a nervous system.

Figure 1. The architecture as a brain that runs inside-out, read top to bottom. The core is one box: an OCEAN personality prior built from acquired psychographic data and, increasingly, checked against fMRI, all inside it. The core conditions the limbic RAG layer, whose context feeds the central router, the switchboard sitting between the Agents box and the Foundation Models box that together form the neocortex. Solid arrows run forward; dashed violet marks the feedback: the Critic's dimensional verbal feedback is distilled into the SU Persona Adapter (our own model inside the ensemble), and the same signal runs back up to retune the core.

§ 2 The core, first: how they feel

Everything begins here. Before any model is asked to speak, the system decides the respondent's disposition — how they are wired to feel and react. From early on, Synthetic Users has scaffolded every respondent with a Big Five profile: Openness, Conscientiousness, Extraversion, Agreeableness, Neuroticism — dimensions that are, at bottom, dispositional and affective. The reason is mundane and well established. Without explicit personality conditioning, generated voices collapse toward the bland modal tone a base model defaults to. The OCEAN scaffold restores the variance that makes one respondent feel like a genuinely different person from the last.

What makes this a core rather than a setting is where the profiles come from, and the fact that everything downstream inherits them. We do not sample a random OCEAN profile. We derive the personality distribution from the large volume of psychographic and behavioural data we acquire (purchasing patterns, content categories, session behaviour and the like) and calibrate the resulting distribution to match the real composition of the population a panel is meant to represent. Geography, industry and segment priors shift the sampling so a panel resembles the actual mix of personalities you would meet in that audience, not a plausible-looking set of one-offs. When the synthetic distribution drifts away from that reference population, we re-weight and remap.
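
As a rough illustration of sampling from a calibrated distribution rather than drawing profiles at random, here is a minimal Python sketch. Everything in it is an assumption made for the example: the segment names, the numeric priors, the Gaussian-per-trait model, and the helper names `sample_profile`, `sample_panel` and `reweight`. It is not the production calibration method.

```python
import random
from dataclasses import dataclass

TRAITS = ("openness", "conscientiousness", "extraversion",
          "agreeableness", "neuroticism")


@dataclass(frozen=True)
class TraitPrior:
    mean: float  # expected trait score on a 0..1 scale
    sd: float    # spread of that trait within the segment


# Hypothetical priors for illustration only, not real calibration data.
SEGMENT_PRIORS = {
    "hospital_specialist": {
        "openness": TraitPrior(0.55, 0.12),
        "conscientiousness": TraitPrior(0.80, 0.08),
        "extraversion": TraitPrior(0.45, 0.15),
        "agreeableness": TraitPrior(0.60, 0.12),
        "neuroticism": TraitPrior(0.40, 0.15),
    },
    "general_population": {t: TraitPrior(0.50, 0.15) for t in TRAITS},
}


def sample_profile(segment, rng):
    """Draw one OCEAN profile from a segment's priors, clipped to [0, 1]."""
    priors = SEGMENT_PRIORS[segment]
    return {t: min(1.0, max(0.0, rng.gauss(priors[t].mean, priors[t].sd)))
            for t in TRAITS}


def sample_panel(segment_mix, size, seed=0):
    """Sample a panel whose segment shares follow segment_mix."""
    rng = random.Random(seed)
    segments = list(segment_mix)
    weights = [segment_mix[s] for s in segments]
    chosen = rng.choices(segments, weights=weights, k=size)
    return [(s, sample_profile(s, rng)) for s in chosen]


def reweight(observed_mix, target_mix):
    """Importance weights that pull a drifted mix back to the target."""
    return {s: target_mix[s] / observed_mix[s] for s in target_mix}
```

Calling `sample_panel({"hospital_specialist": 0.7, "general_population": 0.3}, 200)` yields 200 respondents whose segment shares follow the requested mix, and `reweight` shows the simplest form of correcting a panel whose observed mix has drifted from the target.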

It sits where the brainstem and the old subcortical structures sit: deepest, oldest, and first to fire. It is not the part that reasons articulately — it is the part that sets the priors everything else runs on. A respondent's conscientiousness or neuroticism is decided before the limbic context is loaded and long before any cortical model forms a sentence, and it colours both. That is exactly why we start here rather than finish here: get the core wrong and every layer above it is fluent nonsense about the wrong person.

This is also the layer we are now starting to anchor against the brain itself. A new category of foundation model can predict whole-brain fMRI directly from the same language-model embeddings we use to instantiate respondents — which means the personality scaffold we have treated as a behavioural prior for years is, for the first time, becoming testable as a neural one. We are not running scanners; we build on the public fMRI corpora and the models trained on them. Directionally, the ambition is to push the core from "this respondent answers like a real one would" toward "this respondent's internal state lines up with a real one's." That work is early, and we have written about where it holds and where it breaks elsewhere. But it is why the core is the part of the architecture we care most about getting right: it is the layer closest to being human, and the layer everything else is built on top of.

The cortex is where the field is competing. The core — a personality distribution drawn from real behaviour and, increasingly, checked against real brains — is where the realism is decided, before a word is spoken.

For the generalist: The system starts by deciding how the respondent is wired to feel: their disposition. Each one gets a Big Five personality, and we don't pick those at random; we build the distribution from real behavioural data we acquire and tune it to match the actual population. It's the deepest layer, it fires first, and everything above it inherits it. It's also the layer we're starting to validate against real brain imaging.

§ 3 The limbic layer: context and memory via RAG

With the respondent instantiated, the next layer gives them something to reason inside. At answer-time we retrieve facts from your interviews, surveys, CRM notes and product docs, and ground responses in them. This is RAG, and it is deliberately not fine-tuning. No retraining is required; when your policies or copy change, the next answer picks it up immediately.
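
To make the "no retraining" point concrete, here is a toy retrieval sketch in Python. It swaps in a bag-of-words cosine similarity for whatever retriever the real system uses, and the document store, the `retrieve` and `build_prompt` helpers, and the prompt layout are all assumptions for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase word tokens; a stand-in for a real embedding model."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def retrieve(question, documents, k=2):
    """Return the ids of the k documents most similar to the question."""
    q = Counter(tokenize(question))
    ranked = sorted(documents.items(),
                    key=lambda kv: cosine(q, Counter(tokenize(kv[1]))),
                    reverse=True)
    return [doc_id for doc_id, _ in ranked[:k]]


def build_prompt(question, documents, k=2):
    """Ground the question in retrieved text at answer-time."""
    ids = retrieve(question, documents, k)
    context = "\n".join(f"[{i}] {documents[i]}" for i in ids)
    return f"Context:\n{context}\n\nQuestion: {question}"
```

Because the documents are read at answer-time, editing an entry in `documents` changes the very next prompt; no model weights are involved at any point.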

Functionally this is the system's episodic and sensory memory — the layer that supplies the particulars of the situation, the way the limbic system feeds context and salience up into cortical reasoning. It sits above the core for a reason: the same document lands differently on a risk-averse respondent than an open one, so the context is loaded after the personality that will weigh it. Without this layer an interview floats free in generic plausibility. With it, the respondent is answering your onboarding flow, citing your constraints, reacting to your last release.

This is also where customers inject more of their own context — the documents, segments and institutional knowledge that never made it into any model's pretraining. RAG is what tailors a general system to a specific business without touching the weights underneath it.

For the generalist: RAG is the system's memory of your world. Once the respondent exists, this layer pulls in your documents and data so they react to your actual product and policies, and it updates instantly when those change, with no retraining.

§ 4 The neocortex: how they reason (an ensemble, last, because one model thinks too cleanly)

Only now do the models reason and speak. The outer layer of the system is a set of frontier models — GPT, Claude, Gemini, Llama, Mistral, Hermes — and the agents that drive them. We are model-agnostic by design. A lightweight router selects, and sometimes sequences, multiple models for a single session, and can aggregate their outputs. You control the task ("evaluate the onboarding flow"), the audience hints, and the constraints (jurisdiction, tone, risk). We adjust model choice and ordering, temperature, aggregation and guardrails.

The reason this layer comes last, and is plural, maps cleanly onto the brain. A human cortex is not one uniform sheet; it is a patchwork of regions that developed under different pressures and are good at different things, and cognition is the recruitment of the right regions — on top of the priors the deeper structures already set. Foundation models are similar in a way that is easy to miss: they are trained on different data, with different objectives, different reinforcement regimes and different house styles. Each one carries its own affordances — and its own biases and blind spots.

If you route every question to a single model, you inherit that one model's skew on every answer. Shuffling across an ensemble is the system recruiting different faculties for different parts of the interview — clinical precision from one, conversational looseness from another, a contrarian streak from a third — all of them now speaking as the respondent the core already defined. The point is not that more models are smarter. It is that one model is too internally consistent to sound like a room full of people.
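
One way to picture shuffling across an ensemble is a weighted draw per turn. The sketch below is an assumption-heavy illustration: the placeholder model names, the strength tags, the `explore` floor and the `route` and `plan_session` helpers are invented for the example and do not describe the actual router.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelSpec:
    name: str
    strengths: frozenset  # tags this model is assumed to handle well


# Placeholder ensemble; names and tags are hypothetical.
ENSEMBLE = [
    ModelSpec("model_a", frozenset({"clinical", "formal"})),
    ModelSpec("model_b", frozenset({"conversational"})),
    ModelSpec("model_c", frozenset({"contrarian", "formal"})),
]


def route(turn_tags, rng, explore=0.2):
    """Pick one model for a turn.

    Weight grows with tag overlap; the explore floor keeps every model
    in play so no single model's skew dominates the session.
    """
    weights = [explore + len(m.strengths & turn_tags) for m in ENSEMBLE]
    return rng.choices(ENSEMBLE, weights=weights, k=1)[0]


def plan_session(turns, seed=0):
    """Assign a model name to each turn of an interview."""
    rng = random.Random(seed)
    return [route(frozenset(tags), rng).name for tags in turns]
```

For example, `plan_session([{"clinical"}, {"conversational"}, {"contrarian"}])` returns one model name per turn, favouring the model whose tags match while still letting the others speak occasionally.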

This is the largest layer in the system, and it has grown. Alongside the third-party models — GPT, Claude, Gemini, Llama, Mistral, Hermes — sits a member of our own: the SU Persona Adapter, a persona model trained in-house and routed into the ensemble like any other. More on where it comes from in a moment; the point here is that the cortex is no longer just borrowed general-purpose models. Part of it is tissue we grew ourselves, specialised for the one job of sounding like a calibrated person.

A stack of agents does the work across that ensemble, mediated by a router that acts as a switchboard — selecting and sequencing models for each agent, at each turn. The Planner turns your research goal into an interview plan. The Interviewer runs the script and asks natural follow-ups while staying on brief. The Critic checks the conversation and triggers re-asks — more on it below, because it does more than score. The Synthesizer turns many interviews into insights, generalisations and probe-worthy gaps. They coordinate and learn from outcomes rather than leaning on one monolithic prompt — closer to a working mind than a single oracle being interrogated.
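
The division of labour between the four agents can be sketched as plain functions. This is a minimal sketch under stated assumptions: the stub question templates, the word-count rule inside `critic`, and the shape of the summary are hypothetical stand-ins, and `respondent` is any callable that maps a question to an answer.

```python
def planner(goal):
    """Turn a research goal into an interview plan (stubbed as two questions)."""
    return [f"What is your first reaction to {goal}?",
            f"What would stop you from using {goal}?"]


def critic(question, answer):
    """Return written critiques; an empty list means the answer passes."""
    notes = []
    if len(answer.split()) < 5:
        notes.append("coverage: answer too short, probe for a concrete example")
    return notes


def interviewer(questions, respondent, max_reasks=1):
    """Ask each question and re-ask while the critic still has notes."""
    transcript = []
    for question in questions:
        answer = respondent(question)
        reasks = 0
        while critic(question, answer) and reasks < max_reasks:
            answer = respondent(question + " Please give a concrete example.")
            reasks += 1
        transcript.append((question, answer, reasks))
    return transcript


def synthesizer(transcript):
    """Summarise the interview (stubbed as simple counts)."""
    return {"turns": len(transcript),
            "reasked": sum(1 for _, _, r in transcript if r)}
```

The point of the split is that the re-ask decision lives in `critic`, not in the prompt that produced the answer, so the check can change without touching the Interviewer.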

For the generalist: Only after the respondent exists and is grounded do the models speak. Different AI models are like different regions of a cortex, each trained differently, each good at different things. We shuffle across them so the voice isn't one model's quirks, and we've added one model of our own to the mix. A router decides which model handles which moment, and agents plan, interview, critique and synthesize rather than one big prompt doing everything.

§ 5 The feedback loop: verbal feedback, distilled into the core

A brain that could not learn from being wrong would be useless. The loop that ties everything together over time is also where the most important upgrade lives. It starts with the Critic — and the first thing to understand is that the Critic does not emit a single score. It emits dimensional verbal feedback: specific, written critiques across believability, knowledge, persona fidelity, goal-adherence, secret-keeping, internal contradictions and coverage. Not "7/10," but "this respondent broke character when asked about pricing, and never surfaced the budget constraint a real buyer in this segment would raise."

That feedback does two things. At runtime, it loops straight back to the Interviewer and triggers re-asks, sharpening the interview in the moment. Over time, it is distilled: the dimension-tagged critiques are captured, used to generate a feedback-conditioned "teacher" rollout (what the response should have been, given the critique), and that pair is jointly optimised into the SU Persona Adapter — the in-house model in the ensemble. The result is that the dimensions get baked into the model rather than enforced turn by turn. At inference the Persona behaves as if the Critic were always present, without a Critic in the loop for the dimensions it has already learned.
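
The data-shaping half of that distillation step can be sketched as follows. This covers only how a dimension-tagged critique and a teacher rollout might be paired up; the training step itself is out of scope. The `Critique` and `DistillationPair` types, the `build_pair` helper and the `teacher_rollout` callable are assumptions made for illustration, not the actual pipeline.

```python
from dataclasses import dataclass

# Dimension names taken from the Critic's list above, written as identifiers.
DIMENSIONS = ("believability", "knowledge", "persona_fidelity",
              "goal_adherence", "secret_keeping", "contradictions", "coverage")


@dataclass(frozen=True)
class Critique:
    dimension: str  # one of DIMENSIONS
    note: str       # the written feedback itself


@dataclass(frozen=True)
class DistillationPair:
    prompt: str
    student: str       # what the respondent actually said
    teacher: str       # what it should have said, given the critique
    dimensions: tuple  # which dimensions this pair teaches


def build_pair(prompt, response, critiques, teacher_rollout):
    """Turn critiques into one training pair, or None if nothing was wrong.

    teacher_rollout is a hypothetical callable
    (prompt, response, feedback) -> improved response.
    """
    for c in critiques:
        if c.dimension not in DIMENSIONS:
            raise ValueError(f"unknown dimension: {c.dimension}")
    if not critiques:
        return None
    feedback = "\n".join(f"[{c.dimension}] {c.note}" for c in critiques)
    teacher = teacher_rollout(prompt, response, feedback)
    dims = tuple(sorted({c.dimension for c in critiques}))
    return DistillationPair(prompt, response, teacher, dims)
```

Keeping the dimension tags on each pair is what would let a later training step weight or audit individual dimensions, rather than learning from an undifferentiated "better answer".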

There is a clean brain parallel for this, and it is the most important one in the piece. Verbal-feedback distillation is consolidation: the move from effortful, deliberate correction — the kind of conscious, working-memory control the prefrontal cortex does — to automatic, internalised competence, the kind that lives in procedural memory once a skill is learned. The Critic is the system thinking out loud about what it got wrong; the adapter is that correction becoming a habit it no longer has to think about. A person learns to drive the same way: every input is conscious and narrated at first, and silent and automatic later.

And the signal does not stop at the cortex. The same captured feedback — misses, contradictions, weak coverage, parity deltas against organic interviews, calibration drift — runs all the way back down and retunes the personality remapping in the core itself. The feedback closes the loop in the right direction: it reaches the deepest layer and adjusts how the respondents feel, not just how they reason. The core, the memory and the cortex are continuously tuned by the gap between what the system predicted and what real comparisons show. That is why this is a living system rather than a fixed pipeline.

For the generalist: The Critic doesn't just give a score; it writes specific notes on what was wrong (believability, staying in character, coverage, and so on). Those notes fix the interview in the moment, and over time they get baked into an in-house model so it stops making the same mistakes without being told. It's the difference between consciously correcting yourself and having a skill become second nature.

§ 6 A worked example, from the core out

Take a concrete task: "Clinical interview with an oncologist in Berlin about trial-enrollment UX." Watch it move through the layers, in order.

The core fires first: it samples a personality consistent with the real-world distribution for German hospital specialists — high conscientiousness, risk-aware, time-constrained — drawn from the calibrated behavioural model rather than invented. This is the respondent, before a word is spoken.
The limbic layer loads context: RAG pulls your oncology guidelines, past interview notes and the relevant policy constraints into the situation this respondent will reason inside.
The cortex then reasons and speaks: the router favours models strong on clinical context and formal tone and sequences them; the Interviewer probes consent flow, eligibility filters and EMR hand-offs.
The Critic flags missing questions on adverse-event reporting and multilingual support, and triggers re-asks.
The feedback loop updates routing and, if it sees systematic gaps for this cohort, nudges the remapping in the core.
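
The ordering in the steps above can be sketched as a pipeline in which each layer refuses to run until the one beneath it has. The `Session` type, the layer functions and the placeholder values are all hypothetical; the sketch only illustrates the core, then limbic, then cortex dependency.

```python
from dataclasses import dataclass, field


@dataclass
class Session:
    task: str
    disposition: dict = field(default_factory=dict)
    context: list = field(default_factory=list)
    transcript: list = field(default_factory=list)
    trace: list = field(default_factory=list)  # order in which layers ran


def core(session):
    """Decide how the respondent feels (placeholder trait values)."""
    session.disposition = {"conscientiousness": 0.8, "neuroticism": 0.4}
    session.trace.append("core")
    return session


def limbic(session):
    """Load context, but only for an already-instantiated respondent."""
    if not session.disposition:
        raise RuntimeError("core must run before context is loaded")
    session.context = ["oncology guidelines", "past interview notes"]
    session.trace.append("limbic")
    return session


def cortex(session):
    """Reason and speak, conditioned on disposition and context."""
    if not session.context:
        raise RuntimeError("limbic layer must run before the cortex")
    session.transcript.append(
        f"[{len(session.context)} sources] response to: {session.task}")
    session.trace.append("cortex")
    return session


def run(task):
    """Run the layers from the core outward."""
    session = Session(task)
    for layer in (core, limbic, cortex):
        session = layer(session)
    return session
```

Running `run("trial-enrollment UX interview")` leaves a `trace` that records the layers in the order they fired, and calling `limbic` on a fresh `Session` raises, which is the inside-out constraint expressed as code.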

Minutes later you have a realistic interview and a defensible report — not because one model is brilliant, but because the system built the respondent from the inside out and each layer did the one thing it is shaped to do.

§ 7 Why this architecture, in one screen

Core first. How a respondent feels — their disposition — gets decided before any model reasons, drawn from acquired behavioural data and fit to the real population, the way deep brain structures set priors the cortex inherits.
Grounded in your data. RAG adds your facts at answer-time, on top of that personality, so outputs stay specific and current without retraining.
Ensemble beats a single model. Models carry different strengths and biases; routing and aggregation reduce skew and recover the heterogeneity of a real panel.
Multi-agent beats one prompt. Agents that collaborate and critique lift realism and depth without hand-holding.
Verbal feedback, not just a score. The Critic's dimensional critiques are distilled into an in-house adapter, so good behaviour gets internalised rather than re-enforced on every turn — the system's version of a skill becoming second nature.
Not digital twins. We model behavioural spaces — traits × context × knowledge — not 1:1 replicas of individuals. It scales, and it stays fresh.

§ 8 Not digital twins, and why that matters

Digital twins try to clone individuals one by one. That breaks down with real audiences: combinatorial explosion, drift, privacy overhead, and brittleness the moment you step outside observed data. We do the opposite. We factorise a person into traits, context and knowledge; calibrate the trait distribution to the organic world; sample cohorts from that calibrated space; and compose behaviours that generalise and update instantly when any factor changes.
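
A small sketch shows why factorising scales where cloning does not. The factor values, the discrete trait labels (real traits are continuous) and the helper names are assumptions for illustration only.

```python
import itertools
import random

# Hypothetical factor values; real trait profiles are continuous, not labels.
TRAIT_PROFILES = ["cautious", "open", "impatient"]
CONTEXTS = ["first purchase", "renewal"]
KNOWLEDGE = ["novice", "expert"]


def behavioural_space(traits, contexts, knowledge):
    """Every traits x context x knowledge combination."""
    return list(itertools.product(traits, contexts, knowledge))


def sample_cohort(space, weights, size, seed=0):
    """Draw a cohort from the space using calibrated weights."""
    rng = random.Random(seed)
    return rng.choices(space, weights=weights, k=size)


space = behavioural_space(TRAIT_PROFILES, CONTEXTS, KNOWLEDGE)
# Adding one context extends every combination without touching the others.
bigger = behavioural_space(TRAIT_PROFILES, CONTEXTS + ["churn risk"], KNOWLEDGE)
```

Seven stored factor values produce twelve distinct respondents, and adding a single new context produces eighteen, whereas a clone-per-person approach would need each of those built and maintained individually.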

It is the same reason a brain does not store a separate copy of every face it has seen. It learns the dimensions along which faces vary and reconstructs any particular one on demand. Our respondents are reconstructed the same way — from a calibrated space, not a warehouse of clones — which is what lets the system scale to a population without pretending to own a thousand real people. For the deeper version of this argument, see Synthetic Users vs Digital Twins.

For the generalist: We don't build a digital clone of each person, because that doesn't scale and breaks easily. We learn the dimensions people vary along, tune them to the real world, and reconstruct realistic respondents on demand, the way a brain reconstructs a face rather than storing a photo of every one.

§ 9 What this buys you

Speed: minutes to realistic interviews and a solid report.
Coverage: many personas and contexts fast, without enumerating "twins."
Change-tracking: when policies or copy change, RAG picks it up without retraining.
Bias control: ensemble plus critic reduce single-model skew.
Reality checks: we benchmark against organic interviews, and the calibrated core keeps personas aligned to the organic world — increasingly, all the way down to the brain.

The architecture has not stopped looking like a single chatbot by accident. It looks like a brain because, layer by layer and from the core outward, that is the thing it is trying to rebuild — not a smarter mind than yours, but a more human one, faithful enough to your customers that you can ask it what they would do and trust the answer.