
Synthetic Users system architecture
Everyone else is wiring models together to chase superintelligence. We wire them together to rebuild something rarer — a human one. And like a real nervous system, it doesn't start at the cortex. It starts at the core.
Synthetic Users system architecture (the brain version).
This is the question we get more than any other: when you run a Synthetic Users panel, where does the realism actually come from? What is underneath it? We answered the data version of that question elsewhere. This post answers the architecture version — the shape of the system that turns a research goal into a realistic interview.
The short answer is that we do not send your task to a single model and read back whatever it says. Going straight to "a GPT" — Claude, Gemini, GPT, any of them — gives you increasingly hyper-rational answers that do not read like real, organic customers. People are smart, but they take shortcuts, contradict themselves, and are pulled around by subconscious drivers they could not articulate if you asked. A lone frontier model, asked to roleplay a person, smooths all of that away and hands you the bland modal voice it defaults to.
So the architecture exists to put the texture back. And the order in which it does that matters more than anything else in this post. We do not begin with the models and bolt a personality on afterwards. We begin at the core — the dispositional prior that decides how a respondent feels before a single model is asked to speak for them. The layers above answer the next two questions in turn — what they know, then how they reason — and every one of them is conditioned by the core beneath it.
The longer the architecture matured, the more that ordering started to look like a nervous system. Not by design — we did not set out to build a brain. But a brain doesn't run top-down from the cortex either. The deep, old structures come first; they set the priors that the cortex then elaborates. Our system runs the same way, and the analogy turned out to be useful enough to organise this whole post around.
§ 0 The agenda: human intelligence, not superintelligence
It is worth being explicit about why the system is built this way, because it diverges from where most of the field is pointed. Almost everyone assembling large models into bigger systems right now is aiming at superintelligence — a mind more capable than ours, that reasons further and faster. That is a coherent goal. It is just not our goal.
We are pointed somewhere stranger and, for understanding people, more useful: human intelligence. Not a smarter respondent — a more human one. One that hesitates, satisfices, holds two incompatible preferences at once, and answers a product question through the lens of a mood it would never name. To rebuild that faithfully you cannot start from the ceiling of capability. You have to start from the shape of a real mind — the priors underneath the reasoning — and build outward.
That reframing changes what every component is for. The model ensemble is not there to be cleverer than one model; it is there to recover the variance a single model erases. The personality core is not decoration; it is the prior that decides who is in the room before anyone speaks. And it is the layer we train first, because it is the layer everything else inherits.
§ 1 The architecture, as a brain
Here is the system in one screen. The three layers answer three questions in order — how a respondent feels, what they know, then how they reason. Read it from the top, the way the system actually runs: a task comes in and first instantiates the core — the dispositional prior, how this respondent feels. That core conditions the limbic layer, which loads the context and memory the respondent reasons inside. Only then does the cortical ensemble of models — driven by agents — do the reasoning that produces the interview. The brain labels are not loose metaphor; each one names the function that layer performs, and the arrows run from the deep, old layers outward, exactly as they do in a nervous system.
§ 2 The core, first: how they feel
Everything begins here. Before any model is asked to speak, the system decides the respondent's disposition — how they are wired to feel and react. From early on, Synthetic Users has scaffolded every respondent with a Big Five profile: Openness, Conscientiousness, Extraversion, Agreeableness, Neuroticism — dimensions that are, at bottom, dispositional and affective. The reason is mundane and well established. Without explicit personality conditioning, generated voices collapse toward the bland modal tone a base model defaults to. The OCEAN scaffold restores the variance that makes one respondent feel like a genuinely different person from the last.
What makes this a core rather than a setting is where the profiles come from, and the fact that everything downstream inherits them. We do not sample a random OCEAN profile. We derive the personality distribution from the large volume of psychographic and behavioural data we acquire — purchasing patterns, content categories, session behaviour and the like — and calibrate the resulting distribution to match the real composition of the population a panel is meant to represent. Geography, industry and segment priors shift the sampling so a panel resembles the actual mix of personalities you would meet in that audience, not a plausible-looking set of one-offs. When the synthetic distribution drifts, we re-weight and remap.
It sits where the brainstem and the old subcortical structures sit: deepest, oldest, and first to fire. It is not the part that reasons articulately — it is the part that sets the priors everything else runs on. A respondent's conscientiousness or neuroticism is decided before the limbic context is loaded and long before any cortical model forms a sentence, and it colours both. That is exactly why we start here rather than finish here: get the core wrong and every layer above it is fluent nonsense about the wrong person.
This is also the layer we are now starting to anchor against the brain itself. A new category of foundation model can predict whole-brain fMRI directly from the same language-model embeddings we use to instantiate respondents — which means the personality scaffold we have treated as a behavioural prior for years is, for the first time, becoming testable as a neural one. We are not running scanners; we build on the public fMRI corpora and the models trained on them. Directionally, the ambition is to push the core from "this respondent answers like a real one would" toward "this respondent's internal state lines up with a real one's." That work is early, and we have written about where it holds and where it breaks elsewhere. But it is why the core is the part of the architecture we care most about getting right: it is the layer closest to being human, and the layer everything else is built on top of.
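As a rough illustration of the calibrated sampling described in this section, the sketch below draws Big Five profiles from a baseline prior and shifts the means for a segment. Everything named here is an assumption made for illustration: the 0 to 1 trait scale, the `BASELINE` and `SEGMENT_SHIFTS` values, the `hospital_specialist` segment and the independent Gaussian per trait are all hypothetical. A production system would fit these to acquired behavioural data and would likely model correlations between traits.

```python
import random
from dataclasses import dataclass

TRAITS = (
    "openness",
    "conscientiousness",
    "extraversion",
    "agreeableness",
    "neuroticism",
)


@dataclass(frozen=True)
class TraitPrior:
    mean: float  # centre of the trait on a 0..1 scale (assumed scale)
    sd: float    # spread of the trait in the population


# Hypothetical baseline prior; real values would come from calibrated data.
BASELINE = {trait: TraitPrior(mean=0.5, sd=0.15) for trait in TRAITS}

# Hypothetical segment priors, expressed as shifts added to the baseline means.
SEGMENT_SHIFTS = {
    "hospital_specialist": {"conscientiousness": 0.15, "openness": -0.05},
}


def sample_profile(segment, rng):
    """Draw one OCEAN profile, shifted by the segment prior and clipped to 0..1."""
    shifts = SEGMENT_SHIFTS.get(segment, {})
    profile = {}
    for trait in TRAITS:
        prior = BASELINE[trait]
        value = rng.gauss(prior.mean + shifts.get(trait, 0.0), prior.sd)
        profile[trait] = min(1.0, max(0.0, value))
    return profile


def sample_panel(segment, size, seed=0):
    """Draw a reproducible panel of profiles for one segment."""
    rng = random.Random(seed)
    return [sample_profile(segment, rng) for _ in range(size)]
```

Calling `sample_panel("hospital_specialist", 200)` gives a panel whose conscientiousness skews above the baseline, while an unknown segment simply falls back to the baseline prior.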
§ 3 The limbic layer: context and memory via RAG
With the respondent instantiated, the next layer gives them something to reason inside. At answer-time we retrieve facts from your interviews, surveys, CRM notes and product docs, and ground responses in them. This is RAG, and it is deliberately not fine-tuning. No retraining is required; when your policies or copy change, the next answer picks it up immediately.
Functionally this is the system's episodic and sensory memory — the layer that supplies the particulars of the situation, the way the limbic system feeds context and salience up into cortical reasoning. It sits above the core for a reason: the same document lands differently on a risk-averse respondent than an open one, so the context is loaded after the personality that will weigh it. Without this layer an interview floats free in generic plausibility. With it, the respondent is answering your onboarding flow, citing your constraints, reacting to your last release.
This is also where customers inject more of their own context — the documents, segments and institutional knowledge that never made it into any model's pretraining. RAG is what tailors a general system to a specific business without touching the weights underneath it.
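To make the answer-time retrieval in this section concrete, here is a minimal sketch of the retrieve-then-ground step. It is not the production retriever: a bag-of-words cosine similarity stands in for whatever embedding search a real system would use, and the function names (`retrieve`, `build_prompt`) and the prompt layout are assumptions for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, documents, k=2):
    """Return up to k documents most similar to the query, best first."""
    query_vec = Counter(tokenize(query))
    scored = [(cosine(query_vec, Counter(tokenize(doc))), doc) for doc in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for score, doc in scored[:k] if score > 0]


def build_prompt(question, documents, k=2):
    """Ground a question in retrieved facts; no model weights are touched."""
    facts = retrieve(question, documents, k)
    context = "\n".join(f"- {fact}" for fact in facts)
    return f"Context:\n{context}\n\nQuestion: {question}"
```

Because the documents are read at answer-time, editing an entry in `documents` changes the very next prompt, which is the property the section relies on when it says no retraining is required.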
§ 4 The neocortex: how they reason (an ensemble, last, because one model thinks too cleanly)
Only now do the models reason and speak. The outer layer of the system is a set of frontier models — GPT, Claude, Gemini, Llama, Mistral, Hermes — and the agents that drive them. We are model-agnostic by design. A lightweight router selects, and sometimes sequences, multiple models for a single session, and can aggregate their outputs. You control the task ("evaluate the onboarding flow"), the audience hints, and the constraints (jurisdiction, tone, risk). We adjust model choice and ordering, temperature, aggregation and guardrails.
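One way to picture the router is as a function from task constraints to an ordered list of models. The sketch below is a simplified assumption, not the actual routing logic: the model names (`model_a` and so on) are placeholders rather than real providers, the strength tags and temperatures are invented, and scoring by tag overlap is only one of many possible selection rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelProfile:
    name: str
    strengths: frozenset  # hypothetical tags describing what the model is good at
    temperature: float    # sampling temperature the router would apply


# Placeholder registry; a real one would describe the actual ensemble members.
REGISTRY = [
    ModelProfile("model_a", frozenset({"clinical", "formal"}), 0.4),
    ModelProfile("model_b", frozenset({"conversational", "casual"}), 0.9),
    ModelProfile("model_c", frozenset({"contrarian", "formal"}), 0.7),
]


def route(required, registry=REGISTRY, max_models=2):
    """Pick and order models by how many required tags each one covers."""
    required = frozenset(required)
    scored = [(len(model.strengths & required), model) for model in registry]
    # sorted() is stable, so ties keep their registry order.
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    chosen = [model for score, model in ranked if score > 0][:max_models]
    # Fall back to the first registered model when nothing matches.
    return chosen or [registry[0]]
```

With this registry, `route({"clinical", "formal"})` sequences `model_a` before `model_c`, and a request with no matching tags falls back to a single default model.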
The reason this layer comes last, and is plural, maps cleanly onto the brain. A human cortex is not one uniform sheet; it is a patchwork of regions that developed under different pressures and are good at different things, and cognition is the recruitment of the right regions — on top of the priors the deeper structures already set. Foundation models are similar in a way that is easy to miss: they are trained on different data, with different objectives, different reinforcement regimes and different house styles. Each one carries its own affordances — and its own biases and blind spots.
If you route every question to a single model, you inherit that one model's skew on every answer. Shuffling across an ensemble is the system recruiting different faculties for different parts of the interview — clinical precision from one, conversational looseness from another, a contrarian streak from a third — all of them now speaking as the respondent the core already defined. The point is not that more models are smarter. It is that one model is too internally consistent to sound like a room full of people.
This is the largest layer in the system, and it has grown. Alongside the third-party models — GPT, Claude, Gemini, Llama, Mistral, Hermes — sits a member of our own: the SU Persona Adapter, a persona model trained in-house and routed into the ensemble like any other. More on where it comes from in a moment; the point here is that the cortex is no longer just borrowed general-purpose models. Part of it is tissue we grew ourselves, specialised for the one job of sounding like a calibrated person.
A stack of agents does the work across that ensemble, mediated by a router that acts as a switchboard — selecting and sequencing models for each agent, at each turn. The Planner turns your research goal into an interview plan. The Interviewer runs the script and asks natural follow-ups while staying on brief. The Critic checks the conversation and triggers re-asks — more on it below, because it does more than score. The Synthesizer turns many interviews into insights, generalisations and probe-worthy gaps. They coordinate and learn from outcomes rather than leaning on one monolithic prompt — closer to a working mind than a single oracle being interrogated.
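The control flow between the four agents can be sketched in a few functions. This is a toy under stated assumptions: each agent is a plain Python function rather than a model call, the `critic` here only checks answer length, and `respondent` is any callable that maps a prompt to an answer. None of these names or signatures come from the product itself.

```python
def planner(goal):
    """Turn a research goal into a (toy) interview plan."""
    return [
        f"What is your first reaction to {goal}?",
        f"What would stop you from using {goal}?",
    ]


def critic(answer, min_words=5):
    """Return written issues; an empty list means the answer passes."""
    if len(answer.split()) < min_words:
        return ["answer is too thin to be believable"]
    return []


def interviewer(plan, respondent, max_reasks=1):
    """Ask each planned question, re-asking when the critic raises issues."""
    transcript = []
    for question in plan:
        answer = respondent(question)
        for _ in range(max_reasks):
            issues = critic(answer)
            if not issues:
                break
            answer = respondent(f"{question} Please elaborate ({'; '.join(issues)}).")
        transcript.append((question, answer))
    return transcript


def synthesizer(transcript):
    """Reduce a transcript to a small summary (a stand-in for real insights)."""
    return {
        "turns": len(transcript),
        "total_words": sum(len(answer.split()) for _, answer in transcript),
    }


def run_session(goal, respondent):
    """Planner -> Interviewer (with Critic re-asks) -> Synthesizer."""
    return synthesizer(interviewer(planner(goal), respondent))
```

The design point the sketch tries to capture is that the critic sits inside the interview loop, so a weak answer is repaired before it reaches synthesis rather than being scored afterwards.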
§ 5 The feedback loop: verbal feedback, distilled into the core
A brain that could not learn from being wrong would be useless. The loop that ties everything together over time is also where the most important upgrade lives. It starts with the Critic — and the first thing to understand is that the Critic does not emit a single score. It emits dimensional verbal feedback: specific, written critiques across believability, knowledge, persona fidelity, goal-adherence, secret-keeping, internal contradictions and coverage. Not "7/10," but "this respondent broke character when asked about pricing, and never surfaced the budget constraint a real buyer in this segment would raise."
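A dimensional critique is easier to reason about as a small data structure than as a score. The sketch below is one possible shape, assuming each critique is tagged with one of the seven dimensions listed above, the turn it refers to, and a written comment. The class and function names are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum


class Dimension(Enum):
    BELIEVABILITY = "believability"
    KNOWLEDGE = "knowledge"
    PERSONA_FIDELITY = "persona_fidelity"
    GOAL_ADHERENCE = "goal_adherence"
    SECRET_KEEPING = "secret_keeping"
    INTERNAL_CONTRADICTIONS = "internal_contradictions"
    COVERAGE = "coverage"


@dataclass(frozen=True)
class Critique:
    dimension: Dimension
    turn: int      # index of the interview turn being critiqued
    comment: str   # written feedback, not a numeric score


def group_by_turn(critiques):
    """Collect critiques per turn so each turn can be re-asked once."""
    grouped = defaultdict(list)
    for critique in critiques:
        grouped[critique.turn].append(critique)
    return dict(grouped)


def reask_note(critiques_for_turn):
    """Render the critiques for one turn as a dimension-tagged note."""
    return " ".join(
        f"[{c.dimension.value}] {c.comment}" for c in critiques_for_turn
    )
```

Keeping the dimension as a tag on every critique is what allows the same records to drive a re-ask in the moment and to be filtered by dimension later.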
That feedback does two things. At runtime, it loops straight back to the Interviewer and triggers re-asks, sharpening the interview in the moment. Over time, it is distilled: the dimension-tagged critiques are captured, used to generate a feedback-conditioned "teacher" rollout (what the response should have been, given the critique), and that pair is jointly optimised into the SU Persona Adapter — the in-house model in the ensemble. The result is that the dimensions get baked into the model rather than enforced turn by turn. At inference the Persona behaves as if the Critic were always present, without a Critic in the loop for the dimensions it has already learned.
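The data side of that distillation step can be sketched as follows. This shows only how a critique might be turned into a training record, not how the adapter is optimised, which the post does not specify. The record fields, the `teacher_fn` callable and the choice to drop the critique from the training input are all assumptions for illustration.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class DistillationExample:
    prompt: str      # what the respondent was asked
    student: str     # the original, critiqued response
    dimension: str   # which critique dimension this example targets
    critique: str    # the Critic's written feedback
    teacher: str     # feedback-conditioned rollout: what it should have been


def build_example(prompt, student, dimension, critique, teacher_fn):
    """Generate the teacher rollout from the critique and package the record."""
    teacher = teacher_fn(prompt, student, critique)
    return DistillationExample(prompt, student, dimension, critique, teacher)


def training_pair(example):
    """Return (input, target) for the adapter.

    Assumption: the critique is left out of the input, so the adapter learns
    to produce the corrected answer without a Critic present at inference.
    """
    return example.prompt, example.teacher


def to_jsonl(examples):
    """Serialise examples as one JSON object per line."""
    return "\n".join(json.dumps(asdict(example)) for example in examples)
```

In this sketch the critique shapes the target but never appears in the input, which is one plausible reading of how a correction becomes a default rather than a per-turn instruction.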
There is a clean brain parallel for this, and it is the most important one in the piece. Verbal-feedback distillation is consolidation: the move from effortful, deliberate correction — the kind of conscious, working-memory control the prefrontal cortex does — to automatic, internalised competence, the kind that lives in procedural memory once a skill is learned. The Critic is the system thinking out loud about what it got wrong; the adapter is that correction becoming a habit it no longer has to think about. A person learns to drive the same way: every input is conscious and narrated at first, and silent and automatic later.
And the signal does not stop at the cortex. The same captured feedback — misses, contradictions, weak coverage, parity deltas against organic interviews, calibration drift — runs all the way back down and retunes the personality remapping in the core itself. The feedback closes the loop in the right direction: it reaches the deepest layer and adjusts how the respondents feel, not just how they reason. The core, the memory and the cortex are continuously tuned by the gap between what the system predicted and what real comparisons show. That is why this is a living system rather than a fixed pipeline.
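A minimal way to picture feedback reaching the core is a proportional correction on the segment shifts that drive sampling. The sketch assumes drift is measured as the gap between a synthetic panel's trait means and a target taken from organic comparisons. The `rate` and `tolerance` parameters and the update rule itself are hypothetical, chosen for clarity rather than taken from the system.

```python
def retune(shifts, synthetic_means, target_means, rate=0.5, tolerance=0.02):
    """Nudge per-trait shifts toward the target when drift exceeds tolerance.

    shifts:          current per-trait shifts applied when sampling
    synthetic_means: observed trait means in the synthetic panel
    target_means:    trait means from organic comparison data
    """
    updated = dict(shifts)  # leave the caller's dict untouched
    for trait, target in target_means.items():
        gap = target - synthetic_means[trait]
        if abs(gap) > tolerance:
            updated[trait] = updated.get(trait, 0.0) + rate * gap
    return updated
```

A rate below 1 moves only part of the way toward the target on each pass, so a single noisy comparison cannot swing the whole cohort.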
§ 6 A worked example, from the core out
Take a concrete task: "Clinical interview with an oncologist in Berlin about trial-enrollment UX." Watch it move through the layers, in order.
- The core fires first: it samples a personality consistent with the real-world distribution for German hospital specialists — high conscientiousness, risk-aware, time-constrained — drawn from the calibrated behavioural model rather than invented. This is the respondent, before a word is spoken.
- The limbic layer loads context: RAG pulls your oncology guidelines, past interview notes and the relevant policy constraints into the situation this respondent will reason inside.
- The cortex then reasons and speaks: the router favours models strong on clinical context and formal tone and sequences them; the Interviewer probes consent flow, eligibility filters and EMR hand-offs.
- The Critic flags missing questions on adverse-event reporting and multilingual support, and triggers re-asks.
- The feedback loop updates routing and, if it sees systematic gaps for this cohort, nudges the remapping in the core.
Minutes later you have a realistic interview and a defensible report — not because one model is brilliant, but because the system built the respondent from the inside out and each layer did the one thing it is shaped to do.
§ 7 Why this architecture, in one screen
- Core first. How a respondent feels — their disposition — gets decided before any model reasons, drawn from acquired behavioural data and fit to the real population, the way deep brain structures set priors the cortex inherits.
- Grounded in your data. RAG adds your facts at answer-time, on top of that personality, so outputs stay specific and current without retraining.
- Ensemble beats a single model. Models carry different strengths and biases; routing and aggregation reduce skew and recover the heterogeneity of a real panel.
- Multi-agent beats one prompt. Agents that collaborate and critique lift realism and depth without hand-holding.
- Verbal feedback, not just a score. The Critic's dimensional critiques are distilled into an in-house adapter, so good behaviour gets internalised rather than re-enforced on every turn — the system's version of a skill becoming second nature.
- Not digital twins. We model behavioural spaces — traits × context × knowledge — not 1:1 replicas of individuals. It scales, and it stays fresh.
§ 8 Not digital twins, and why that matters
Digital twins try to clone individuals one by one. That breaks down with real audiences: combinatorial explosion, drift, privacy overhead, and brittleness the moment you step outside observed data. We do the opposite. We factorise a person into traits, context and knowledge; calibrate the trait distribution to the organic world; sample cohorts from that calibrated space; and compose behaviours that generalise and update instantly when any factor changes.
It is the same reason a brain does not store a separate copy of every face it has seen. It learns the dimensions along which faces vary and reconstructs any particular one on demand. Our respondents are reconstructed the same way — from a calibrated space, not a warehouse of clones — which is what lets the system scale to a population without pretending to own a thousand real people. For the deeper version of this argument, see Synthetic Users vs Digital Twins.
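The factorisation argued for in this section can be shown in a few lines. The sketch assumes a respondent is just the combination of three independent factors, with plain strings standing in for what would really be a trait profile, a retrieved context and a knowledge base. The `Respondent` class and both helper functions are hypothetical.

```python
from dataclasses import dataclass, replace
from itertools import product


@dataclass(frozen=True)
class Respondent:
    traits: str     # stand-in for a sampled personality profile
    context: str    # stand-in for the situation being reasoned about
    knowledge: str  # stand-in for the grounded facts available


def compose_cohort(traits, contexts, knowledge):
    """Compose respondents from factors instead of storing individuals."""
    return [Respondent(t, c, k) for t, c, k in product(traits, contexts, knowledge)]


def update_knowledge(cohort, new_knowledge):
    """Swap one factor for the whole cohort without rebuilding anyone."""
    return [replace(respondent, knowledge=new_knowledge) for respondent in cohort]
```

Two trait profiles, two contexts and one knowledge base yield four respondents from five stored factors, and changing the knowledge base updates all four at once. That is the scaling property a warehouse of individually cloned twins does not have.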
§ 9 What this buys you
- Speed: minutes to realistic interviews and a solid report.
- Coverage: many personas and contexts fast, without enumerating "twins."
- Change-tracking: when policies or copy change, RAG picks it up without retraining.
- Bias control: ensemble plus critic reduce single-model skew.
- Reality checks: we benchmark against organic interviews, and the calibrated core keeps personas aligned to the organic world — increasingly, all the way down to the brain.
It is no accident that the architecture stopped looking like a single chatbot. It looks like a brain because, layer by layer and from the core outward, that is the thing it is trying to rebuild: not a smarter mind than yours, but a more human one, faithful enough to your customers that you can ask it what they would do and trust the answer.