Two major papers. One shared direction.

LLM-powered Synthetic Users have crossed from concept to validated method. Two recent papers indicate that they can predict human behavior accurately enough for teams to run fast, low-cost behavioral experiments that complement real participants rather than replace them.

At ICML 2025, researchers from Stanford, University of Chicago, Princeton, and Santa Fe Institute released a position paper arguing that large language models can already simulate human behavior accurately enough for exploratory social science. Around the same time, in Nature, researchers from the Max Planck Institute, NYU, Princeton, and Google DeepMind introduced Centaur, a foundation model of human cognition that was fine-tuned on trial-by-trial data from over 60,000 participants across 160 experiments.

Together, these two papers (see bottom of this article for links) mark a turning point for anyone working on Synthetic Users, agents, or simulated research.

🧩 The ICML paper outlines five key challenges for LLM-based human simulation:

• Diversity

• Bias

• Sycophancy

• Alienness

• Generalization

Rather than treating them as fatal flaws, the authors frame them as tractable engineering and methodological problems that can be addressed with context-rich prompts, fine-tuning, and iterative evaluation.

🧠 Meanwhile, the Centaur team showed what that looks like in practice:

• Centaur outperforms traditional cognitive models in nearly every held-out experiment

• It generalizes across cover stories, task structures, and even entire domains

• After fine-tuning, its internal representations align more closely with human fMRI activity

• It supports interpretable, model-guided scientific discovery

They fine-tuned Llama 3.1-70B on 10 million decisions from their Psych-101 dataset. No prompt hacking, just proper training on structured behavioral data.

The takeaway?

Synthetic users are no longer theoretical. They are a new class of method, and the first serious, empirically validated toolkits are already here.

They won’t replace human participants, but they can meaningfully expand what’s possible in:

• Pilot studies

• Counterfactuals

• Theory development

• UX research

• Scaling social science

If you’re still thinking of synthetic users as a gimmick, it’s time to revisit that position.

Paper 1: https://www.nature.com/articles/s41586-025-09215-4
