
How Synthetic Users are gaining depth
Synthetic Users are evolving to address criticism about their generalist nature by incorporating representative data sets and personal narratives.
This article mentioning Synthetic Users made us reflect on the main criticism Synthetic Users get: they are too generalist. It’s a fair criticism, and one we are working hard to disprove with each new iteration of Synthetic Users.
How we are evolving:
1.
The first and biggest concern is participant bias. LLMs pre-trained on the whole internet are not representative of real people: they have a different geographic distribution, a different gender distribution (the internet skews male) and a different socio-economic status.
The first thing we had to do was use a widely available data set representative of human behaviours in different countries. For the US we use the General Social Survey (GSS), the most rigorous yearly study of American behaviours, attitudes and opinions. It gave us high-quality, census-aligned US data to fine-tune our model with, so we could start biasing it toward creating Synthetic Users that replicate the American consumer. By fine-tuning with other national surveys we are able to do the same for other countries.
The benefit of this was easy to validate. When we create a sample of Synthetic Users, we ask our model questions like: what is your gender distribution? A pure GPT trained on the internet would answer 70% male; with the GSS fine-tune we got 48% male, which is accurate. We validated this with multiple variables, such as age, ethnicity and political inclination.
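The validation step above amounts to comparing the demographic marginals of a generated sample against the survey’s reference marginals. A minimal sketch of that check (with toy, illustrative data rather than real model output):

```python
from collections import Counter

def marginal(personas, attribute):
    """Share of each value of `attribute` across a generated sample."""
    counts = Counter(p[attribute] for p in personas)
    total = sum(counts.values())
    return {value: count / total for value, count in counts.items()}

# Toy sample of generated personas (illustrative only).
sample = [
    {"gender": "male"}, {"gender": "female"}, {"gender": "female"},
    {"gender": "male"}, {"gender": "female"}, {"gender": "male"},
    {"gender": "female"}, {"gender": "male"}, {"gender": "female"},
    {"gender": "female"},
]

# Reference marginal from the survey (e.g. ~48% male in the GSS).
reference = {"male": 0.48, "female": 0.52}

observed = marginal(sample, "gender")
drift = {k: abs(observed.get(k, 0.0) - v) for k, v in reference.items()}
print(observed)  # {'male': 0.4, 'female': 0.6}
print(drift)
```

In practice this runs over thousands of generated personas and many attributes at once; any attribute whose drift exceeds a tolerance flags the fine-tune for another pass.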
We ingested all of the following 23 surveys to ensure behavioural accuracy:
General Social Survey (GSS) - USA
British Social Attitudes Survey - UK
General Social Survey (GSS) - Canada
Household, Income and Labour Dynamics in Australia (HILDA) - Australia
European Social Survey (ESS) - European Union
German General Social Survey (ALLBUS) - Germany
Longitudinal Internet Studies for the Social Sciences (LISS) - Netherlands
Japanese General Social Surveys (JGSS) - Japan
Korean General Social Survey (KGSS) - South Korea
Chinese General Social Survey (CGSS) - China
Taiwan Social Change Survey (TSCS) - Taiwan
India Human Development Survey (IHDS) - India
Afrobarometer - Sub-Saharan Africa
South African Social Attitudes Survey (SASAS) - South Africa
Demographic and Health Surveys (DHS) - Various countries
Latinobarómetro - Latin America
Russian Longitudinal Monitoring Survey (RLMS) - Russia
Brazilian General Social Survey (BGSS) - Brazil
Mexican National Survey on Discrimination (ENADIS) - Mexico
National Survey of Political Culture (ENCUP) - Mexico
Social Longitudinal Survey (ELSOC) - Chile
Social Debt Survey (EDS) - Argentina
Central American Survey of Living Conditions (ECV) - Central America
2.
We now have the correct distributions (provided by the various national surveys). Our team is currently engineering new strategies to surface ‘personal accounts’, which lend more depth to our interviews.
Narrative Synthesis lies at the core of our strategy. We are developing algorithms capable of synthesizing personal narratives in a way that feels authentic and relevant to each persona’s background. This involves combining elements of different stories in a manner that maintains internal consistency and reflects the complexity of human experiences.
Sentiment Analysis allows us to surface the narratives that require more attention. Researchers and product people want to focus on the pains, so we use sentiment analysis to gauge the emotional tone of narratives. This helps us understand the context and emotional layers within personal stories, which is crucial for replicating human-like empathy and understanding in Synthetic Users.
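The triage logic is simple even if the scoring model is not: score each narrative for negative sentiment and surface the most painful first. A toy lexicon-based sketch (a production pipeline would use a trained sentiment model, but the ranking step is the same):

```python
# Tiny illustrative lexicons; a real system would use a trained model.
NEGATIVE = {"frustrated", "confusing", "slow", "broken", "annoying"}
POSITIVE = {"love", "easy", "fast", "helpful", "delighted"}

def pain_score(narrative):
    """Negative-minus-positive word count: higher means more pain."""
    words = narrative.lower().split()
    return sum(w in NEGATIVE for w in words) - sum(w in POSITIVE for w in words)

narratives = [
    "I love how easy the checkout was",
    "The signup flow was confusing and the app felt slow",
]

# Surface the most painful narratives first for researcher attention.
ranked = sorted(narratives, key=pain_score, reverse=True)
print(ranked[0])  # the confusing/slow narrative ranks first
```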
From a sourcing perspective, we draw personal narratives from a wide range of sources. Public blogs, social media and forums are the richest environments we are tapping to enrich our dataset.
3.
All this means that our interviews are gaining more depth but also more statistical accuracy, which leads us to the next iteration: not just better interviews, but Surveys too. Yes, you’ve asked for them and they are coming!
Our upcoming Surveys will focus on two things:
Preference mapping
We are able to plot how appetite for a certain product will evolve over time among Synthetic Users of particular geographies, social backgrounds or age groups.
This helps with your product development: which features to roll out first.
As a subset of preference mapping we are looking at targeted advertising. Given a certain piece of content, what percentage of Synthetic Users is likely to be convinced by it? That is the research question you will be able to pose.
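Once each Synthetic User has responded to a piece of content, preference mapping reduces to aggregating those responses by segment. A minimal sketch, assuming each response is a (persona, convinced) pair produced upstream:

```python
from collections import defaultdict

def preference_by_segment(responses, segment_key):
    """Share of convinced respondents per segment value.
    `responses` is a list of (persona_dict, convinced_bool) pairs."""
    totals, positives = defaultdict(int), defaultdict(int)
    for persona, convinced in responses:
        seg = persona[segment_key]
        totals[seg] += 1
        positives[seg] += convinced
    return {seg: positives[seg] / totals[seg] for seg in totals}

# Toy responses from generated personas to one piece of content.
responses = [
    ({"age_group": "18-29"}, True),
    ({"age_group": "18-29"}, True),
    ({"age_group": "18-29"}, False),
    ({"age_group": "30-44"}, True),
    ({"age_group": "30-44"}, False),
    ({"age_group": "30-44"}, False),
]
print(preference_by_segment(responses, "age_group"))
```

The same aggregation works for geography or social background by swapping the `segment_key`; tracking it over repeated runs gives the over-time plot.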
Price sensitivity (using the Gabor-Granger model)
What is the maximum price your consumers will pay for a certain product? You'll be able to ask Synthetic Users specific questions about their price sensitivity, allowing you to understand the optimal pricing strategy for your target audience. By incorporating the Gabor-Granger model, you can estimate the price point that maximizes revenue while maintaining consumer demand.
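The Gabor-Granger mechanics are straightforward: ask respondents a willingness-to-buy question at each point on a price ladder, treat the acceptance rate at each price as expected demand, and pick the price where price times demand peaks. A sketch with illustrative acceptance rates (not real survey numbers):

```python
def gabor_granger_optimum(price_ladder, willingness):
    """Gabor-Granger: acceptance rate at each price point is treated
    as expected demand; revenue = price * demand. Returns the
    revenue-maximizing price and the full revenue curve."""
    revenue = {p: p * willingness[p] for p in price_ladder}
    best = max(revenue, key=revenue.get)
    return best, revenue

# Toy acceptance rates from a synthetic survey (illustrative numbers).
ladder = [5, 10, 15, 20]
acceptance = {5: 0.90, 10: 0.70, 15: 0.40, 20: 0.15}

best_price, revenue = gabor_granger_optimum(ladder, acceptance)
print(best_price)                     # 10
print(round(revenue[best_price], 2))  # 7.0
```

Note the optimum is the revenue-maximizing price, not the highest price anyone accepts; demand typically falls faster than price rises past this point.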
Remember: Synthetic Users are a great way to start planning your research.
If you can get the obvious out of the way first, you save time and ensure your teams align on a baseline, so you can then focus on what is less obvious. If, for argument’s sake, the deviation is 15%, then you are 85% there. You have a good grasp of which interviews work and which don’t. You can adjust your script. You can home in on the harder questions, where organic users (for the time being) have an edge around personal accounts.