
Introducing Shuffle v2
Shuffle v2 is a feature that intelligently shuffles between multiple large language models via a routing agent to produce more realistic, diverse Synthetic Users with better organic parity.
(Just because we use the em-dash frequently does not mean this was written by a machine ;-)
At Synthetic Users, our core mission is to create AI-driven agents — Synthetic Users — that emulate real human behavior with remarkable fidelity, a property we call Synthetic Organic Parity (SOP). In this sense we diverge from companies focused on superintelligence, simply because most of your customers or users aren't going to be superintelligent. What we're interested in is SOP, and to pursue it we've had to develop a new architecture that leverages LLMs in different ways.
Each new model brings its own set of capabilities, biases, and linguistic signatures. As we strive for SOP — the point at which Synthetic Users act, react, and communicate in ways indistinguishable from real humans — our research has shown that relying on any single LLM imposes inherent limitations on realism.
That’s why we’re excited to announce Shuffle v2, a step forward in our approach to synthesizing behavior by intelligently shuffling between different large language models. This update seeks to deliver more realistic, varied, and contextually accurate Synthetic Users by leveraging the strengths of multiple LLMs simultaneously.
Why Shuffle?
Different language models are trained on different datasets, apply distinct architectures, and adopt divergent training objectives. Consequently, each LLM exhibits unique biases and excels at mimicking certain behavioral niches better than others. By rotating among multiple models (such as GPT, Gemini, Mistral, Claude, Llama, and Hermes), we harness the diversity of these models' training histories.
Heterogeneous Biases, Balanced Output
All machine learning models have internalized biases in their weights based on the data they were trained on. By leveraging several models in tandem, we are effectively combining different “perspectives.” This reduces overfitting to any one model’s bias and results in Synthetic Users whose behaviors are more robust and closely aligned with the multifaceted nature of real human populations.
Parallel to Ensemble Methods
In broader machine learning research, combining multiple models to improve performance is a well-known technique—often referred to as ensemble learning. As noted by Dietterich (2000) in his paper “Ensemble Methods in Machine Learning,” ensemble approaches improve generalization and reduce variance. Shuffle v2 takes this principle a step further by intelligently selecting or routing which models participate in generating specific aspects of Synthetic User behavior.
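The variance-reduction effect Dietterich describes is easy to see in a toy simulation. The sketch below (purely illustrative; the "models" are just noisy estimators of the same quantity, not real LLMs) shows that averaging several differently biased generators produces a tighter output distribution than relying on any single one:

```python
import random
import statistics

rng = random.Random(42)

# Each "model" is a noisy estimator of the same underlying quantity (0.5),
# with its own fixed bias -- a stand-in for an LLM's training-data bias.
def model_output(bias):
    return 0.5 + bias + rng.gauss(0, 0.1)

biases = [-0.05, 0.02, 0.04]

# 1000 samples from a single model vs. 1000 ensemble averages over all three.
single = [model_output(biases[0]) for _ in range(1000)]
ensemble = [statistics.mean(model_output(b) for b in biases) for _ in range(1000)]

# The ensemble's spread is markedly smaller than any single model's.
print(statistics.stdev(ensemble) < statistics.stdev(single))
```

Averaging three independent noise sources shrinks the standard deviation by roughly a factor of √3, which is the classical ensemble variance-reduction result in miniature.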
How Shuffle v2 Works
In Shuffle v2 we utilize a lightweight routing agent that learns how to delegate tasks to the most suitable model, or combination of models, based on the target audience or behavioral pattern we aim to replicate.
Behavioral Goal Definition
We start by defining the target behavioral profile of the Synthetic User population. Are we trying to recreate tech-savvy early adopters? Or older demographics with different linguistic nuances? By capturing these requirements, Shuffle v2 can determine which models (or ensembles of models) are likely to yield the best results.
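As a concrete illustration (the field names here are hypothetical, not our production schema), a target behavioral profile can be captured as a small structured object that the router consumes:

```python
from dataclasses import dataclass, field

@dataclass
class BehavioralProfile:
    """Target profile for a Synthetic User population (illustrative only)."""
    demographic: str                 # e.g. "tech-savvy early adopters"
    age_range: tuple                 # (min_age, max_age)
    linguistic_traits: list = field(default_factory=list)
    tone: str = "neutral"            # desired register of responses

profile = BehavioralProfile(
    demographic="tech-savvy early adopters",
    age_range=(18, 34),
    linguistic_traits=["informal", "jargon-friendly"],
)
print(profile.demographic)
```

Keeping the profile explicit and machine-readable is what lets the routing agent reason about which models are likely to fit it.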
Contextual Filtering
Each LLM is evaluated for its strengths in a specific demographic or psychographic niche. For instance, Model A might excel in short, factual answers, while Model B produces more conversational, empathetic responses. The routing agent maintains a knowledge base of such performance attributes.
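A minimal sketch of such a knowledge base, assuming hypothetical model names and scores (not our actual evaluations), might look like this: each model carries per-trait strength scores, and contextual filtering keeps only the models that clear a threshold for the trait the current request needs.

```python
# Hypothetical knowledge base of per-model strengths (scores in [0, 1]).
MODEL_STRENGTHS = {
    "model_a": {"factual": 0.9, "empathetic": 0.4},
    "model_b": {"factual": 0.5, "empathetic": 0.9},
    "model_c": {"factual": 0.7, "empathetic": 0.7},
}

def filter_models(trait, threshold=0.6):
    """Return the models whose score for `trait` clears the threshold."""
    return sorted(
        name for name, scores in MODEL_STRENGTHS.items()
        if scores.get(trait, 0.0) >= threshold
    )

# Candidates for conversational, empathetic output:
print(filter_models("empathetic"))
```

In practice the scores would come from ongoing evaluations rather than being hard-coded, but the filtering step itself stays this simple.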
Adaptive Model Selection
Using a scoring or ranking algorithm influenced by Reinforcement Learning—see Sutton & Barto (2018) “Reinforcement Learning: An Introduction (2nd ed.)”—the routing agent routes each request (or each piece of a longer synthetic conversation) to the model(s) that best fit the criteria for that user or conversation. Over time, this agent adapts to changing data distributions, improving routing accuracy.
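The simplest RL-flavored router is a multi-armed bandit. The sketch below (a toy epsilon-greedy policy, not our production algorithm) explores occasionally and otherwise exploits the model with the best running reward estimate, where "reward" could be any downstream quality signal:

```python
import random

class EpsilonGreedyRouter:
    """Toy bandit-style router: explore with probability epsilon,
    otherwise exploit the best-scoring model (illustrative only)."""

    def __init__(self, models, epsilon=0.1, seed=0):
        self.models = list(models)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts = {m: 0 for m in self.models}
        self.values = {m: 0.0 for m in self.models}

    def select(self):
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.models)       # explore
        return max(self.models, key=self.values.get)  # exploit

    def update(self, model, reward):
        """Incremental-mean update of the reward estimate (Sutton & Barto, ch. 2)."""
        self.counts[model] += 1
        self.values[model] += (reward - self.values[model]) / self.counts[model]

router = EpsilonGreedyRouter(["model_a", "model_b"])
router.update("model_b", 0.9)  # e.g. a strong parity score for model_b
print(router.select())
```

The incremental-mean update is the standard action-value estimate from the bandit chapter of Sutton & Barto; richer routers replace it with contextual features, but the select/update loop is the same.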
Synthesis & Validation
The selected models generate candidate responses. These outputs are then merged or curated (in certain scenarios) to avoid contradictory or repetitive statements. A final validation step ensures that each Synthetic User’s overall behavior remains coherent.
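One small piece of that curation step can be sketched directly: dropping near-duplicate candidates before merging, so a Synthetic User never repeats itself. The function below is a deliberately simplified stand-in (case- and whitespace-insensitive matching only) for the richer checks we run:

```python
def curate(candidates):
    """Drop duplicate candidate responses, ignoring case and extra
    whitespace, while preserving first-occurrence order (illustrative only)."""
    seen, kept = set(), []
    for text in candidates:
        key = " ".join(text.lower().split())  # normalized comparison key
        if key not in seen:
            seen.add(key)
            kept.append(text)
    return kept

print(curate(["I love it!", "i love  it!", "It's fine."]))
```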
Feedback Loop
After each monthly platform deployment, we run SOP tests to assess how closely the Synthetic Users match the organic user dataset. The results of these tests are fed back into the routing agent, helping it refine future decisions.
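The feedback step can be sketched as turning per-model parity scores into fresh routing weights. The numbers and function below are hypothetical (a simple normalization, not our actual update rule), but they show the shape of the loop: better parity last month means a larger share of traffic next month.

```python
def refresh_weights(sop_results):
    """Turn per-model SOP parity scores into normalized routing weights.
    `sop_results` maps model name -> parity score in [0, 1] (illustrative only)."""
    total = sum(sop_results.values())
    return {m: round(s / total, 3) for m, s in sop_results.items()}

# Hypothetical parity scores from a monthly SOP test run.
print(refresh_weights({"model_a": 0.85, "model_b": 0.92, "model_c": 0.88}))
```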
Scientific Underpinnings
Mixture of Experts
Shuffle v2 builds on the concept of mixture-of-experts approaches, where multiple specialized models (“experts”) tackle sub-problems and a “gating network” decides which model to use. For example, Google’s Switch Transformer (Fedus, Zoph, & Shazeer, 2021) in their paper “Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity” demonstrates how a gating mechanism can dynamically route tasks to different model components.
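At the heart of any mixture-of-experts system is the gating function, classically a softmax over per-expert scores. Here is a minimal, self-contained sketch (toy logits, hypothetical expert names) of how a gate turns scores into routing probabilities:

```python
import math

def softmax_gate(logits):
    """Softmax gating over expert scores: returns the probability of
    routing a request to each expert (illustrative only)."""
    mx = max(logits.values())                              # for numerical stability
    exps = {k: math.exp(v - mx) for k, v in logits.items()}
    z = sum(exps.values())
    return {k: e / z for k, e in exps.items()}

gate = softmax_gate({"expert_1": 2.0, "expert_2": 0.5})
print(max(gate, key=gate.get))
```

Sparse variants like the Switch Transformer then route each token to only the top-scoring expert, which is what makes trillion-parameter models computationally tractable.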
Robustness via Ensemble Diversity
Kuncheva & Whitaker (2003) discuss “Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy” and demonstrate that model diversity is a key factor in effective ensemble systems. By shuffling among models, we increase the diversity of possible outputs, thus stabilizing the behavioral distribution of our Synthetic Users.
Ensemble-Based Decision Making
He, J., Zhou, X., Zhang, R., & Yang, C., in "An ensemble learning framework based on group decision making" (IEEE), provide evidence that ensemble-based strategies can improve decision reliability. Shuffle v2 adapts these findings to the generation of more varied and realistic Synthetic User behaviors.
Looking Ahead
Shuffle v2 is a significant leap toward bridging the gap between synthetic and organic user behavior. In the near future, we plan to:
Expand the Model Pool: Incorporate more specialized models (e.g., domain-specific LLMs for finance, healthcare, etc.) to further refine Synthetic User authenticity.
Enhance Real-Time Adaptability: Introduce automated retraining pipelines for the routing agent to handle shifts in linguistic trends or user behaviors.
Integrate Deeper Behavioral Markers: Beyond just language, we aim to include non-textual cues (reaction times, sentiment patterns, etc.) so Synthetic Users can more faithfully reflect real-world human variability.
By enabling Shuffle v2, organizations can model user behavior with greater nuance, gain deeper insights into diverse audience segments, and develop more reliable prototypes for user interactions. We’re excited to see how Shuffle v2 transforms your experience with Synthetic Users—and paves the way for ever more authentic AI-driven emulations of human behavior.
References & Further Reading
Dietterich, T. G. (2000). Ensemble Methods in Machine Learning. International workshop on multiple classifier systems, (pp. 1–15).
Fedus, W., Zoph, B., & Shazeer, N. (2021). Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity. arXiv preprint arXiv:2101.03961.
Kuncheva, L. I., & Whitaker, C. J. (2003). Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy. Machine Learning, 51(2), 181–207.
Sutton, R. S., & Barto, A. G. (2018). Reinforcement Learning: An Introduction (2nd ed.). MIT Press.
He, J., Zhou, X., Zhang, R., & Yang, C. An ensemble learning framework based on group decision making. IEEE.