TLDR: A new method called Semantic Similarity Rating (SSR) allows Large Language Models (LLMs) to accurately simulate human purchase intent. Instead of giving direct numerical ratings, LLMs produce free-text responses that are then mapped to Likert scales via semantic similarity. The approach achieves 90% of human test-retest reliability and realistic response distributions, offering scalable, cost-effective consumer research with rich qualitative feedback, and it mirrors human response patterns particularly well across age and income demographics.
Consumer research is a cornerstone for companies developing new products, guiding crucial decisions before significant investments in production and launch. However, traditional surveys, which cost companies billions annually, grapple with limitations such as panel biases and difficulty scaling. The emergence of large language models (LLMs) has opened a new avenue, offering the potential to simulate synthetic consumers and change how companies gather insights.
Initially, using LLMs for consumer research presented a significant challenge: when directly asked for numerical ratings, like on a Likert scale (e.g., 1 to 5 for purchase intent), LLMs tended to produce unrealistic response distributions. These distributions were often too narrow, systematically skewed, or simply inconsistent with actual human survey data. This raised questions about the fundamental suitability of LLMs as survey respondents.
A recent research paper, titled “LLMs Reproduce Human Purchase Intent via Semantic Similarity Elicitation of Likert Ratings,” introduces a novel method called Semantic Similarity Rating (SSR) that addresses this very issue. The authors argue that the problem isn’t with LLMs themselves, but with the method used to elicit their responses. Instead of asking for a direct number, SSR prompts LLMs to generate free-text statements expressing their purchase intent. These textual responses are then mapped to Likert distributions by comparing their semantic similarity to predefined reference statements using embedding technology.
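The mapping step described above can be sketched in a few lines. The sketch below is illustrative, not the paper's implementation: the five reference statements are hypothetical examples, and the `embed` function is a toy character-bigram stand-in for the pretrained sentence-embedding model a real SSR pipeline would use.

```python
import numpy as np

# Hypothetical reference statements anchoring Likert points 1-5
# (illustrative wording, not taken from the paper).
REFERENCES = [
    "I would definitely not buy this product.",
    "I would probably not buy this product.",
    "I might or might not buy this product.",
    "I would probably buy this product.",
    "I would definitely buy this product.",
]

def embed(text: str) -> np.ndarray:
    """Toy stand-in for a sentence-embedding model: a normalized
    character-bigram count vector. A real pipeline would call a
    pretrained embedding model here."""
    vec = np.zeros(26 * 26)
    letters = [c for c in text.lower() if c.isalpha()]
    for a, b in zip(letters, letters[1:]):
        vec[(ord(a) - 97) * 26 + (ord(b) - 97)] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec

def ssr_distribution(response: str, temperature: float = 20.0) -> np.ndarray:
    """Map a free-text response to a probability distribution over the
    five Likert points by cosine similarity to the reference statements,
    then a softmax (temperature sharpens or flattens the result)."""
    r = embed(response)
    sims = np.array([r @ embed(ref) for ref in REFERENCES])
    logits = temperature * sims
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()

dist = ssr_distribution("I love the scent and would definitely buy this.")
expected_rating = float(np.dot(dist, np.arange(1, 6)))
```

Keeping the full distribution per respondent, rather than collapsing it to a single integer, is what lets the aggregated synthetic responses reproduce realistic Likert spreads instead of piling onto one or two values.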
The effectiveness of the SSR method was rigorously tested on an extensive dataset: 57 personal care product surveys, originally conducted by a leading corporation in that market, totaling 9,300 human responses. The results were highly encouraging. SSR achieved 90% of human test-retest reliability, meaning the synthetic consumers' responses were about as consistent as human responses would be if the same survey were repeated. The method also maintained realistic response distributions, with a Kolmogorov–Smirnov (KS) similarity greater than 0.85, indicating strong alignment with human data patterns.
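For intuition on the distribution-alignment metric, the sketch below computes a KS-style similarity between two discrete rating distributions. Interpreting "KS similarity" as 1 minus the Kolmogorov–Smirnov statistic (the maximum gap between cumulative distributions) is an assumption for illustration; the example distributions are made up.

```python
import numpy as np

def ks_similarity(p: np.ndarray, q: np.ndarray) -> float:
    """1 minus the Kolmogorov-Smirnov statistic, i.e. one minus the
    largest absolute gap between the two cumulative distributions.
    (Assumed interpretation of the paper's 'KS similarity' metric.)"""
    cdf_p = np.cumsum(p / p.sum())
    cdf_q = np.cumsum(q / q.sum())
    return 1.0 - float(np.max(np.abs(cdf_p - cdf_q)))

# Hypothetical human vs. synthetic shares of Likert ratings 1-5.
human = np.array([0.05, 0.15, 0.30, 0.35, 0.15])
synthetic = np.array([0.04, 0.16, 0.28, 0.37, 0.15])
sim = ks_similarity(human, synthetic)  # → 0.98
```

A value near 1.0 indicates the synthetic distribution closely tracks the human one; a narrow, skewed distribution of the kind direct numerical elicitation produces would score much lower.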
Beyond quantitative metrics, SSR offers an additional significant benefit: rich qualitative feedback. The free-text responses generated by the synthetic consumers provide detailed rationales for their ratings. This qualitative data can be invaluable for product development, offering insights into appealing features, potential concerns, and underlying value propositions, which traditional human surveys often capture only minimally, if at all.
The study also explored how well synthetic consumers mirrored human behavior across different demographic attributes and product characteristics. It found that LLMs, when conditioned on demographic personas, replicated human response patterns relatively well, particularly concerning age and income level. For instance, both younger and older synthetic participants tended to rate purchase intent lower than middle-aged cohorts, a behavior observed in real human data. Similarly, synthetic consumers prompted with budgetary concerns responded with lower purchase intent, consistent with human behavior. However, the replication was less consistent for factors like gender and dwelling region, suggesting areas for further refinement.
This framework represents a significant step forward for scalable consumer research simulations. It preserves traditional survey metrics and interpretability while overcoming previous limitations of LLMs in generating realistic numerical ratings. Importantly, the SSR method requires no training data or fine-tuning on consumer responses, making it a cost-effective and widely applicable plug-and-play tool. While the method relies on carefully designed reference statements and the performance can be influenced by the choice of embedding model, it establishes a credible foundation for augmenting and accelerating consumer insight generation. For more details, you can read the full paper here: LLMs Reproduce Human Purchase Intent via Semantic Similarity Elicitation of Likert Ratings.


