
Synthetic Consumers: How LLMs Accurately Predict Purchase Intent with Textual Responses

TLDR: A new method called Semantic Similarity Rating (SSR) lets large language models (LLMs) accurately simulate human purchase intent. Instead of giving direct numerical ratings, the LLMs produce free-text responses, which are then mapped onto Likert scales via semantic similarity. The approach achieves 90% of human test-retest reliability and realistic response distributions, offering scalable, cost-effective consumer research with rich qualitative feedback, and it mirrors human behavior especially well across age and income demographics.

Consumer research is a cornerstone for companies developing new products, guiding crucial decisions before significant investments in production and launch. However, this traditional approach, costing billions annually, often grapples with limitations such as panel biases and difficulties in scaling up. The emergence of large language models (LLMs) has opened a new avenue, offering the potential to simulate synthetic consumers and revolutionize how companies gather insights.

Initially, using LLMs for consumer research presented a significant challenge: when directly asked for numerical ratings, like on a Likert scale (e.g., 1 to 5 for purchase intent), LLMs tended to produce unrealistic response distributions. These distributions were often too narrow, systematically skewed, or simply inconsistent with actual human survey data. This raised questions about the fundamental suitability of LLMs as survey respondents.

A recent research paper, titled “LLMs Reproduce Human Purchase Intent via Semantic Similarity Elicitation of Likert Ratings,” introduces a novel method called Semantic Similarity Rating (SSR) that addresses this very issue. The authors argue that the problem isn’t with LLMs themselves, but with the method used to elicit their responses. Instead of asking for a direct number, SSR prompts LLMs to generate free-text statements expressing their purchase intent. These textual responses are then mapped to Likert distributions by comparing their semantic similarity to predefined reference statements using text embeddings.
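The core mapping step can be illustrated in a few lines. The sketch below is a toy version only: it uses a bag-of-words cosine similarity in place of a real sentence-embedding model, and the five anchor statements and the normalization scheme are hypothetical stand-ins for whatever the paper actually uses.

```python
from collections import Counter
import math

def embed(text):
    # Toy bag-of-words "embedding"; SSR would use a real sentence-embedding model.
    return Counter(text.lower().replace(",", "").replace(".", "").split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Hypothetical anchor statements for a 5-point purchase-intent Likert scale.
REFERENCES = {
    1: "I would definitely not buy this product",
    2: "I would probably not buy this product",
    3: "I might or might not buy this product",
    4: "I would probably buy this product",
    5: "I would definitely buy this product",
}

def ssr_distribution(free_text):
    """Map a free-text response to a probability distribution over Likert points
    by normalizing its similarity to each anchor statement."""
    sims = {k: cosine(embed(free_text), embed(ref)) for k, ref in REFERENCES.items()}
    total = sum(sims.values())
    if total == 0:
        return {k: 1 / len(sims) for k in sims}  # uninformative fallback
    return {k: s / total for k, s in sims.items()}

response = "I would probably buy this, it sounds quite useful for my routine."
dist = ssr_distribution(response)
expected_rating = sum(k * p for k, p in dist.items())
```

Note that the output is a full distribution over the scale rather than a single forced number, which is what lets SSR reproduce realistic response spreads instead of the overly narrow distributions seen with direct numerical prompting.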

The effectiveness of the SSR method was rigorously tested on an extensive dataset. This dataset comprised 57 personal care product surveys, originally conducted by a leading corporation in the market, involving a total of 9,300 human responses. The results were highly encouraging: SSR achieved an impressive 90% of human test-retest reliability. This means that the synthetic consumers’ responses were remarkably consistent, mirroring how reliable human responses would be if the survey were repeated. Furthermore, the method successfully maintained realistic response distributions, with a Kolmogorov–Smirnov (KS) similarity greater than 0.85, indicating a strong alignment with human data patterns.
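To make the KS figure concrete: assuming the reported KS similarity is one minus the Kolmogorov–Smirnov distance (the maximum gap between the cumulative distributions of the human and synthetic Likert responses), it can be computed as follows. The two example distributions here are purely illustrative, not data from the study.

```python
from itertools import accumulate

def ks_similarity(p, q):
    """1 minus the Kolmogorov-Smirnov statistic between two discrete
    distributions given as aligned probability lists (e.g. Likert points 1..5)."""
    cdf_p = list(accumulate(p))
    cdf_q = list(accumulate(q))
    d = max(abs(a - b) for a, b in zip(cdf_p, cdf_q))
    return 1.0 - d

human = [0.05, 0.15, 0.30, 0.35, 0.15]      # illustrative human response shares
synthetic = [0.04, 0.13, 0.33, 0.34, 0.16]  # illustrative synthetic shares
score = ks_similarity(human, synthetic)
```

A score near 1 means the synthetic response distribution closely tracks the human one; values above 0.85, as reported in the paper, indicate strong distributional alignment.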

Beyond quantitative metrics, SSR offers an additional significant benefit: rich qualitative feedback. The free-text responses generated by the synthetic consumers provide detailed rationales explaining their ratings. This qualitative data can be invaluable for product development, offering insights into positive features, potential concerns, and underlying value propositions that often go uncaptured, or are only minimally expressed, in traditional human surveys.

The study also explored how well synthetic consumers mirrored human behavior across different demographic attributes and product characteristics. It found that LLMs, when conditioned on demographic personas, replicated human response patterns relatively well, particularly concerning age and income level. For instance, both younger and older synthetic participants tended to rate purchase intent lower than middle-aged cohorts, a behavior observed in real human data. Similarly, synthetic consumers prompted with budgetary concerns responded with lower purchase intent, consistent with human behavior. However, the replication was less consistent for factors like gender and dwelling region, suggesting areas for further refinement.


This framework represents a significant step forward for scalable consumer research simulations. It preserves traditional survey metrics and interpretability while overcoming previous limitations of LLMs in generating realistic numerical ratings. Importantly, the SSR method requires no training data or fine-tuning on consumer responses, making it a cost-effective and widely applicable plug-and-play tool. While the method relies on carefully designed reference statements and the performance can be influenced by the choice of embedding model, it establishes a credible foundation for augmenting and accelerating consumer insight generation. For more details, you can read the full paper here: LLMs Reproduce Human Purchase Intent via Semantic Similarity Elicitation of Likert Ratings.

Meera Iyer (https://blogs.edgentiq.com)
Meera Iyer is an AI news editor who blends journalistic rigor with storytelling elegance. Formerly a content strategist at a leading tech firm, Meera now tracks the pulse of India's Generative AI scene, from policy updates to academic breakthroughs. She is particularly focused on bringing nuanced, balanced perspectives to the fast-evolving world of AI-powered tools and media. You can reach her at: [email protected]
