TL;DR: This research paper analyzes 83 persona prompts from 27 articles to understand how large language models (LLMs) are used to generate user personas. Key findings show that LLMs primarily create single, concise, text-based personas that often include demographic data. While GPT models dominate, researchers are increasingly using multi-prompt strategies and integrating dynamic data, highlighting both opportunities and challenges for user representation.
User personas are fictional representations of user groups, built on real data, that help designers and stakeholders make informed decisions. Traditionally, human experts analyzed user data to create these detailed profiles. However, with the rise of artificial intelligence, particularly large language models (LLMs), the process of persona creation is evolving rapidly.
A recent research paper, titled “Using AI for User Representation: An Analysis of 83 Persona Prompts,” examines how researchers are currently leveraging LLMs for this purpose. Authored by Joni Salminen, Danial Amin, and Bernard J. Jansen, the study provides a comprehensive look at the prompting strategies used to generate these AI-driven personas.
Why Researchers Use AI for Personas
The study found that researchers employ LLMs to create, evaluate, and apply personas across a wide array of applications, ranging from educational tools for training counselors to proxies for understanding audiences and even aids for storytelling. While persona generation is the primary use case (81.48% of studies), LLMs are also used to predict persona behaviors and to evaluate persona quality. Notably, over half of the prompts request the persona output in a structured format, most commonly JSON, which is particularly useful for further data analysis.
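To make the structured-output pattern concrete, here is a minimal sketch of a persona prompt that requests JSON, plus a validation step. This is an illustrative example, not a prompt from the paper; the field names and the `parse_persona` helper are invented for demonstration.

```python
import json

# Hypothetical prompt in the spirit of the JSON-output prompts the paper surveys.
# The requested keys are illustrative, not taken from any specific study.
PERSONA_PROMPT = """
Based on the user data below, generate ONE persona as a JSON object with the
keys "name", "age", "occupation", "goals", and "frustrations". Return only
valid JSON, with no extra commentary.

User data:
{user_data}
"""

def parse_persona(llm_response: str) -> dict:
    """Check that the model actually returned the requested structure."""
    persona = json.loads(llm_response)
    required = {"name", "age", "occupation", "goals", "frustrations"}
    missing = required - persona.keys()
    if missing:
        raise ValueError(f"Persona response is missing fields: {missing}")
    return persona
```

A structured format like this is what makes downstream analysis (counting attributes, comparing personas across runs) straightforward, which likely explains its popularity in the surveyed studies.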
How Prompts Shape AI-Generated Personas
The research highlights that GPT models are overwhelmingly dominant in persona generation, appearing in over 76% of all model instances. The complexity of prompts varies significantly, from simple one-liners to intricate, multi-stage systems that guide the LLM through a complete persona generation process. On average, researchers use about three prompts per study, with some employing as many as 12. A notable trend is the dynamic insertion of data or variables into prompts, occurring in nearly three out of four cases. This allows for the integration of real user data directly into the AI-driven persona creation process, moving towards what some call “computational personas.”
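The dynamic-insertion pattern described above can be sketched with simple string templating. The segment fields, product name, and word limit below are invented for illustration; the point is only that real user data flows into the prompt rather than being hand-summarized first.

```python
from string import Template

# Illustrative template showing dynamic insertion of user data into a persona
# prompt, the pattern the paper observed in roughly three out of four cases.
SEGMENT_PROMPT = Template(
    "You are generating a user persona for the $product team.\n"
    "The persona must reflect this behavioral segment:\n"
    "- average sessions per week: $sessions\n"
    "- top feature used: $top_feature\n"
    "Describe the persona in under $max_words words."
)

def build_prompt(segment: dict, product: str, max_words: int = 150) -> str:
    """Interpolate a user-data segment into the prompt template."""
    return SEGMENT_PROMPT.substitute(
        product=product,
        sessions=segment["sessions"],
        top_feature=segment["top_feature"],
        max_words=max_words,
    )
```

In a multi-prompt pipeline, the output of one such call (e.g. a clustering summary) would feed the next prompt, which is what makes these “computational persona” systems harder to evaluate as a whole.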
Characteristics of AI-Generated Personas
When it comes to the output, personas are predominantly generated as text and numbers, with image generation surprisingly infrequent. Most prompts specify how many personas to generate, often requesting a single persona, which deviates from the traditional goal of representing diverse user populations. Researchers also frequently constrain the length of the persona output, often aiming for concise descriptions, which contrasts with the traditional emphasis on rich, detailed persona narratives.
Demographic information, such as age, name, and occupation, is the most common type of data included in AI-generated personas, appearing in nearly 78% of the prompt entries. Other traditional user representation information, including behaviors, attitudes, and contextual details, is also reasonably prevalent, suggesting that established practices from classic persona development are being carried over into the new technological environment.
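The mix of information types the paper found most common can be pictured as a simple schema: demographics as the core, with behaviors, attitudes, and context layered on. The class below is a hypothetical sketch of such a schema, not a structure defined in the paper.

```python
from dataclasses import dataclass, field

# Illustrative persona schema mirroring the information types the study found
# most prevalent: demographics first, then behaviors, attitudes, and context.
@dataclass
class Persona:
    name: str
    age: int
    occupation: str
    behaviors: list = field(default_factory=list)   # e.g. usage patterns
    attitudes: list = field(default_factory=list)   # e.g. preferences, concerns
    context: str = ""                               # situational details

    def is_demographics_only(self) -> bool:
        """True if the persona carries no behavioral or attitudinal depth."""
        return not (self.behaviors or self.attitudes or self.context)
```

A check like `is_demographics_only` hints at one way to audit generated personas: a persona that is all demographics and no behavior is exactly the thin representation the persona literature warns against.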
Implications and Future Directions
The study points out that while LLMs offer opportunities for faster and more efficient persona creation, they also introduce new challenges. The practice of integrating data directly into prompts, while powerful, can reduce transparency and human oversight. The chaining of multiple prompts, though increasing sophistication, makes evaluating the overall system more complex. The prevalence of single persona generation and the heavy reliance on GPT models without extensive cross-model comparison also raise questions about the diversity and optimal quality of the generated personas.
The authors recommend that to maintain the ‘data-driven’ principle, primary user data should always be included in persona prompts. They also suggest that researchers familiarize themselves with established persona theory to ensure that AI-generated personas are not just technically sound but also empathetic, representative, and truly useful for design and decision-making processes.