TLDR: A new research paper addresses the issue of Large Language Models (LLMs) producing homogeneous outputs that fail to capture human diversity. The authors propose a novel framework to construct a set of LLM agents, each steered by human demonstrations via in-context learning, to collectively represent diverse human populations. By formulating this as a submodular optimization problem, they developed methods (REPPOPdemo, REPPOPmapped-1, REPPOPmapped-2) that efficiently select representative agents. Experiments in education, opinion surveys, and data annotation demonstrate that these methods significantly reduce representation error and enable agents to reproduce human-like behavior patterns on new tasks, outperforming existing baselines.
Large Language Models (LLMs) have become incredibly powerful tools, often used as stand-ins for human responses in various research and industry applications. However, a significant challenge with these models is their tendency to produce very similar, or ‘homogeneous,’ outputs. This means they often fail to capture the rich and varied perspectives and behaviors that are characteristic of diverse human populations.
Imagine trying to understand a wide range of opinions on a political issue or the different ways students might approach a math problem, but your AI model only gives you one or two common viewpoints. This limitation restricts the usefulness of LLMs in many areas, from generating diverse text paraphrases to simulating human opinions in surveys or even replicating human studies.
To tackle this problem, a new research paper titled “Prompt Optimization Across Multiple Agents for Representing Diverse Human Populations” by Manh Hung Nguyen, Sebastian Tschiatschek, and Adish Singla introduces a novel approach. Instead of trying to make a single LLM agent capture all human diversity, the researchers propose building a *set* of specialized LLM agents. Each agent in this set is designed to represent a distinct segment of a human population, and together, they collectively capture the overall diversity.
The core idea is to ‘steer’ each LLM agent’s behavior by providing it with a small collection of human examples, known as ‘demonstrations,’ through a technique called in-context learning. These demonstrations are essentially task-response pairs that show the agent how a particular type of human would behave. The main hurdle then becomes how to select the most representative set of these agents from an enormous number of possibilities.
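To make the idea of steering by demonstrations concrete, here is a minimal sketch of how a handful of task-response pairs might be packed into a prompt for in-context learning. The function name `build_prompt` and the "Task:" / "Response:" layout are assumptions made for illustration; the paper's actual prompt template may look different.

```python
from typing import List, Tuple

# A demonstration is one (task, response) pair taken from a human.
Demo = Tuple[str, str]


def build_prompt(demonstrations: List[Demo], new_task: str) -> str:
    """Lay out the demonstrations, then the new task, as one prompt string.

    Hypothetical template: each demonstration becomes a "Task:" line and a
    "Response:" line, and the prompt ends with an open "Response:" line
    for the model to complete in the style of the demonstrations.
    """
    lines: List[str] = []
    for task, response in demonstrations:
        lines.append(f"Task: {task}")
        lines.append(f"Response: {response}")
        lines.append("")  # blank line between demonstrations
    lines.append(f"Task: {new_task}")
    lines.append("Response:")
    return "\n".join(lines)
```

Swapping in demonstrations from a different human changes the agent's context, and with it the behavior the model is nudged to imitate, without any retraining.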
The researchers framed this complex selection challenge as a ‘submodular optimization’ problem. This mathematical approach helps in finding the best subset of agents efficiently, even though the overall problem is computationally very difficult. They developed several methods that offer different balances between how quickly they can find a solution and how well that solution performs.
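As a rough illustration of this kind of selection, the sketch below greedily picks agents from a candidate pool so that every human ends up close to at least one selected agent. It assumes, for illustration only, that representation error is the average distance from each human to their closest selected agent and that a matrix of human-to-agent distances is already available; the paper's exact objective and distance measure may differ. Greedy selection is the standard heuristic for this family of problems and comes with a well-known approximation guarantee when the objective is monotone submodular.

```python
from typing import List, Sequence


def representation_error(selected: Sequence[int],
                         distance: Sequence[Sequence[float]]) -> float:
    """Average, over humans, of the distance to the closest selected agent.

    distance[h][a] is an assumed behavioral distance between human h and
    candidate agent a (lower means the agent mimics that human better).
    """
    return sum(min(row[a] for a in selected) for row in distance) / len(distance)


def greedy_select(distance: Sequence[Sequence[float]], k: int) -> List[int]:
    """Pick up to k agents, each time adding the one that lowers the error most."""
    n_agents = len(distance[0])
    selected: List[int] = []
    for _ in range(min(k, n_agents)):
        candidates = [a for a in range(n_agents) if a not in selected]
        best = min(candidates,
                   key=lambda a: representation_error(selected + [a], distance))
        selected.append(best)
    return selected


if __name__ == "__main__":
    # Toy data: 4 humans (rows) and 3 candidate agents (columns).
    toy_distance = [
        [0.0, 1.0, 0.5],
        [0.25, 1.0, 0.5],
        [0.25, 0.75, 0.5],
        [1.0, 0.0, 0.5],
    ]
    chosen = greedy_select(toy_distance, k=2)
    print(chosen, representation_error(chosen, toy_distance))
```

In the toy data, agent 2 is a middling match for everyone, yet the greedy procedure prefers two specialists that each match one group of humans closely, which mirrors the article's point about a set of specialized agents.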
How the Agents are Built and Selected
The paper introduces three main methods for constructing and selecting these representative agents:
- REPPOPdemo: This method builds each agent step-by-step. Instead of looking at all possible agents at once, it greedily selects individual human demonstrations to add to an agent’s ‘context’ (the information it learns from). This significantly reduces the computational effort.
- REPPOPmapped-1 and REPPOPmapped-2: These methods create a smaller, more manageable pool of ‘proxy’ agents, where each proxy agent is directly linked to a specific human in the population. For REPPOPmapped-1, the demonstrations for each human-mapped agent are chosen randomly. REPPOPmapped-2 is more sophisticated, greedily selecting demonstrations that best mimic the individual human’s behavior.
These methods ensure that the selected agents, when combined, effectively cover the spectrum of human behaviors and perspectives.
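The difference between choosing a proxy agent's demonstrations at random and choosing them greedily can be sketched as follows. Everything here is a simplified assumption for illustration: `toy_agent` is a stand-in for a real LLM call, tasks are written as "topic:question_id", and the greedy variant scores candidate demonstrations by agreement with held-out responses from the same human. The paper's actual procedures are likely more involved.

```python
import random
from typing import Callable, List, Tuple

Demo = Tuple[str, str]  # (task, response); task is "topic:question_id"
Agent = Callable[[List[Demo], str], str]


def toy_agent(context: List[Demo], task: str) -> str:
    """Stand-in for an LLM: reuse the response of the first demonstration
    on the same topic, or answer "unknown" if no topic matches."""
    topic = task.split(":")[0]
    for demo_task, demo_response in context:
        if demo_task.split(":")[0] == topic:
            return demo_response
    return "unknown"


def agreement(context: List[Demo], records: List[Demo], agent: Agent) -> float:
    """Fraction of a human's records that the agent reproduces exactly."""
    return sum(agent(context, t) == r for t, r in records) / len(records)


def random_context(pool: List[Demo], size: int, seed: int) -> List[Demo]:
    """Random variant: sample the human's demonstrations uniformly."""
    return random.Random(seed).sample(pool, size)


def greedy_context(pool: List[Demo], validation: List[Demo],
                   size: int, agent: Agent) -> List[Demo]:
    """Greedy variant: repeatedly add the demonstration that most improves
    agreement with the same human's held-out validation records."""
    context: List[Demo] = []
    remaining = list(pool)
    for _ in range(min(size, len(remaining))):
        best = max(remaining,
                   key=lambda d: agreement(context + [d], validation, agent))
        context.append(best)
        remaining.remove(best)
    return context
```

With a small context budget, a random draw can waste slots on near-duplicate demonstrations, while the greedy variant tends to spend each slot on a demonstration that covers behavior the context does not yet explain.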
Real-World Applications and Results
The effectiveness of this framework was tested across various domains:
- Education (EEDI dataset): Simulating diverse student behaviors in answering math questions. This could help teachers practice instructional strategies or conduct virtual pre-tests.
- Opinion Surveys (OpinionQA dataset): Creating agents that reflect different opinions and beliefs, similar to diverse crowdworkers responding to political surveys.
- Data Annotation (WikiArt dataset): Generating agents that can annotate images with diverse emotional responses and descriptions, mimicking human annotators with different personalities.
The experiments showed that the proposed methods consistently reduced the ‘representation error’ – meaning the agents more accurately mirrored human populations compared to traditional approaches. Notably, REPPOPmapped-2, which carefully selects demonstrations, often performed the best. The framework also proved robust, working well with various LLMs of different sizes.
Beyond just numerical accuracy, the researchers also conducted a behavioral analysis. They found that the agents constructed by their methods genuinely exhibited behaviors similar to the human groups they were designed to represent. For instance, an agent representing students with a strong grasp of mental multiplication would perform similarly on new math problems, and an agent representing a particular political ideology would express opinions consistent with that ideology, even though no explicit demographic data was used in their construction.
This research marks a significant step towards creating more nuanced and representative AI systems that can better reflect the complexity of human populations. While the current work focuses on prompting-based approaches and doesn’t delve into demonstration ordering, it lays crucial groundwork for future applications, such as training teachers with diverse student models or simulating responses to new government policies. The full research paper is titled “Prompt Optimization Across Multiple Agents for Representing Diverse Human Populations.”


