Beyond Single LLMs: Building AI Ensembles for Human Diversity

TLDR: A new research paper tackles the tendency of Large Language Models (LLMs) to produce homogeneous outputs that fail to capture human diversity. The authors propose a framework that constructs a set of LLM agents, each steered by human demonstrations via in-context learning, so that the agents collectively represent a diverse human population. By formulating agent selection as a submodular optimization problem, they develop methods (REPPOPdemo, REPPOPmapped-1, REPPOPmapped-2) that efficiently select representative agents. Experiments in education, opinion surveys, and data annotation show that these methods significantly reduce representation error compared with existing baselines, and that the selected agents reproduce human-like behavior patterns on new tasks.

Large Language Models (LLMs) have become incredibly powerful tools, often used as stand-ins for human responses in various research and industry applications. However, a significant challenge with these models is their tendency to produce very similar, or ‘homogeneous,’ outputs. This means they often fail to capture the rich and varied perspectives and behaviors that are characteristic of diverse human populations.

Imagine trying to understand a wide range of opinions on a political issue or the different ways students might approach a math problem, but your AI model only gives you one or two common viewpoints. This limitation restricts the usefulness of LLMs in many areas, from generating diverse text paraphrases to simulating human opinions in surveys or even replicating human studies.

To tackle this problem, a new research paper titled “Prompt Optimization Across Multiple Agents for Representing Diverse Human Populations” by Manh Hung Nguyen, Sebastian Tschiatschek, and Adish Singla introduces a novel approach. Instead of trying to make a single LLM agent capture all human diversity, the researchers propose building a *set* of specialized LLM agents. Each agent in this set is designed to represent a distinct segment of a human population, and together, they collectively capture the overall diversity.

The core idea is to ‘steer’ each LLM agent’s behavior by providing it with a small collection of human examples, known as ‘demonstrations,’ through a technique called in-context learning. These demonstrations are task-response pairs that show the agent how a particular type of human would behave. The main hurdle then becomes selecting the most representative set of agents: because each agent is defined by the demonstrations it receives, the number of possible agents grows combinatorially with the size of the demonstration pool, and the number of possible agent sets grows faster still.
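To make the steering step concrete, here is a minimal Python sketch of how one agent’s demonstrations might be packed into a prompt. The `build_prompt` helper and its “Task:/Response:” template are hypothetical and chosen only for illustration. The paper’s exact prompt format may differ, and the resulting string would still need to be sent to an LLM of your choice.

```python
from typing import List, Tuple

# A demonstration is a (task, response) pair recorded from one human.
Demo = Tuple[str, str]


def build_prompt(demos: List[Demo], new_task: str) -> str:
    """Assemble an in-context-learning prompt: demonstrations first, then the new task.

    The "Task:/Response:" template is a hypothetical format for illustration only.
    """
    parts = [f"Task: {task}\nResponse: {response}" for task, response in demos]
    # Leave the final response empty so the LLM completes it in the demonstrated style.
    parts.append(f"Task: {new_task}\nResponse:")
    return "\n\n".join(parts)


demos = [
    ("What is 12 x 5?", "60"),
    ("What is 7 x 8?", "54"),  # a recorded mistake is part of this human's behavior
]
print(build_prompt(demos, "What is 9 x 6?"))
```

Keeping the wrong answer (54) in the context is deliberate. The goal is to reproduce how a particular person behaves, not to get the right answer.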

The researchers framed this selection challenge as a ‘submodular optimization’ problem. Submodular objectives exhibit diminishing returns: a new agent adds less (or at least no more) value when the set already contains agents that cover the same people. This property lets greedy algorithms find good subsets of agents efficiently, even though finding the exact optimum is computationally very difficult. The authors developed several methods that trade off how quickly they find a solution against how good that solution is.
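The greedy idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors’ implementation. Each human and each candidate agent is summarized by a vector of 0/1 outcomes on a shared set of tasks, and representation error is taken to be the average distance from each human to the closest selected agent. That facility-location-style error and the names `representation_error` and `greedy_select` are assumptions for illustration; the paper’s exact objective may differ.

```python
from typing import Dict, List, Sequence

Responses = Sequence[int]  # assumed encoding: one 0/1 outcome per shared task


def distance(a: Responses, b: Responses) -> float:
    """Fraction of tasks on which two response vectors disagree."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def representation_error(humans: List[Responses], agents: List[Responses]) -> float:
    """Mean distance from each human to the closest agent; 1.0 (worst case) if no agents."""
    if not agents:
        return 1.0
    return sum(min(distance(h, a) for a in agents) for h in humans) / len(humans)


def greedy_select(humans: List[Responses], candidates: Dict[str, Responses], k: int) -> List[str]:
    """Add k agents one at a time, each time taking the candidate that lowers the error most.

    1 - representation_error is a facility-location function (monotone submodular),
    so greedily maximizing it carries the classic (1 - 1/e) approximation guarantee.
    """
    chosen: List[str] = []
    for _ in range(min(k, len(candidates))):
        current = [candidates[name] for name in chosen]
        best = min(
            (name for name in candidates if name not in chosen),
            key=lambda name: representation_error(humans, current + [candidates[name]]),
        )
        chosen.append(best)
    return chosen


humans = [[1, 1, 0, 0], [1, 1, 1, 0], [1, 1, 1, 0], [0, 0, 0, 1], [0, 0, 1, 1]]
candidates = {"strong": [1, 1, 1, 0], "weak": [0, 0, 0, 1], "mixed": [1, 0, 1, 0]}
print(greedy_select(humans, candidates, k=2))  # ['strong', 'weak']
```

On its own, `mixed` has a lower error than `weak` (0.45 versus 0.6). After `strong` is chosen, however, `weak` covers the students that `strong` misses, so the greedy step prefers it. This is the diminishing-returns effect in miniature.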

How the Agents are Built and Selected

The paper introduces three main methods for constructing and selecting these representative agents:

REPPOPdemo: This method builds each agent step-by-step. Instead of looking at all possible agents at once, it greedily selects individual human demonstrations to add to an agent’s ‘context’ (the information it learns from). This significantly reduces the computational effort.
REPPOPmapped-1 and REPPOPmapped-2: These methods create a smaller, more manageable pool of ‘proxy’ agents, where each proxy agent is directly linked to a specific human in the population. For REPPOPmapped-1, the demonstrations for each human-mapped agent are chosen randomly. REPPOPmapped-2 is more sophisticated: it greedily selects the demonstrations that make each agent best mimic its human’s behavior.
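The difference between the two mapped variants can be sketched as follows. Querying a real LLM is replaced by a pluggable `simulate` function. The demo run uses `toy_simulate`, a hypothetical stand-in that treats each demonstration as a (skill, correct) pair. The names `mapped_random` and `mapped_greedy`, and the choice of candidate pool for each human, are assumptions for illustration rather than details taken from the paper.

```python
import random
from typing import Callable, List, Sequence, Tuple

Demo = Tuple[str, int]  # toy (skill, correct?) pair standing in for a task-response pair
Simulator = Callable[[List[Demo]], List[int]]  # stand-in for prompting an LLM with demos


def mismatch(a: Sequence[int], b: Sequence[int]) -> float:
    """Fraction of evaluation tasks on which two response vectors disagree."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def mapped_random(pool: List[Demo], m: int, rng: random.Random) -> List[Demo]:
    """REPPOPmapped-1 style: a human's proxy agent gets m demonstrations at random."""
    return rng.sample(pool, min(m, len(pool)))


def mapped_greedy(pool: List[Demo], m: int, target: Sequence[int], simulate: Simulator) -> List[Demo]:
    """REPPOPmapped-2 style: add one demonstration at a time, keeping whichever makes
    the simulated agent's responses closest to the target human's responses."""
    chosen: List[int] = []  # indices into pool, so duplicate demos are handled correctly
    for _ in range(min(m, len(pool))):
        remaining = [i for i in range(len(pool)) if i not in chosen]
        best = min(
            remaining,
            key=lambda i: mismatch(simulate([pool[j] for j in chosen + [i]]), target),
        )
        chosen.append(best)
    return [pool[i] for i in chosen]


SKILLS = ["add", "mult", "frac"]


def toy_simulate(demos: List[Demo]) -> List[int]:
    """Hypothetical LLM stand-in: answers a skill correctly iff some demo on it was correct."""
    return [int(any(skill == s and ok for skill, ok in demos)) for s in SKILLS]


pool = [("add", 1), ("mult", 1), ("frac", 0), ("add", 0), ("frac", 1)]
human = [1, 1, 0]  # strong at addition and multiplication, weak at fractions
print(mapped_greedy(pool, 2, human, toy_simulate))  # [('add', 1), ('mult', 1)]
print(mapped_random(pool, 2, random.Random(0)))
```

The random variant needs no model queries to build an agent. The greedy variant queries the simulator once per remaining candidate at every step. This illustrates the kind of speed-versus-quality trade-off between the methods.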

These methods ensure that the selected agents, when combined, effectively cover the spectrum of human behaviors and perspectives.


Real-World Applications and Results

The framework was tested in three domains:

Education (EEDI dataset): Simulating diverse student behaviors in answering math questions. This could help teachers practice instructional strategies or conduct virtual pre-tests.
Opinion Surveys (OpinionQA dataset): Creating agents that reflect different opinions and beliefs, similar to diverse crowdworkers responding to political surveys.
Data Annotation (WikiArt dataset): Generating agents that can annotate images with diverse emotional responses and descriptions, mimicking human annotators with different personalities.

The experiments showed that the proposed methods consistently reduced the ‘representation error’ – meaning the agents more accurately mirrored human populations compared to traditional approaches. Notably, REPPOPmapped-2, which carefully selects demonstrations, often performed the best. The framework also proved robust, working well with various LLMs of different sizes.

Beyond just numerical accuracy, the researchers also conducted a behavioral analysis. They found that the agents constructed by their methods genuinely exhibited behaviors similar to the human groups they were designed to represent. For instance, an agent representing students with a strong grasp of mental multiplication would perform similarly on new math problems, and an agent representing a particular political ideology would express opinions consistent with that ideology, even though no explicit demographic data was used in their construction.

This research marks a significant step toward AI systems that better reflect the complexity of human populations. While the current work focuses on prompting-based approaches and does not examine demonstration ordering, it lays groundwork for future applications, such as training teachers with diverse student models or simulating responses to new government policies. The full research paper is titled “Prompt Optimization Across Multiple Agents for Representing Diverse Human Populations.”
