
Evaluating LLM Behavior in Dynamic Economic Tasks

TLDR: A study introduces a process-oriented framework to assess how Large Language Models (LLMs) simulate human decision-making, focusing on variability and adaptability rather than just optimal outcomes. Using second-price auctions and newsvendor problems, the research found that LLMs default to stable, conservative strategies that differ significantly from the noisy and diverse patterns of human behavior. While interventions like risk-framed instructions and in-context learning with human data can nudge LLMs towards more human-like responses, they do not replicate the full spectrum of human strategic variability. This highlights a crucial gap in LLM behavioral fidelity for social science simulations.

Large language models (LLMs) are increasingly being used to simulate human subjects in various social science studies, from psychology to economics. While these models have shown impressive capabilities in reasoning and optimization, a critical question remains: can they accurately mimic the nuanced variability and adaptability that characterize human decision-making?

A recent research paper, titled “Noise, Adaptation, and Strategy: Assessing LLM Fidelity in Decision-Making,” delves into this very question. The authors, Yuanjun Feng, Vivek Choudhary, and Yash Raj Shrestha, propose a novel framework to evaluate how LLM agents adapt under different levels of external guidance and human-derived “noise.”

Understanding the Evaluation Framework

The study introduces a process-oriented evaluation framework with three progressive interventions:

  • Intrinsicality: In this baseline condition, LLMs operate without any specific guidance, much like human subjects would in a standard experiment.
  • Instruction: Here, LLMs receive additional instructions that frame the task around specific risk preferences, such as being risk-seeking or risk-averse.
  • Imitation: This intervention involves providing LLMs with partial histories of human decisions, essentially asking them to learn from and continue human-like behavior.

This framework allows researchers to assess how LLMs exhibit key features of human decision-making, such as bounded rationality (making suboptimal choices under constraints) and behavioral variance (individual differences in decisions).
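As a concrete illustration, the three conditions might be expressed as prompt templates along the following lines. This is a minimal sketch in Python; the wording, function name, and parameters are illustrative assumptions, not the paper's actual prompts.

```python
# Hypothetical prompt templates for the three intervention conditions.
# The wording below is illustrative, not the paper's actual prompts.

BASE_TASK = (
    "You are a seller in a second-price auction. "
    "Set a reserve price for this round."
)

def build_prompt(condition: str, history: list[float] | None = None) -> str:
    """Assemble a prompt for one of the three intervention conditions."""
    if condition == "intrinsicality":
        # Baseline: the task description only, no extra guidance.
        return BASE_TASK
    if condition == "instruction":
        # Risk-framed guidance layered on top of the task.
        return BASE_TASK + " Act as a risk-averse decision-maker."
    if condition == "imitation":
        # Partial human decision history provided in-context.
        shown = ", ".join(f"{r:.1f}" for r in (history or []))
        return (
            BASE_TASK
            + f" A previous human participant chose these reserve prices: {shown}. "
            "Continue in a way consistent with that participant's behavior."
        )
    raise ValueError(f"unknown condition: {condition}")

print(build_prompt("imitation", history=[42.0, 55.5, 48.0]))
```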

The Economic Experiments

To validate their framework, the researchers applied it to two classic behavioral economics tasks (a code sketch of both payoff structures follows the list):

  • Second-Price Auction: In this primary experiment, LLM agents act as sellers setting a reserve price over 60 rounds. Their profit depends on simulated bidder valuations. This task involves strategic reasoning and a discontinuous payoff function.
  • Newsvendor Problem: As a supplementary experiment, LLM agents act as vendors deciding how many newspapers to order before knowing the actual demand. This task features a continuous payoff structure and requires optimization under uncertainty.
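Both payoff structures follow standard textbook formulations and can be made concrete in a few lines. The sketch below is illustrative: the prices, costs, and bidder-valuation distribution are invented for the example and are not the paper's experimental parameters.

```python
import random

def auction_revenue(reserve: float, bids: list[float]) -> float:
    """Seller revenue in a second-price auction with a reserve price.
    If the top bid clears the reserve, the item sells at the larger of
    the second-highest bid and the reserve; otherwise nothing sells,
    so revenue jumps discontinuously to zero."""
    top, second = sorted(bids, reverse=True)[:2]
    return max(second, reserve) if top >= reserve else 0.0

def newsvendor_profit(order_qty: int, demand: int,
                      price: float = 10.0, cost: float = 4.0) -> float:
    """Vendor profit: revenue on units actually sold minus the cost of
    everything ordered. Varies continuously with the order quantity."""
    return price * min(order_qty, demand) - cost * order_qty

# One illustrative round of each task (all values made up).
bids = [random.uniform(0, 100) for _ in range(4)]
print(auction_revenue(reserve=50.0, bids=bids))
print(newsvendor_profit(order_qty=30, demand=random.randint(10, 50)))
```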

The study used state-of-the-art LLMs, including GPT-4o, Claude 3.5 Sonnet, and Claude 3.7 Sonnet, instantiating 40 agents for each model with unique demographic profiles to compare their behaviors against human subjects.

Key Findings: A Divergence from Human Behavior

The research revealed several significant insights into LLM decision-making:

  • Default Behavior (Intrinsicality): By default, LLM agents tend to converge on stable, conservative strategies. While they often achieve similar or even higher profits than humans, their decision patterns show significantly less variability. For instance, in the auction task, LLMs set lower and more constrained reserve prices compared to humans, who exhibit a wider range of pricing strategies.
  • Impact of Instructions: When given risk-framed instructions, LLMs predictably adjust their behavior. Risk-seeking instructions led to higher reserve prices, while risk-averse instructions resulted in lower ones. However, even with these instructions, LLMs still exhibited less behavioral diversity than human subjects, producing tightly clustered and coherent distributions.
  • Learning from Imitation: Providing LLMs with human decision histories through in-context learning did help narrow the behavioral gap. Direct imitation, where LLMs were tasked to replicate human patterns, recovered much of the dispersion seen in human decisions. Yet even under the best imitation conditions, LLMs did not reproduce the full extent of human strategic variability and noise.

These findings highlight a persistent “alignment gap” in behavioral fidelity. LLMs, trained to minimize predictive loss and often using low-temperature sampling, tend to produce high-probability, low-variance outputs, which differs from the often noisy, history-dependent, and adaptable nature of human decisions.
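The role of sampling temperature here can be seen in a toy example: a softmax over a handful of candidate reserve prices, sampled at low and high temperature. The scores and prices below are invented for the illustration; the point is only that lower temperature concentrates probability mass on the highest-scoring option and shrinks the variance of the sampled decisions.

```python
import math
import random
import statistics

def softmax_sample(scores, temperature, rng, n=200):
    """Sample n action indices from a softmax over scores at a given temperature."""
    weights = [math.exp(s / temperature) for s in scores]
    total = sum(weights)
    probs = [w / total for w in weights]
    return rng.choices(range(len(scores)), weights=probs, k=n)

rng = random.Random(0)
prices = [30, 40, 50, 60, 70]          # candidate reserve prices (made up)
scores = [0.5, 1.5, 2.0, 1.0, 0.2]     # a model's hypothetical preferences

for temp in (0.2, 1.0):
    idx = softmax_sample(scores, temp, rng)
    chosen = [prices[i] for i in idx]
    print(f"T={temp}: std of chosen prices = {statistics.stdev(chosen):.2f}")
```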

Implications for Social Science Research

The study suggests that while LLMs can be powerful tools for simulating human subjects, their inherent tendencies mean that researchers must conduct thorough behavioral audits alongside experimental results. Acknowledging and contextualizing the gaps in variability is crucial for assessing the credibility of LLMs as substitutes for human decision-makers, especially in synthetic data generation for social science where variability is a key signal.
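In practice, such an audit could start with something as simple as comparing the dispersion and distributional shape of LLM and human decisions. The sketch below uses synthetic data in place of real experimental logs; the function name and the chosen statistics (standard deviation plus a two-sample Kolmogorov-Smirnov test) are assumptions for illustration, not the paper's audit procedure.

```python
import numpy as np
from scipy.stats import ks_2samp

def audit_variability(llm_decisions, human_decisions):
    """Toy behavioral audit: compare the dispersion and distributional
    shape of LLM-agent decisions against human decisions."""
    res = ks_2samp(llm_decisions, human_decisions)
    return {
        "llm_std": float(np.std(llm_decisions)),
        "human_std": float(np.std(human_decisions)),
        "ks_statistic": float(res.statistic),
        "ks_pvalue": float(res.pvalue),
    }

# Synthetic stand-ins for real logs: tightly clustered LLM reserve
# prices versus noisier human ones.
rng = np.random.default_rng(0)
llm = rng.normal(45, 2, size=200)
human = rng.normal(50, 12, size=200)
print(audit_variability(llm, human))
```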

This process-oriented framework offers valuable guidance for auditing LLM behavior in dynamic decision-making tasks, paving the way for future model development that better balances optimization with the essential stochasticity of human behavior. For more details, see the full paper, "Noise, Adaptation, and Strategy: Assessing LLM Fidelity in Decision-Making."

Meera Iyer (https://blogs.edgentiq.com)
Meera Iyer is an AI news editor who blends journalistic rigor with storytelling elegance. Formerly a content strategist at a leading tech firm, Meera now tracks the pulse of India's Generative AI scene, from policy updates to academic breakthroughs. She is particularly focused on bringing nuanced, balanced perspectives to the fast-evolving world of AI-powered tools and media. You can reach her at: [email protected]
