
LLMs in Social Simulation: How AI Agents Mimic Human Group Dynamics

TLDR: A new study investigates how Large Language Model (LLM)-based multi-agent simulations can replicate human social dynamics like conformity, group polarization, and fragmentation in online forum settings. Researchers found that general-purpose LLMs, from small open models to GPT-4o, tend to conform to the majority, while models optimized for reasoning are more resistant to social influence and maintain diverse opinions. The findings suggest that model choice should align with the desired simulation outcome, whether it’s observing consensus or persistent dissent.

Recent advancements in Large Language Models (LLMs) are opening new doors for understanding complex human social interactions. A new research paper explores whether multi-agent simulations powered by LLMs can accurately reproduce core human social dynamics observed in online forums, such as conformity, group polarization, and fragmentation.

The study, titled “Towards Simulating Social Influence Dynamics with LLM-based Multi-agents,” was conducted by researchers from the Department of Information Management at National Sun Yat-sen University in Kaohsiung, Taiwan. The team included Hsien-Tsung Lin, Chan Hsu, Pei-Cing Huang, Pei-Xuan Shieh, Chan-Tung Ku, and Yihuang Kang. Their work investigates how different LLM scales and reasoning capabilities influence these social phenomena within a structured simulation framework.

Simulating Social Dynamics with AI Agents

The researchers designed a robust multi-agent conversational environment that mimics the asynchronous interaction patterns typical of Bulletin Board Systems (BBS) forums. In this setup, a central manager orchestrates message exchanges in a round-robin fashion, with each agent posting in sequence and all messages broadcast to every participant. Each agent was given a structured persona, including demographic attributes, communication style, and a fixed initial stance on a controversial topic, like whether governments should adopt stringent environmental policies. The interactions proceeded through five rounds of posting, allowing agents to reference and respond to previous messages, simulating a live forum thread.
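The paper does not publish its implementation, but the setup described above maps onto a simple orchestration loop. The sketch below is a minimal illustration assuming a generic `query_llm` backend; the `Agent` fields and the prompt wording are hypothetical stand-ins, not the authors' code.

```python
from dataclasses import dataclass

@dataclass
class Agent:
    name: str
    persona: str   # demographic attributes + communication style
    stance: int    # 1 = Strongly Oppose ... 5 = Strongly Support

def query_llm(prompt: str) -> str:
    """Placeholder for a call to whichever LLM is under test."""
    raise NotImplementedError

def run_forum(agents: list[Agent], topic: str, rounds: int = 5) -> list[str]:
    """BBS-style thread: a central manager lets each agent post in
    round-robin order, and every post is broadcast to all participants
    via the shared transcript."""
    transcript: list[str] = []
    for _ in range(rounds):
        for agent in agents:  # fixed posting order each round
            prompt = (
                f"You are {agent.name}. Persona: {agent.persona}\n"
                f"Topic: {topic}\n"
                f"Your current stance (1=Strongly Oppose, 5=Strongly Support): "
                f"{agent.stance}\n"
                "Thread so far:\n" + "\n".join(transcript) + "\n"
                "Write your next forum post, referencing earlier messages."
            )
            transcript.append(f"{agent.name}: {query_llm(prompt)}")
        # (Re-eliciting each agent's stance after a round, used for the
        # metrics discussed below, is omitted here for brevity.)
    return transcript
```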

To ensure the reliability of their findings, each simulation setting was repeated 25 times, and the results were aggregated to observe overall patterns in conformity rates, polarization changes, and fragmentation. The study focused on three key social phenomena, measured as sketched in the code after this list:

  • Conformity: How individuals adjust their opinions to align with the majority view. In the simulation, a “conforming stance change” occurred when an agent’s shift in position brought it closer to the prevailing group stance.
  • Group Polarization: The tendency for initial moderate positions to become more extreme over time through interaction. Agent stances were tracked on a five-point scale from “Strongly Oppose” to “Strongly Support.”
  • Group Fragmentation: When participants split into distinct subgroups holding fundamentally opposing positions rather than converging on a consensus. This was measured by the balance between agents supporting and opposing the proposition.
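The paper defines the exact formulas; the functions below are only one plausible reading of the three descriptions above, assuming each agent's stance is recorded once per round on the 1–5 scale, with 3 treated as the neutral midpoint (an assumption of this sketch).

```python
def conformity_rate(trajectories: list[list[int]]) -> float:
    """Share of stance changes that move an agent closer to the group's
    mean stance at the time of the change ("conforming stance changes")."""
    changes = conforming = 0
    for t in range(1, len(trajectories[0])):
        group_mean = sum(traj[t - 1] for traj in trajectories) / len(trajectories)
        for traj in trajectories:
            if traj[t] != traj[t - 1]:
                changes += 1
                if abs(traj[t] - group_mean) < abs(traj[t - 1] - group_mean):
                    conforming += 1
    return conforming / changes if changes else 0.0

def polarization_change(trajectories: list[list[int]]) -> float:
    """Mean drift toward the extremes: average distance from the neutral
    midpoint (3) at the end of the run minus that distance at the start."""
    n = len(trajectories)
    start = sum(abs(traj[0] - 3) for traj in trajectories) / n
    end = sum(abs(traj[-1] - 3) for traj in trajectories) / n
    return end - start

def fragmentation(trajectories: list[list[int]]) -> float:
    """Balance between final supporters (stance > 3) and opponents
    (stance < 3): 1.0 is an even split, 0.0 is full consensus."""
    support = sum(1 for traj in trajectories if traj[-1] > 3)
    oppose = sum(1 for traj in trajectories if traj[-1] < 3)
    total = support + oppose
    return 2 * min(support, oppose) / total if total else 0.0
```

Here each trajectory is one agent's stance per round; averaging these three numbers over the 25 repeated runs per setting would yield the kind of aggregate figures reported below.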

Key Findings on Model Behavior

The experiments categorized LLMs into four groups based on their parameter scales, computational requirements, and reasoning features:

  • Group A (Smaller Models): Operable on a single GPU, balancing accessibility with linguistic competence (e.g., Qwen2.5-7B, Llama3.1-8B).
  • Group B (Mid-sized Models): Higher capacity but still feasible on limited computing resources (e.g., Qwen2.5-72B, Llama3.1-70B).
  • Group C (Proprietary LLMs): Widely adopted commercial models such as GPT-4o, Claude 3.5 Haiku, and Gemini 2.0 Flash.
  • Group D (Reasoning-Oriented Models): Architectures explicitly designed or fine-tuned for logical inference and reasoning (e.g., o1-mini, DeepSeek-R1).

The findings revealed interesting patterns in social alignment. Models in Groups A, B, and C generally showed moderate responsiveness to peer influence, with conformity rates typically between 10% and 20%. Notably, GPT-4o in Group C exhibited the highest conformity rate at 19.45%, suggesting that some larger generative models may be more susceptible to majority alignment.

In stark contrast, reasoning-oriented models in Group D displayed significantly lower conformity rates, with o1-mini showing just 3.13%. This indicates that models optimized for reasoning are better at maintaining their initial viewpoints under social pressure, likely owing to more consistent internal reasoning processes.

Regarding stance evolution, Groups A and B showed higher polarization changes and lower fragmentation, suggesting they were more open to external influence and tended to converge towards “support” or “strongly support” stances. This implies that smaller or mid-sized models with limited reasoning capabilities lean towards consensus. However, some models in these groups, such as Qwen2.5-72B and Qwen2.5-7B, still showed notable fragmentation, indicating that they can preserve dissent under certain conditions.

Group C models exhibited the lowest overall polarization change, suggesting stronger resilience against extreme stance shifts. While GPT-4o in this group showed low fragmentation, indicating convergence towards supportive stances, these advanced architectures generally maintained more consistent viewpoints.

Finally, Group D, the reasoning-focused models, consistently maintained a subset of agents in the “strongly oppose” category. This highlights that these models can hold firm adversarial stances even when the broader conversation trends are supportive. Fragmentation was also prominent in Group D, with a clear split between “strongly support” and “strongly oppose,” demonstrating that logic-centric designs retain diverse opinions and allow dissenting views to persist alongside majority positions.

Implications for AI and Social Science

The research demonstrates that LLM-based multi-agent simulations can effectively reproduce social phenomena like moderate conformity, group polarization, and persistent dissent. The stability and fragmentation observed in reasoning-focused models suggest their suitability for applications requiring stance durability or viewpoint heterogeneity, such as AI agents designed for deliberation or argumentation.

Conversely, mid-sized or large generative models appear more prone to aligning with the majority, especially when repeated interactions foster a perceived consensus. This implies that researchers aiming to simulate extreme stance shifts or group consensus might prefer LLMs with simpler generative capacities, while those studying persistent dissent or strongly defended positions might opt for more reasoning-focused LLMs.

In essence, the choice of LLM for social simulations should align with the specific research goal: whether to observe realistic opinion shifts and consensus formation or to maintain heterogeneity and allow contrarian stances to flourish. This study provides valuable insights for both computational social science and the development of agentic AI. You can read the full research paper here.

Rhea Bhattacharya (https://blogs.edgentiq.com)
Rhea Bhattacharya is an AI correspondent with a keen eye for cultural, social, and ethical trends in Generative AI. With a background in sociology and digital ethics, she delivers high-context stories that explore the intersection of AI with everyday lives, governance, and global equity. Her news coverage is analytical, human-centric, and always ahead of the curve. You can reach her at: [email protected]
