
LLMs Enter the Social Arena: A Mini-Mafia Benchmark for Deception and Disclosure

TLDR: The research paper introduces ‘Mini-Mafia’, a simplified four-player social deduction game, as a benchmark for evaluating the social intelligence of Large Language Models (LLMs). It measures three abilities: deception (as the mafioso), deception detection (as a villager), and strategic information disclosure (as the detective). The study found that smaller, more cost-effective LLMs sometimes outperformed larger ones on these social tasks, and it uncovered emergent multi-agent dynamics such as name bias and a ‘last-speaker advantage’. Beyond benchmarking, the framework contributes to AI safety by tracking deception capabilities and generating data for deception detection.

Large Language Models (LLMs) are increasingly being used in complex situations where they interact with multiple other agents. In these scenarios, their success often depends on their ‘social intelligence’ – abilities like understanding others’ intentions (theory-of-mind), acting with incomplete information, and dealing with agents who have different goals. However, systematically testing these social capabilities has been a challenge, as most existing evaluations focus on single-agent tasks.

To address this, researchers Davi Bastos Costa and Renato Vicente have introduced ‘Mini-Mafia’, a simplified version of the classic social deduction game, Mafia. This game serves as a controlled environment to evaluate how LLMs perform in adversarial multi-agent settings. The full research paper can be found here: Deceive, Detect, and Disclose: Large Language Models Play Mini-Mafia.

What is Mini-Mafia?

Mini-Mafia is a four-player variant of Mafia, featuring one mafioso, one detective, and two villagers. The game is streamlined to focus on a single ‘day phase’ of discussion and voting. During the ‘night phase’, the mafioso eliminates a villager, and the detective investigates the mafioso, learning their identity. This setup creates a crucial information asymmetry: the mafioso has partial information, the villager has no information, and the detective has complete information.

This design specifically isolates three key interactive capabilities:

  • Deception: The mafioso must successfully mislead the other players.
  • Deception Detection: The villagers must identify the mafioso.
  • Information Disclosure: The detective must effectively share their findings to convince the town.
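The game flow described above can be sketched as a small simulation. The player names, the vote-policy signature, and the tie-breaking rule below are illustrative assumptions for the sketch, not details taken from the paper:

```python
import random
from collections import Counter

PLAYERS = ["Alice", "Bob", "Carol", "Diana"]  # hypothetical names

def play_mini_mafia(vote_policy, seed=None):
    """Simulate one Mini-Mafia round with a pluggable day-phase vote policy."""
    rng = random.Random(seed)
    roles = dict(zip(PLAYERS, rng.sample(
        ["mafioso", "detective", "villager", "villager"], 4)))
    mafioso = next(p for p, r in roles.items() if r == "mafioso")
    detective = next(p for p, r in roles.items() if r == "detective")

    # Night phase: the mafioso eliminates a villager; the detective
    # learns the mafioso's identity (complete information).
    victim = rng.choice([p for p, r in roles.items() if r == "villager"])
    survivors = [p for p in PLAYERS if p != victim]

    # Day phase: one round of voting; the plurality target is eliminated.
    known = lambda voter: mafioso if voter == detective else None
    votes = Counter(vote_policy(v, roles[v], survivors, known(v))
                    for v in survivors)
    eliminated = votes.most_common(1)[0][0]  # ties broken by vote order
    return "town" if eliminated == mafioso else "mafia"

def naive_policy(voter, role, survivors, known_mafioso):
    # Detective votes for the known mafioso; everyone else votes at random.
    if known_mafioso is not None:
        return known_mafioso
    return random.choice([p for p in survivors if p != voter])
```

In a real evaluation, `vote_policy` would be backed by an LLM producing discussion messages and a vote; the sketch only captures the information asymmetry and the single day-phase structure.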

The Mini-Mafia Benchmark

To measure these skills, the researchers developed the ‘Mini-Mafia Benchmark’. This framework has LLMs play against each other in systematic tournaments. The core idea is that of ‘backgrounds’: fixed pairings of models in two roles (e.g., detective and villager) that create a consistent environment for testing a third model’s capability (e.g., the mafioso’s deception skill). The benchmark estimates win rates within these configurations and then aggregates performance using standardized scoring. Importantly, it is built entirely from model interactions, so it requires no external training data and evolves as new models are introduced.
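The aggregation step might look like the following sketch, in which each model’s per-background win rates are z-scored within a background and then averaged. The function name and data layout are hypothetical, and the paper’s exact scoring formula may differ:

```python
from statistics import mean, stdev

def standardized_scores(win_rates):
    """Aggregate per-background win rates into one score per model.

    win_rates maps model -> {background: win_rate}. Within each
    background, candidate models' win rates are z-scored; each model
    then averages its z-scores across the backgrounds it played in
    (assumes at least two models per background).
    """
    backgrounds = {b for rates in win_rates.values() for b in rates}
    z = {m: [] for m in win_rates}
    for b in sorted(backgrounds):
        models = [m for m in win_rates if b in win_rates[m]]
        vals = [win_rates[m][b] for m in models]
        mu, sd = mean(vals), stdev(vals) or 1.0  # guard: zero spread
        for m in models:
            z[m].append((win_rates[m][b] - mu) / sd)
    return {m: mean(zs) for m, zs in z.items()}
```

Standardizing within each background before averaging keeps easy and hard backgrounds from dominating the aggregate, which is the usual motivation for z-scoring this kind of tournament data.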

Surprising Results

The initial experiments yielded some counterintuitive findings. Smaller, more cost-effective models sometimes significantly outperformed their larger, more advanced counterparts: Grok 3 Mini emerged as the best ‘detector’ (villager), and GPT-5 Mini (with minimal reasoning) as the best ‘discloser’ (detective), with both outperforming DeepSeek V3.1, Claude Opus 4.1, and Claude Sonnet 4. Notably, Claude Sonnet 4 was the worst detector, performing at roughly the level of random voting.

Emergent Multi-Agent Dynamics

Beyond just benchmarking, Mini-Mafia also revealed interesting multi-agent phenomena:

  • Name Bias: The study observed a systematic name bias in LLM trust attribution. For example, players named Bob had a higher win rate than those named Diana, suggesting subtle biases embedded in the language models.
  • Last-Speaker Advantage: Both mafiosos and detectives showed a significant advantage when they had the last word in discussions, influencing the voting outcome.

Implications for AI Safety

The research also has important implications for AI safety. By tracking models’ deception capabilities and their ability to detect deception, Mini-Mafia can serve as an early warning system. If LLMs begin to match human deception skills while surpassing human detection abilities, this asymmetry could pose significant risks. The framework can also generate valuable training data for developing deception-detection systems, potentially leading to more truthful AI systems.

In conclusion, Mini-Mafia provides a valuable and scalable benchmark for evaluating the social intelligence of LLMs, highlighting that these capabilities are distinct from traditional cognitive abilities and often do not simply scale with model size. This underscores the need for specialized tools to assess the nuanced social interactions of advanced AI.

Meera Iyer
Meera Iyer is an AI news editor who blends journalistic rigor with storytelling elegance. Formerly a content strategist at a leading tech firm, Meera now tracks the pulse of India's Generative AI scene, from policy updates to academic breakthroughs. She is particularly focused on bringing nuanced, balanced perspectives to the fast-evolving world of AI-powered tools and media. You can reach her at: [email protected]
