Unmasking Hidden Roles: A New AI Framework for Social Deduction Games

TLDR: A new research paper introduces CSP4SDG, a probabilistic constraint-satisfaction framework for identifying hidden roles in Social Deduction Games like Avalon and Mafia. It uses a lightweight LLM to convert game events and dialogue into four types of constraints (evidence, phenomena, assertions, hypotheses). Hard constraints prune impossible role assignments, while soft constraints, weighted by information gain, score the remaining ones. Experiments on three datasets show that CSP4SDG consistently outperforms LLM-only baselines and significantly boosts LLM performance when used as a reasoning tool, demonstrating the power of structured, interpretable, and training-free probabilistic reasoning for complex deduction tasks.

Social Deduction Games (SDGs) like Avalon, Mafia, and Werewolf are incredibly popular, but they present a unique challenge: players must deduce hidden roles while others actively try to mislead them. This task of accurately identifying roles is crucial for both human and AI players, as it forms the foundation for strategic decision-making.

A new research paper introduces a framework called CSP4SDG, whose full title is Constraint and Information-Theory Based Role Identification in Social Deduction Games with LLM-Enhanced Inference. It offers a fresh approach to the complex problem of role identification in these games. You can find the full paper here: CSP4SDG Research Paper.

Understanding CSP4SDG’s Approach

At its core, CSP4SDG is a probabilistic, constraint-satisfaction framework designed to analyze gameplay objectively. Instead of relying on heavy training or game-specific rules, it translates game events and player dialogue into four distinct types of constraints:

Evidence: These are hard facts that definitively fix a player’s role or narrow down possibilities. For example, if an Assassin kills Merlin in Avalon, that’s concrete evidence.
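Phenomena: Also hard constraints, these narrow down the possible roles of several players at once. An example is a quest outcome in Avalon showing that at least a certain number of players on the quest team are evil (evil players may choose to play a success card, so the number of fail cards is only a lower bound).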
Assertions: These are statements made by players, treated as soft constraints with high weight. If a player claims, “I am Percival,” this assertion strongly favors role assignments in which that player is Percival, without ruling out the possibility that the claim is a lie.
Hypotheses: These represent players’ weaker speculations or expressions of support and are treated as soft constraints with lower weights. They express subtle probabilistic preferences without ruling out any possibility.
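As a rough illustration, the four constraint types can be modeled as predicates over a complete role assignment, plus a flag that marks which ones are hard. The sketch below assumes a hypothetical five-player Avalon game (players Ann, Bob, Cara, Dan and Eve; roles Merlin, Percival, Loyal Servant, Morgana and Assassin); the class and field names are illustrative and are not taken from the paper.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict

Assignment = Dict[str, str]  # player name -> role name


class Kind(Enum):
    EVIDENCE = "evidence"      # hard: fixes or narrows one player's role
    PHENOMENON = "phenomenon"  # hard: narrows roles across several players
    ASSERTION = "assertion"    # soft, high weight: an explicit claim
    HYPOTHESIS = "hypothesis"  # soft, low weight: speculation or support


@dataclass
class Constraint:
    kind: Kind
    description: str
    holds: Callable[[Assignment], bool]  # does a full assignment satisfy it?

    @property
    def is_hard(self) -> bool:
        return self.kind in (Kind.EVIDENCE, Kind.PHENOMENON)


# Hypothetical five-player Avalon game, used only for illustration.
EVIL = {"Morgana", "Assassin"}

examples = [
    Constraint(Kind.EVIDENCE, "Eve is revealed as Merlin",
               lambda a: a["Eve"] == "Merlin"),
    Constraint(Kind.PHENOMENON, "Bob and Cara's quest drew one fail card",
               lambda a: sum(a[p] in EVIL for p in ("Bob", "Cara")) >= 1),
    Constraint(Kind.ASSERTION, "Dan says 'I am Percival'",
               lambda a: a["Dan"] == "Percival"),
    Constraint(Kind.HYPOTHESIS, "Ann suspects Bob is evil",
               lambda a: a["Bob"] in EVIL),
]

# One candidate assignment; every example constraint holds for it.
sample = {"Ann": "Loyal Servant", "Bob": "Morgana", "Cara": "Assassin",
          "Dan": "Percival", "Eve": "Merlin"}
for c in examples:
    print(c.kind.value, "hard" if c.is_hard else "soft", c.holds(sample), c.description)
```

Representing every constraint as a yes/no predicate over full assignments keeps the four types uniform; only the hard/soft flag (and later, the weight) distinguishes them.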

The framework uses these constraints in a two-step process. First, hard constraints (evidence and phenomena) are applied to eliminate impossible role assignments, creating a feasible set of possibilities. Then, weighted soft constraints (assertions and hypotheses) are used to score the remaining assignments. A key innovation here is the use of information-gain weighting, which links each hypothesis to its expected value in reducing uncertainty, removing the need for manual tuning of weights.
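The two-step process can be sketched in Python under several assumptions that are not taken from the paper: role assignments are enumerated exhaustively for a hypothetical five-player game; each soft constraint's weight is its information gain over the feasible set (for a yes/no predicate under a uniform prior this equals the binary entropy of the fraction of feasible assignments it holds in) multiplied by a hypothetical tier factor, so that assertions outweigh hypotheses; and an assignment's score is the exponential of the summed weights of the soft constraints it satisfies. The paper's exact weighting and scoring formulas may differ.

```python
import itertools
import math
from collections import defaultdict

# Hypothetical five-player Avalon setup (all roles distinct).
PLAYERS = ["Ann", "Bob", "Cara", "Dan", "Eve"]
ROLES = ["Merlin", "Percival", "Loyal Servant", "Morgana", "Assassin"]
EVIL = {"Morgana", "Assassin"}

# Hypothetical tier multipliers: assertions outweigh hypotheses.
TIER = {"assertion": 2.0, "hypothesis": 1.0}


def feasible_assignments(hard):
    """Step 1: keep only the assignments that satisfy every hard constraint."""
    # set() drops duplicate deals if ROLES ever contains repeated roles.
    deals = sorted(set(itertools.permutations(ROLES)))
    assignments = [dict(zip(PLAYERS, deal)) for deal in deals]
    return [a for a in assignments if all(h(a) for h in hard)]


def information_gain(pred, feasible):
    """Bits gained by learning whether `pred` is true, assuming a uniform prior
    over `feasible`. For a yes/no predicate this equals its binary entropy."""
    p = sum(pred(a) for a in feasible) / len(feasible)
    if p in (0.0, 1.0):
        return 0.0  # the predicate does not split the feasible set
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def role_marginals(hard, soft):
    """Step 2: score feasible assignments with weighted soft constraints and
    return P(player holds role) for every (player, role) pair."""
    feasible = feasible_assignments(hard)
    if not feasible:
        raise ValueError("hard constraints are contradictory")
    weights = [TIER[tier] * information_gain(pred, feasible) for tier, pred in soft]
    # Score = exp(sum of the weights of the satisfied soft constraints).
    scores = [
        math.exp(sum(w for w, (_, pred) in zip(weights, soft) if pred(a)))
        for a in feasible
    ]
    total = sum(scores)
    marginals = defaultdict(float)
    for a, s in zip(feasible, scores):
        for player, role in a.items():
            marginals[(player, role)] += s / total
    return marginals


hard = [
    lambda a: a["Eve"] == "Merlin",                   # evidence
    lambda a: a["Bob"] in EVIL or a["Cara"] in EVIL,  # phenomenon: one fail card
]
soft = [
    ("assertion", lambda a: a["Dan"] == "Percival"),  # "I am Percival"
    ("hypothesis", lambda a: a["Bob"] in EVIL),       # "Bob seems evil"
]

marginals = role_marginals(hard, soft)
for role in ROLES:
    print(f"P(Dan = {role}) = {marginals[('Dan', role)]:.2f}")
```

Here the hard constraints leave 20 feasible assignments, 6 of which make Dan Percival. The assertion lifts Dan's probability of being Percival above the 0.3 a uniform prior over those 20 assignments would give, yet the other roles keep nonzero probability, which is how a soft constraint is meant to behave when a claim might be a lie.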

The Role of Large Language Models (LLMs)

CSP4SDG integrates a lightweight Large Language Model (LLM) into its workflow, where it acts as an “information converter”: it takes raw game logs, including chat and events, and transforms them into the structured, language-agnostic constraint types described above. The CSP solver then applies its logical and probabilistic reasoning to those constraints.
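A minimal sketch of the converter's output side, assuming the LLM is prompted to reply with a JSON list in a hypothetical schema where every constraint reads “at least min_count of these players hold one of these roles.” That single form can express all four constraint types. The LLM call itself is omitted; only the parsing into predicates that a solver can check is shown, and the schema, function names and player names are illustrative assumptions.

```python
import json
from typing import Callable, Dict, Iterable, List, Tuple

Assignment = Dict[str, str]  # player name -> role name
Predicate = Callable[[Assignment], bool]
KINDS = {"evidence", "phenomenon", "assertion", "hypothesis"}


def make_predicate(players: Iterable[str], roles: Iterable[str], min_count: int) -> Predicate:
    """Predicate: at least `min_count` of `players` hold a role in `roles`."""
    players, roles = tuple(players), frozenset(roles)

    def pred(a: Assignment) -> bool:
        return sum(a[p] in roles for p in players) >= min_count

    return pred


def parse_constraints(llm_output: str) -> List[Tuple[str, Predicate]]:
    """Turn the LLM's JSON reply (hypothetical schema) into (kind, predicate) pairs."""
    parsed = []
    for item in json.loads(llm_output):
        kind = item["type"]
        if kind not in KINDS:
            raise ValueError(f"unknown constraint type: {kind!r}")
        pred = make_predicate(item["players"], item["roles"], item.get("min_count", 1))
        parsed.append((kind, pred))
    return parsed


# Hypothetical reply for: "Bob and Cara's quest got one fail; Dan says 'I am Percival'".
llm_output = """[
  {"type": "phenomenon", "players": ["Bob", "Cara"],
   "roles": ["Morgana", "Assassin"], "min_count": 1},
  {"type": "assertion", "players": ["Dan"], "roles": ["Percival"]}
]"""

constraints = parse_constraints(llm_output)
deal = {"Ann": "Merlin", "Bob": "Morgana", "Cara": "Loyal Servant",
        "Dan": "Percival", "Eve": "Assassin"}
print([(kind, pred(deal)) for kind, pred in constraints])
# [('phenomenon', True), ('assertion', True)]
```

Rejecting unknown types keeps a malformed LLM reply from silently entering the solver; the language-agnostic output is just player names, role names and counts.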

Experimental Validation and Key Findings

The researchers conducted extensive experiments on three public datasets from popular SDGs: Avalon NLU, Mafia, and AvalonLogs. They compared CSP4SDG against LLM-only baselines and hybrid LLM+CSP approaches, evaluating performance from various player perspectives (objective, good-roles, evil-roles) and under different conditions (truthful vs. deceptive play).

The results were compelling:

Superior Performance: CSP4SDG consistently outperformed LLM-based baselines in every inference scenario. This highlights the power of principled probabilistic reasoning combined with information theory.
LLM Enhancement: When CSP4SDG was supplied as an auxiliary “reasoning tool” to LLMs, it significantly boosted their performance. This suggests that while LLMs are excellent at extracting information, they benefit greatly from structured, logical reasoning for complex deduction tasks.
Interpretability and Scalability: The framework provides fully interpretable results, updating role probabilities in real-time. It’s also training-free and game-agnostic, making it a scalable and flexible solution that can be applied across different social deduction games without extensive re-engineering.

The study also revealed that pure LLMs struggle with the complex combinatorial reasoning required for precise role deduction, often exhibiting hallucinations or contextual confusion. Even stronger LLMs showed only modest improvements, indicating a fundamental limitation in their ability to perform structured reasoning without explicit guidance. Conversely, CSP4SDG’s structured approach provided consistent and significant accuracy uplifts, especially for roles with hidden information.
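To give a sense of the combinatorial space a consistent reasoner has to track, the number of distinct role assignments is the multinomial count n! divided by the factorial of each repeated role's multiplicity. The role lists below are assumed example setups (a five-player game with all roles distinct, and a ten-player game with four generic Loyal Servants), not the datasets used in the paper.

```python
from collections import Counter
from math import factorial, prod


def count_assignments(roles):
    """Distinct ways to deal a multiset of roles to len(roles) players."""
    counts = Counter(roles)
    return factorial(len(roles)) // prod(factorial(c) for c in counts.values())


# Hypothetical example setups.
five = ["Merlin", "Percival", "Loyal Servant", "Morgana", "Assassin"]
ten = (["Merlin", "Percival"] + ["Loyal Servant"] * 4
       + ["Morgana", "Assassin", "Mordred", "Oberon"])

print(count_assignments(five))  # 120
print(count_assignments(ten))   # 151200
```

Every quest result or claim has to be checked against all assignments that are still feasible, which is straightforward for an explicit solver but hard to do reliably in free-form text reasoning.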

Conclusion

CSP4SDG represents a significant step forward in AI for social deduction games. By combining a generalized probabilistic constraint-satisfaction framework with LLM-enhanced information extraction, it offers a powerful, interpretable, and scalable alternative or complement to heavy-weight neural models. This research validates that structured reasoning, guided by information theory, is essential for achieving high-fidelity role inference in the dynamic and deceptive world of social deduction games.
