
Unpacking AI’s Grasp of Human Reasoning Styles in Social Games

TLDR: The InMind framework evaluates how well large language models (LLMs) can understand and apply individual human reasoning styles, particularly in social deduction games like Avalon. Using detailed gameplay annotations, InMind assesses LLMs on tasks like identifying players, aligning reflections, attributing reasoning traces, and inferring roles. Findings show that while some advanced LLMs exhibit early style-sensitive reasoning, most struggle with dynamic adaptation and grounding their logic in the evolving game context, often relying on superficial cues.

Large Language Models (LLMs) have demonstrated impressive capabilities in various complex tasks, from scientific reasoning to understanding human intentions. However, a critical area often overlooked in their evaluation is their ability to capture and apply the unique, individualized reasoning styles that shape how people interact and make decisions in social settings.

A new research paper introduces InMind, a groundbreaking, cognitively-grounded evaluation framework designed to address this gap. The framework aims to assess whether LLMs can truly internalize and adapt to personalized human reasoning, especially in dynamic and interactive environments.

The Challenge of Individualized Reasoning

Traditional LLM benchmarks often focus on output plausibility or behavioral consistency, providing limited insight into the underlying cognitive mechanisms. In real-world social scenarios, people don’t just arrive at conclusions; they do so through distinct, context-sensitive reasoning trajectories. This individual variation is what the researchers refer to as an ‘individualized reasoning style’.

To effectively evaluate this, the InMind framework leverages Social Deduction Games (SDGs) like Avalon. These games are ideal because they are dynamic, adversarial, and inherently individualized, requiring players to infer hidden mental states and make strategic decisions based on evolving information. Simply producing plausible outputs isn’t enough; an LLM must capture and adapt to a player’s unique style for meaningful human-AI collaboration.

How InMind Works: A Dual-Layer Approach

InMind introduces two complementary gameplay modes: Observer and Participant. In Observer mode, a human subject passively reasons from another player’s perspective without taking action, helping to isolate cognitive patterns from overt behavior. In Participant mode, the subject actively engages in the game, providing annotations from their own viewpoint.

Crucially, InMind integrates dual-layer cognitive annotations, sketched as a data structure after this list:

  • Strategy Traces: These capture real-time reasoning signals, such as belief updates, intention inferences, and counterfactual thinking, as the game unfolds.
  • Reflective Summaries: These offer post-game insights, contextualizing key events and evaluating other players’ behaviors and intentions in hindsight.
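To make the two annotation layers concrete, here is a minimal sketch of how an annotated session might be represented. The field names (round_id, belief_updates, and so on) are illustrative assumptions, not the dataset's actual schema:

```python
from dataclasses import dataclass

# A minimal sketch of InMind's dual-layer annotations. All field names
# here (round_id, belief_updates, etc.) are illustrative assumptions,
# not the dataset's actual schema.

@dataclass
class StrategyTrace:
    """Real-time reasoning signals captured as the game unfolds."""
    round_id: int
    belief_updates: list[str]        # e.g. "P3 now looks like Morgana"
    intention_inferences: list[str]  # e.g. "P5 is angling for the quest team"
    counterfactuals: list[str]       # e.g. "had P2 voted no, I'd trust them"

@dataclass
class ReflectiveSummary:
    """Post-game hindsight: key events and evaluations of other players."""
    key_events: list[str]
    player_evaluations: dict[str, str]  # player id -> assessment

@dataclass
class AnnotatedSession:
    """One full game session under either gameplay mode."""
    mode: str                      # "observer" or "participant"
    transcript: list[str]          # turn-by-turn game log
    traces: list[StrategyTrace]    # strategy traces, in order
    summary: ReflectiveSummary     # reflective summary for the session
```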

These rich annotations enable InMind to define four cognitively motivated tasks that jointly evaluate both static alignment and dynamic adaptation of LLMs (one task is sketched in code after the list):

  1. Player Identification: Tests if an LLM can recognize behavioral patterns consistent with a specific reasoning style.
  2. Reflection Alignment: Assesses the model’s ability to ground abstract post-game reflections in concrete gameplay behavior.
  3. Trace Attribution: Probes whether the model can simulate evolving, in-context reasoning across time.
  4. Role Inference: Evaluates if the model can internalize reasoning styles to support belief modeling under uncertainty.

The InMind-Avalon Case Study and Key Findings

The researchers instantiated InMind within the popular six-player social deduction game Avalon, creating the InMind-Avalon dataset. The dataset comprises 30 complete human gameplay sessions, each meticulously annotated with cognitive traces and reflective summaries. Sessions were conducted via online voice chat in Mandarin Chinese, capturing authentic communication dynamics and game-specific expressions.

An extensive evaluation of 11 state-of-the-art LLMs on InMind-Avalon revealed several critical limitations:

  • Most models, including advanced ones like GPT-4o, heavily rely on superficial lexical patterns, struggling to infer deeper strategic intent.
  • Temporal alignment between reflective reasoning and specific in-game events remains a significant challenge for nearly all evaluated models.
  • Dynamic adaptation of strategic reasoning based on evolving interactions is largely insufficient, indicating fundamental shortcomings in LLMs’ capacity for individualized reasoning over time.

However, the study also observed promising potential in certain reasoning-enhanced models, such as DeepSeek-R1, which exhibited early signs of style-sensitive reasoning. These models were better at extracting abstract reasoning traits beyond surface-level linguistic cues.

The findings underscore that while LLMs excel in many areas, their capacity for individualized, adaptive reasoning in complex social environments is still limited. The InMind framework and its accompanying dataset provide a principled tool to guide future advancements toward more personalized and socially aware AI systems. For more details, you can read the full research paper here.


Future Directions

The researchers plan to expand InMind to include other social deduction games with different social structures and interaction patterns, such as Blood on the Clocktower and Werewolf. They also aim to broaden the framework’s application beyond games to domains like multi-agent collaboration, negotiation, and human-AI teaming, where personalized, context-sensitive reasoning is crucial for effective interaction.

Rhea Bhattacharya
Rhea Bhattacharya is an AI correspondent with a keen eye for cultural, social, and ethical trends in Generative AI. With a background in sociology and digital ethics, she delivers high-context stories that explore the intersection of AI with everyday lives, governance, and global equity. Her news coverage is analytical, human-centric, and always ahead of the curve. You can reach her at: [email protected]
