TLDR: TactfulToM is a new benchmark evaluating LLMs’ ability to understand white lies in real-life conversations. It uses a human-in-the-loop pipeline to create diverse scenarios with information asymmetry and a hierarchical evaluation framework. Findings show state-of-the-art LLMs significantly underperform humans, struggling to grasp prosocial motivations behind white lies and apply mental state tracking in complex social contexts, often relying on surface-level patterns instead of genuine understanding.
Large Language Models (LLMs) have shown impressive capabilities in various complex tasks, from solving mathematical problems to writing code. However, when it comes to understanding the subtle nuances of human social interaction, particularly something as intricate as a “white lie,” these advanced AI systems still face significant hurdles. A new research paper titled “TactfulToM: Do LLMs Have the Theory of Mind Ability to Understand White Lies?” examines this challenge, introducing a benchmark that tests LLMs’ comprehension of these prosocial deceptions.
Theory of Mind (ToM) is the human cognitive ability to understand that others have their own mental states—beliefs, desires, intentions, and emotions—that may differ from our own. This skill is fundamental for effective social interactions and is considered a cornerstone of common sense reasoning. While LLMs have made strides in general reasoning, their performance on ToM tasks, especially in realistic social scenarios, has consistently lagged behind human capabilities.
Understanding white lies is a particularly complex aspect of ToM. These are intentional falsehoods told to protect someone’s feelings or maintain social harmony, rather than out of malice. The ability to detect such lies and grasp their emotional motivations is crucial for developing AI tools that can operate safely and appropriately in sensitive domains like education, healthcare, and caregiving. Despite this importance, white lies have been largely overlooked in LLM research, with existing benchmarks offering only limited, non-conversational samples.
To bridge this research gap, Yiwei Liu, Emma Jane Pretty, Jiahao Huang, and Saku Sugawara introduce TactfulToM, an English benchmark designed to evaluate LLMs’ ability to understand and reason about white lies within real-life conversational contexts. The benchmark focuses on the interplay between deceptive statements and their underlying prosocial motivations.
How TactfulToM Works
The creation of TactfulToM involved a meticulous multi-stage human-in-the-loop pipeline. The researchers first decomposed white lies into a “triplet” of elements: the Real Reason (the motivation behind the lie), the Lie (the false statement itself), and the Truth (the actual reality that diverges from the lie). This systematic decomposition allowed for the manual crafting of seed stories.
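To make the triplet concrete, here is a minimal Python sketch of how such a seed story could be represented. The class name, field names, the example story, and the `is_well_formed` check are all illustrative assumptions; the paper's actual data format is not described in this article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WhiteLieTriplet:
    """Hypothetical container for the three elements of a white lie."""
    real_reason: str  # the prosocial motivation behind the lie
    lie: str          # the false statement that is actually spoken
    truth: str        # the reality that diverges from the lie


def is_well_formed(triplet: WhiteLieTriplet) -> bool:
    """Assumed sanity check: all parts present, and the lie differs from the truth."""
    parts = (triplet.real_reason, triplet.lie, triplet.truth)
    all_present = all(part.strip() for part in parts)
    return all_present and triplet.lie.strip() != triplet.truth.strip()


# Invented seed story for illustration only; not taken from the benchmark.
seed = WhiteLieTriplet(
    real_reason="Does not want to hurt the host's feelings about the meal.",
    lie="The soup was delicious.",
    truth="The soup was far too salty.",
)

print(is_well_formed(seed))
```

Keeping the three elements as separate fields is what later makes it possible to show each character a different subset of the story.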
These seed stories were then expanded into high-quality, multi-party conversations using LLMs like GPT-4o, but with strict human validation at each step. This “human-in-the-loop” approach was critical to avoid biases that might arise if LLMs were left to generate white lie scenarios based on their own potentially flawed understanding.
A key design feature is “role-based information asymmetry.” Four roles are defined: the Liar (L), Accomplice (A), Observer (O), and Target (T). Each role has different access to the white lie triplet, so the characters differ in what they know. For instance, the Target only receives the lie, while the Liar knows the full truth and motivation. This asymmetry is vital for testing an LLM’s ability to track distinct mental states.
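The asymmetry can be pictured as a filter over the story's three elements. The sketch below is an assumption-laden illustration, not the benchmark's implementation: the `ACCESS` table and `visible_parts` helper are hypothetical names, and only the Liar and Target entries are filled in because those are the only two whose access is spelled out here. The Accomplice and Observer are deliberately left unspecified rather than guessed.

```python
from enum import Enum


class Role(Enum):
    LIAR = "L"
    ACCOMPLICE = "A"
    OBSERVER = "O"
    TARGET = "T"


# Only the roles whose access the text states are listed (assumed encoding).
ACCESS = {
    Role.LIAR: frozenset({"real_reason", "lie", "truth"}),
    Role.TARGET: frozenset({"lie"}),
}


def visible_parts(story: dict, role: Role) -> dict:
    """Return the subset of the story that a given role has access to."""
    if role not in ACCESS:
        raise ValueError(f"access for role {role.name} is not specified in this sketch")
    allowed = ACCESS[role]
    return {key: value for key, value in story.items() if key in allowed}


# Invented story for illustration only.
story = {
    "real_reason": "Wants to spare a friend's feelings.",
    "lie": "I love the sweater you knitted.",
    "truth": "The sweater does not fit.",
}

print(visible_parts(story, Role.TARGET))
```

A question such as "what does the Target believe?" can then be answered from the filtered view alone, which is the kind of per-character tracking the benchmark probes.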
The benchmark also incorporates three levels of “truth accessibility” to vary difficulty: Level-1 provides a falsifiable truth, Level-2 offers a non-falsifiable truth, and Level-3 provides no explicit truth, forcing models to infer deception from context and motivation alone.
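One way to read the three levels is as a shrinking set of cues available to the model. The following sketch encodes that reading; the enum, the cue names, and the `available_cues` function are hypothetical, and the exact definition of "non-falsifiable" is our interpretation rather than something confirmed by the paper.

```python
from enum import IntEnum


class TruthAccess(IntEnum):
    FALSIFIABLE = 1      # Level-1: a truth the lie can be checked against
    NON_FALSIFIABLE = 2  # Level-2: a truth is given, but the lie cannot be checked against it (our reading)
    NO_TRUTH = 3         # Level-3: no explicit truth is provided


def available_cues(level: TruthAccess) -> list:
    """List the (assumed) cues a reader can use to spot the lie at each level."""
    cues = ["context", "motivation"]  # present at every level
    if level <= TruthAccess.NON_FALSIFIABLE:
        cues.append("stated_truth")
    if level == TruthAccess.FALSIFIABLE:
        cues.append("checkable_contradiction")
    return cues


for level in TruthAccess:
    print(level.name, available_cues(level))
```

Under this reading, Level-3 is the hardest setting because only context and motivation remain.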
Evaluating White Lie Understanding
TactfulToM employs a hierarchical evaluation framework with three tiers of questions:
- Info-State Questions: These assess basic mental state tracking, including factual questions, belief attribution (first- and second-order beliefs), information accessibility, and answerability.
- White Lie Understanding (1st-Order): Drawing from established psychological tests, these questions evaluate if models can identify false statements (Comprehension) and recognize the prosocial motivations behind them (Justification).
- White Lie Reasoning (2nd-Order): Novel question types like “Lie Ability” (identifying who can tell a white lie) and “Lie Detectability” (identifying who can recognize deception) test deeper reasoning about characters’ perspectives and information access.
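The three tiers above can be sketched as a simple grouping of question types, with accuracy reported per tier. The identifiers below are assumed labels derived from the tier descriptions, and the sample results are invented; this is not the authors' scoring code.

```python
from collections import defaultdict

# Assumed labels for the question types named in each tier.
TIERS = {
    "info_state": [
        "fact", "belief_first_order", "belief_second_order",
        "info_accessibility", "answerability",
    ],
    "white_lie_understanding": ["comprehension", "justification"],
    "white_lie_reasoning": ["lie_ability", "lie_detectability"],
}

# Reverse lookup: question type -> tier.
TYPE_TO_TIER = {qtype: tier for tier, qtypes in TIERS.items() for qtype in qtypes}


def accuracy_by_tier(results):
    """results: iterable of (question_type, is_correct) pairs."""
    counts = defaultdict(lambda: [0, 0])  # tier -> [correct, total]
    for qtype, is_correct in results:
        tier = TYPE_TO_TIER[qtype]
        counts[tier][0] += int(is_correct)
        counts[tier][1] += 1
    return {tier: correct / total for tier, (correct, total) in counts.items()}


# Invented results for illustration only.
sample = [
    ("fact", True),
    ("belief_first_order", False),
    ("comprehension", True),
    ("lie_ability", False),
]
print(accuracy_by_tier(sample))
```

Reporting by tier, rather than as one pooled score, is what lets a gap between basic belief tracking and white lie reasoning show up at all.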
Key Findings and LLM Limitations
The evaluation of nine state-of-the-art LLMs from four families (GPT, DeepSeek, Llama, Qwen) revealed significant gaps compared to human performance. Humans achieved over 85% accuracy across all tasks, while even the best-performing LLMs (the DeepSeek models and GPT-4o) fell substantially short.
One striking finding is that LLMs struggle with “true white lie understanding.” While they might correctly identify a statement as false, they often fail to grasp the underlying prosocial motivation. Performance drops significantly when models need to combine both aspects, suggesting they might succeed on individual dimensions through pattern matching rather than genuine integration of belief and emotional understanding.
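The drop described here follows from how a joint score works: a story counts only if the model gets both the Comprehension and the Justification question right. The sketch below uses invented per-story records and a hypothetical `understanding_scores` helper to show the arithmetic; it is not the paper's metric code and the numbers are not its results.

```python
def understanding_scores(records):
    """Compute separate and joint accuracy over per-story records."""
    total = len(records)
    comprehension = sum(r["comprehension"] for r in records) / total
    justification = sum(r["justification"] for r in records) / total
    # A story counts toward the joint score only if both answers are correct.
    joint = sum(r["comprehension"] and r["justification"] for r in records) / total
    return {
        "comprehension": comprehension,
        "justification": justification,
        "joint": joint,
    }


# Invented records for illustration only.
records = [
    {"comprehension": True, "justification": True},
    {"comprehension": True, "justification": False},
    {"comprehension": False, "justification": True},
    {"comprehension": True, "justification": False},
]

print(understanding_scores(records))
# -> {'comprehension': 0.75, 'justification': 0.5, 'joint': 0.25}
```

In this toy case the joint score (0.25) sits well below either individual score, which is the pattern one would expect if a model answers the two question types correctly on different stories rather than integrating them.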
The research also showed that LLMs can track mental states reasonably well in isolation but struggle to apply this knowledge effectively in white lie contexts, particularly in lie detectability tasks. This indicates a disconnect between basic belief tracking and the ability to use these representations for complex social reasoning.
Interestingly, models performed better on white lie scenarios involving “common sense falsehoods” (e.g., “Santa is real” or symbolic explanations of sensitive topics). This suggests that LLMs often rely on statistical regularities and pre-existing common sense knowledge as shortcuts, rather than engaging in deep, situation-specific contextual reasoning when faced with more nuanced white lies.
Furthermore, the study highlighted that models excel at detecting lies when explicit contradictions (Level-1 truth accessibility) are present. However, their performance significantly declines when they need to infer deception directly from motivations without a clear truth provided (Level-3). This underscores their difficulty in understanding protective intentions without explicit factual cues.
Ethical Considerations
The researchers emphasize that TactfulToM is not designed to advocate for AI systems that can tell white lies. Instead, it aims to systematically evaluate LLMs’ social reasoning capabilities. The findings demonstrate that current models are far from human-like understanding in these scenarios, primarily relying on pattern matching rather than genuine comprehension of mental states or intentions. This raises important ethical questions about the development of LLMs: should they merely interpret human behavior, or potentially generate prosocial deceptions themselves? The benchmark provides a foundation for improving LLMs’ social reasoning, while also prompting careful consideration of the implications of aligning AI with all aspects of human social behavior, including benign deception.


