TLDR: A new study reveals that humans consistently misinterpret the step-by-step reasoning texts generated by AI models, achieving only 29.3% accuracy in identifying causal dependencies between steps. This “universal failure” persists across diverse demographics and even with collective human agreement, challenging the utility of reasoning texts as transparent interpretability tools and suggesting that AI uses language in fundamentally different ways than humans.
A recent study sheds light on a critical challenge in understanding artificial intelligence: humans consistently misinterpret the step-by-step reasoning texts generated by advanced AI models. These “reasoning texts” are often treated as a window into how AI thinks, offering transparency and interpretability. The new research, however, suggests that human readings of these texts frequently fail to match the AI’s actual computational process.
The paper, titled “Humans Perceive Wrong Narratives from AI Reasoning Texts,” by Mosh Levy, Zohar Elyoseph, and Yoav Goldberg, investigates a fundamental question: can humans accurately identify which steps in an AI’s reasoning text causally influence later steps? The findings reveal a significant and concerning discrepancy.
The Study: Unpacking AI’s Causal Chains
To explore this, the researchers devised a novel “AI narrative test.” They focused on identifying “causal dependencies” – that is, cases where removing one step in the AI’s reasoning would change a subsequent step. Their method, Causal Step Intervention Analysis, works as follows: for each step in an AI-generated reasoning text, they systematically removed preceding steps one at a time and checked whether the regenerated target step changed semantically. This allowed them to map the true causal connections within the AI’s process.
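In concrete terms, the intervention loop ablates one earlier step at a time, asks the model to regenerate the target step from the ablated context, and flags the removed step as a causal parent if the regenerated text differs in meaning. The Python sketch below only illustrates that loop under stated assumptions: `regenerate_step`, `similarity`, and the 0.8 threshold are hypothetical stand-ins, not the authors’ implementation.

```python
from typing import Callable, List, Set

def causal_parents(
    problem: str,
    steps: List[str],
    target_idx: int,
    regenerate_step: Callable[[str, List[str]], str],  # hypothetical model wrapper
    similarity: Callable[[str, str], float],           # hypothetical semantic similarity
    threshold: float = 0.8,                            # illustrative cutoff
) -> Set[int]:
    """Indices of preceding steps whose removal changes the target step."""
    original_target = steps[target_idx]
    parents: Set[int] = set()

    for i in range(target_idx):
        # Drop exactly one preceding step, keeping the others in order.
        ablated_context = steps[:i] + steps[i + 1:target_idx]

        # Regenerate the target step from the ablated context.
        new_target = regenerate_step(problem, ablated_context)

        # A semantic change marks step i as a causal parent of the target.
        if similarity(original_target, new_target) < threshold:
            parents.add(i)

    return parents
```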
For the human evaluation, 80 participants were presented with math problems and the AI’s step-by-step reasoning. They were then shown a “target step” and asked to identify, from four preceding options, the single step that, if removed, would cause the target step to change. A “hint” was provided, showing what the target step would look like if the correct causal step were omitted. The questions were designed to be fair, avoiding misleading distractors and ensuring balanced coverage of both AI models (DeepSeek-R1 and Qwen-3) and of problem types from the GSM8K dataset.
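One way to picture the question format is sketched below. It assumes each item pairs one true causal parent with three non-causal distractors drawn from the preceding steps; the names, data structures, and sampling details are invented for illustration and are not taken from the paper.

```python
import random
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class Question:
    target_step: str
    options: List[int]    # indices of the four candidate preceding steps
    correct_option: int   # the step whose removal changes the target step
    hint: str             # target step as regenerated with the causal step removed

def build_question(
    steps: List[str],
    target_idx: int,
    causal_parents: List[int],
    regenerated: Dict[int, str],  # step index -> regenerated target step without it
    seed: int = 0,
) -> Question:
    """Assemble a four-option item: one causal parent plus three non-causal distractors."""
    rng = random.Random(seed)
    correct = rng.choice(causal_parents)
    non_causal = [i for i in range(target_idx) if i not in causal_parents]
    distractors = rng.sample(non_causal, 3)  # assumes >= 3 non-causal preceding steps

    options = distractors + [correct]
    rng.shuffle(options)

    return Question(
        target_step=steps[target_idx],
        options=options,
        correct_option=correct,
        hint=regenerated[correct],
    )
```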
Startling Results: A Universal Misunderstanding
The results were stark. Participants achieved an average accuracy of only 29.3%, barely above random chance (25%). What’s more, every single one of the 80 participants scored below 50% accuracy, indicating a “universal failure” to correctly infer the AI’s true causal dependencies. This poor performance wasn’t limited to specific groups; factors like a STEM background, education level, or prior AI experience had no significant impact on accuracy. Even spending more time deliberating on questions didn’t lead to better results, suggesting the issue isn’t a lack of effort but a deeper cognitive mismatch.
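As a rough sense-check of “barely above random chance”: whether 29.3% meaningfully exceeds the 25% baseline depends on the total number of responses, which the article does not report. The snippet below uses a purely hypothetical response count to show how such a check could be run; it is not the paper’s analysis.

```python
from scipy.stats import binomtest

# Hypothetical counts: 80 participants x 12 questions each (assumed, not from the paper).
n_responses = 80 * 12
n_correct = round(0.293 * n_responses)  # 29.3% observed accuracy

# One-sided test against the 25% chance baseline.
result = binomtest(n_correct, n_responses, p=0.25, alternative="greater")
print(f"p-value vs. 25% chance: {result.pvalue:.4f}")
```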
The study also examined whether collective human agreement could produce a better understanding. On half of the questions, at least 50% of participants converged on the same answer. Yet the accuracy of these consensus choices was still only 40%. This suggests that even when humans collectively form a “shared narrative” about how an AI reasons, that narrative often remains incorrect – a “shared illusion” rather than true insight.
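A consensus analysis of this kind could be computed roughly as follows; the function and its inputs are illustrative assumptions about how per-question votes might be aggregated, not the authors’ code.

```python
from collections import Counter
from typing import Dict, List

def consensus_accuracy(
    answers: Dict[str, List[int]],  # question id -> option chosen by each participant
    correct: Dict[str, int],        # question id -> correct option
    agreement: float = 0.5,         # minimum share of participants on the modal answer
) -> float:
    """Accuracy of the majority answer, restricted to questions with enough agreement."""
    hits = 0
    consensus_questions = 0

    for qid, choices in answers.items():
        option, count = Counter(choices).most_common(1)[0]
        if count / len(choices) >= agreement:
            consensus_questions += 1
            hits += int(option == correct[qid])

    return hits / consensus_questions if consensus_questions else float("nan")
```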
Interestingly, there was a notable difference between models: participants scored 37.8% accuracy on texts from DeepSeek-R1 compared to 20.9% on Qwen-3. While DeepSeek-R1 appeared somewhat more interpretable, the fundamental gap in human understanding persisted across both architectures.
Implications: Rethinking AI Transparency
These findings have profound implications for how we approach AI interpretability and human-AI collaboration. Firstly, they challenge the notion that AI’s reasoning texts can be taken at face value as transparent explanations. The linguistic outputs, even those with metacognitive expressions like “Wait, let me break this down,” do not reliably reflect the model’s actual internal process when simply read by humans. The researchers argue that reasoning texts should be treated as “computational artifacts to be systematically investigated,” rather than straightforward explanations.
Secondly, the study suggests that AI models operate on language in a fundamentally different way than humans do. They are described as “new kinds of language-users” that wield natural language effectively, but under rules very different from our own. This calls for a reevaluation of language as the primary medium for human-AI communication, and for further research into how AI comprehends human language, and vice versa.
The full research paper can be accessed here: Humans Perceive Wrong Narratives from AI Reasoning Texts.


