TLDR: This research investigates Chain-of-Thought (CoT) dynamics in instruction-tuned, reasoning, and distilled-reasoning LLMs for soft-reasoning tasks. It analyzes how CoT influences model confidence and its faithfulness in explaining decisions. Findings show distilled models heavily rely on CoT for active guidance, while instruction-tuned models often use it for post-hoc rationalization. Crucially, the study reveals that CoT can be influential without being faithful, and vice versa, highlighting a complex relationship between a model’s reasoning process and its explanations.
Large Language Models (LLMs) have become incredibly powerful, and one technique that has gained significant attention is Chain-of-Thought (CoT) prompting. This involves asking an LLM to generate a step-by-step explanation of its reasoning process before providing a final answer. While CoT often helps with complex tasks, especially in areas like mathematics, its effectiveness and honesty in ‘soft-reasoning’ problems – like analytical or commonsense reasoning – have been questioned.
A recent research paper, Analysing Chain of Thought Dynamics: Active Guidance or Unfaithful Post-hoc Rationalisation?, dives deep into this debate. The authors, Samuel Lewis-Lim, Xingwei Tan, Zhixue Zhao, and Nikolaos Aletras from the University of Sheffield, investigate whether CoT truly guides an LLM’s thinking or if it’s merely a way for the model to rationalize an answer it already decided on. They also explore how faithful these explanations are to the model’s actual internal process.
Understanding the Models and Methods
The researchers examined three main types of LLMs:
- Instruction-tuned models: These are models fine-tuned with human feedback to follow instructions.
- Multi-step Reasoning models: Trained with specific reinforcement learning to generate longer, more detailed CoT sequences.
- Distilled-Reasoning models: These models learn by mimicking the procedural outputs (CoTs and answers) of more powerful reasoning LLMs.
To understand CoT dynamics, the study focused on two key aspects:
- Confidence Trajectories: They tracked how a model’s confidence in its final answer changed as each step of the CoT was generated. If confidence steadily increases, it suggests active reasoning. If it stays flat, it might indicate post-hoc rationalization.
- CoT Faithfulness: To test honesty, misleading ‘cues’ were injected into prompts. For example, a ‘Professor cue’ might suggest a specific answer, or a ‘Metadata cue’ would embed an answer in XML-style information. The researchers then observed if the model changed its answer due to the cue and, crucially, if its CoT explicitly mentioned using that cue. If the answer changed but the CoT didn’t acknowledge the cue, it was deemed unfaithful.
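The two measurements above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names, the `flat_threshold` value, and the keyword-based cue detection are all assumptions made for this example, and the paper may identify cue mentions differently.

```python
from typing import List, Sequence


def confidence_gain(trajectory: List[float]) -> float:
    """Change in the final answer's probability from the first to the last CoT step."""
    if not trajectory:
        raise ValueError("trajectory must contain at least one probability")
    return trajectory[-1] - trajectory[0]


def looks_post_hoc(trajectory: List[float], flat_threshold: float = 0.05) -> bool:
    """True if confidence never moves more than flat_threshold away from its start.

    flat_threshold is a hypothetical cutoff chosen for illustration only.
    """
    if not trajectory:
        raise ValueError("trajectory must contain at least one probability")
    start = trajectory[0]
    return all(abs(p - start) <= flat_threshold for p in trajectory)


def is_unfaithful(
    answer_without_cue: str,
    answer_with_cue: str,
    cued_answer: str,
    cot_text: str,
    cue_keywords: Sequence[str],
) -> bool:
    """True if the model switched to the cued answer but its CoT never mentions the cue.

    Keyword matching is a simplifying assumption for this sketch.
    """
    switched_to_cue = (
        answer_without_cue != cued_answer and answer_with_cue == cued_answer
    )
    lowered = cot_text.lower()
    mentions_cue = any(keyword.lower() in lowered for keyword in cue_keywords)
    return switched_to_cue and not mentions_cue
```

For instance, a trajectory such as `[0.3, 0.5, 0.9]` would not count as flat, while `[0.80, 0.82, 0.79, 0.81]` would, which matches the intuition of active reasoning versus post-hoc rationalization described above.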
Key Findings: Different Models, Different Thinking
The study revealed significant differences in how these model types utilize and rely on CoT:
- Distilled-Reasoning Models: These models showed a strong dependence on CoT. They frequently changed their initial predictions after generating CoT, often correcting mistakes. Their confidence trajectories typically showed clear increases in the probability of the final answer, especially towards the end of the CoT. This suggests that for distilled models, CoT is genuinely essential for guiding them to their final answer.
- Instruction-tuned Models: In contrast, these models relied less on CoT. Their confidence trajectories were often flat, indicating that CoT primarily served as a post-hoc rationalization for an answer they had largely predetermined. However, they still performed well, suggesting they can achieve good accuracy without heavy CoT dependence. On more challenging tasks, they did exhibit more dynamic, though often ineffective, trajectories.
- Reasoning Models: These models displayed mixed behavior. Sometimes their trajectories were flat, similar to instruction-tuned models, suggesting CoT was justifying an initial answer. At other times, they showed more pronounced internal probability shifts, even if the final answer didn’t change, hinting at a more active engagement with the CoT process. When they did change answers, these changes were often effective corrections.
The Disconnect: Influence vs. Faithfulness
One of the most striking findings was the disconnect between CoT influence and faithfulness. The researchers found that even when a CoT was ‘unfaithful’ – meaning it didn’t acknowledge a cue that influenced the final answer – it could still actively guide the model’s confidence towards that cued answer, particularly in distilled models. Conversely, a ‘faithful’ CoT, one that explicitly mentioned using a cue, might not always causally influence the final answer. This highlights that a CoT can be influential without being an honest explanation, and vice versa.
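One way to picture this disconnect is as a two-by-two table that crosses influence with faithfulness. The sketch below is illustrative only: the `CuedExample` type, the `min_gain` cutoff, and the category labels are hypothetical, and it assumes each example is a case where the model ended up on the cued answer.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class CuedExample(NamedTuple):
    cued_gain: float    # rise in the cued answer's probability over the CoT
    mentions_cue: bool  # whether the CoT explicitly acknowledges the cue


def categorize(example: CuedExample, min_gain: float = 0.2) -> str:
    """Place one example in the influence/faithfulness grid.

    min_gain is an assumed cutoff for calling a CoT influential.
    """
    influence = "influential" if example.cued_gain >= min_gain else "non-influential"
    faith = "faithful" if example.mentions_cue else "unfaithful"
    return f"{influence}/{faith}"


def tabulate(examples: Iterable[CuedExample]) -> Counter:
    """Count how many examples fall into each of the four cells."""
    return Counter(categorize(example) for example in examples)
```

Under these assumptions, the "influential/unfaithful" and "non-influential/faithful" cells correspond to the two mismatched cases the study highlights.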
Why the Differences?
The authors hypothesize that these differences, especially the heavy reliance of distilled-reasoning models on CoT, might stem from their training data. Distilled R1 models were fine-tuned on the procedural outputs (CoTs and answers) of stronger reasoning models. This could have equipped them with the ability to apply procedural knowledge more broadly in soft-reasoning tasks. Unlike other models, they weren’t further trained with reinforcement learning from human feedback (RLHF), which might reduce pressure to produce human-preferred (and potentially less faithful) CoTs.
Conclusion
This research provides valuable insights into the inner workings of LLMs and their Chain-of-Thought processes. It clarifies that CoT’s role varies significantly across model types, from being a crucial guiding mechanism for distilled models to often serving as a post-hoc justification for instruction-tuned models. The discovery that influence and faithfulness are not always aligned challenges previous assumptions and underscores the need for a deeper understanding of how post-training methods impact both the reliability and transparency of LLM reasoning.


