TL;DR: Large Reasoning Models (LRMs) generate answers using both Chain-of-Thought (CoT) reasoning and direct memory retrieval, which can sometimes lead to inconsistencies. A new study investigates this interplay, finding that factors like problem domain, model size, and training methods influence which mechanism dominates. The research introduces FARL (Forgetting-Augmented Reinforcement Learning), a novel fine-tuning framework that suppresses memory retrieval shortcuts, thereby enhancing genuine reasoning capabilities and improving model robustness and generalization.
Large Reasoning Models (LRMs), such as OpenAI's o-series and Google's Gemini, have shown remarkable abilities in solving complex problems through what's known as Chain-of-Thought (CoT) reasoning. In effect, they "show their work" by generating step-by-step explanations before giving a final answer, which helps users understand and trust their outputs. However, recent observations have highlighted a puzzling issue: sometimes the final answers these models provide don't logically follow from their own reasoning steps.
Researchers from Stony Brook University hypothesized that this inconsistency arises because LRMs use two competing mechanisms to generate answers: deliberate reasoning through the CoT and direct retrieval from internal memory. To investigate, they ran controlled experiments that either injected misleading information into the reasoning process or corrupted the answers stored in the models' memory.
Uncovering the Dual Mechanisms
The study confirmed that both reasoning and retrieval mechanisms operate simultaneously when LRMs generate answers. By introducing perturbations – either injecting misleading cues into the CoT or "poisoning" the model's memory with incorrect answers – the researchers observed how the models' final answers changed. When both reasoning and retrieval cues pointed to the same incorrect answer, the effect was amplified, suggesting that the model's confidence in an answer increases when both pathways agree. Conversely, when the cues pointed to different incorrect answers, a "tug-of-war" occurred, with the final answer gravitating toward one pathway or the other.
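To make the setup concrete, here is a minimal sketch of what the CoT-perturbation arm of such an experiment could look like. It is an illustration under assumptions, not the paper's protocol: `query_model` is a hypothetical wrapper around any chat-completion API, and the exact hint wording is invented. The memory-poisoning arm would instead fine-tune the model on corrupted question-answer pairs before probing.

```python
# Minimal sketch of a CoT-perturbation probe (illustrative, not the paper's
# exact protocol). `query_model` is a hypothetical wrapper around any
# chat-completion API that returns the model's full text output.

def inject_cot_cue(question: str, misleading_answer: str) -> str:
    """Steer the chain of thought with a planted 'hint' while leaving
    the question itself untouched."""
    return (
        f"{question}\n"
        f"Hint from a previous step: the intermediate result is {misleading_answer}.\n"
        "Think step by step, then state the final answer."
    )

def probe(question: str, correct: str, misleading: str, query_model) -> dict:
    """Compare the model's clean answer with its answer under a misleading cue."""
    clean = query_model(question + "\nThink step by step.")
    perturbed = query_model(inject_cot_cue(question, misleading))
    return {
        "clean_is_correct": correct in clean,
        # If the final answer flips to the planted value, the CoT pathway won
        # the tug-of-war; if it stays correct, retrieval (or robust reasoning)
        # held.
        "perturbed_follows_cue": misleading in perturbed,
    }
```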
Factors Influencing Dominance
The research identified several key factors that influence whether reasoning or retrieval dominates:
- Problem Domains: In domains like mathematics and logic, reasoning tends to be stronger. Models were less susceptible to memory poisoning and showed greater confidence in their original CoT reasoning, likely because mathematical problems have a structured, verifiable nature.
- Model Scales: Larger models generally exhibited stronger reasoning dominance. They were more resistant to misleading information in both memory and CoT, and less likely to fabricate justifications for incorrect answers. This suggests that larger models generalize reasoning principles better rather than relying on memorized facts.
- Fine-tuning Approaches: The way a model is trained plays a significant role. Models trained with Reinforcement Learning (RL) showed stronger reasoning dominance. In contrast, models fine-tuned through distillation (learning from a teacher model) were more prone to retrieval-based responses and often engaged in “post-hoc explanation” – fabricating rationales to justify memorized answers.
- Attention Patterns: By analyzing the models' internal activations, the researchers found that specific attention heads in the middle layers of the network act as a critical control point, arbitrating between following the generated reasoning trace and deferring to a retrieved answer (a rough probing sketch follows this list).
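As a rough illustration of this kind of probing, the sketch below pulls per-head attention maps from the middle layers of a small open model with Hugging Face Transformers. The choice of model, the layer range, and the interpretation in the comments are assumptions for illustration; the study's actual analysis pipeline is not reproduced here.

```python
# Illustrative attention probe (not the paper's analysis code).
# pip install torch transformers
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen2.5-0.5B"  # placeholder: any small causal LM works
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, attn_implementation="eager")
model.eval()

# A prompt whose CoT span and final-answer position are easy to locate.
prompt = "Q: What is 17 * 24? Reasoning: 17 * 24 = 340 + 68 = 408. Final answer:"
inputs = tok(prompt, return_tensors="pt")

with torch.no_grad():
    out = model(**inputs, output_attentions=True)

# out.attentions: one (batch, heads, seq, seq) tensor per layer.
n_layers = len(out.attentions)
for layer in range(n_layers // 3, 2 * n_layers // 3):  # the "middle" layers
    heads = out.attentions[layer][0]       # (heads, seq, seq)
    last_token = heads[:, -1, :]           # where each head looks when answering
    # A head attending mostly into the reasoning span vs. the question span
    # hints at which pathway (CoT vs. retrieval) it is tracking.
    print(f"layer {layer}: top attended position per head:",
          last_token.argmax(dim=-1).tolist())
```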
Introducing FARL: Forgetting-Augmented Reinforcement Learning
Based on these insights, the researchers introduced a novel fine-tuning framework called FARL (Forgetting-Augmented Reinforcement Learning). The core idea is to actively suppress retrieval shortcuts during RL training: by compelling the model to "forget" specific memorized answers, FARL forces it to rely on its genuine reasoning capabilities, which purifies the reward signal and strengthens reasoning development.
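The paper's exact training recipe isn't detailed in this summary, so the following is only a plausible sketch of the general shape described above: an unlearning step (gradient ascent on the likelihood of memorized direct answers) interleaved with a standard RL update on CoT rollouts. Every helper here (`answer_nll`, `sample_cot_rollouts`, `policy_gradient_loss`) is a hypothetical stand-in, not the paper's API.

```python
# Heavily simplified sketch of one forgetting-augmented RL step; an assumption
# about the general shape of FARL, not the paper's implementation.
import torch

def farl_step(model, rl_batch, memorized_batch, rl_optimizer, lr_forget=1e-5):
    # 1) Forgetting step: gradient *ascent* on the NLL of memorized direct
    #    answers (question -> answer, no CoT). Raising the NLL lowers the
    #    probability of the memorized answer, weakening the retrieval shortcut.
    params = [p for p in model.parameters() if p.requires_grad]
    loss_mem = answer_nll(model, memorized_batch)  # hypothetical helper
    grads = torch.autograd.grad(loss_mem, params)
    with torch.no_grad():
        for p, g in zip(params, grads):
            p.add_(lr_forget * g)  # ascent on NLL = unlearning the answer

    # 2) Standard RL step (e.g., PPO-style) on CoT rollouts. With retrieval
    #    suppressed, reward credits answers reached through reasoning.
    rollouts = sample_cot_rollouts(model, rl_batch)      # hypothetical helper
    rl_loss = policy_gradient_loss(model, rollouts)      # hypothetical helper
    rl_optimizer.zero_grad()
    rl_loss.backward()
    rl_optimizer.step()
```

On this reading, the ordering is the key design choice: weakening the retrieval shortcut before each policy update means the RL reward can only be earned through the reasoning pathway, which is what the article describes as "purifying" the reward signal.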
FARL demonstrated significant improvements. It reduced the influence of both reasoning and retrieval perturbations, indicating stronger reasoning-dominant behavior and a more robust CoT. It also delivered larger accuracy gains both within the training domain and on out-of-domain tasks compared to standard RL and supervised fine-tuning (SFT). Furthermore, FARL improved the quality of the generated CoTs, yielding more efficient and better-integrated reasoning trajectories.
This study offers a new perspective on how Large Reasoning Models generate answers, highlighting the interplay between deliberate reasoning and direct retrieval. The introduction of FARL provides a promising direction for more effectively eliciting and strengthening genuine reasoning abilities in LRMs. For more details, you can read the full research paper here.