TL;DR: Large Reasoning Models (LRMs) generate answers using both Chain-of-Thought (CoT) reasoning and direct memory retrieval, which can sometimes lead to inconsistencies. A new study investigates this interplay, finding that factors like problem domain, model size, and training methods influence which mechanism dominates. The research introduces FARL (Forgetting-Augmented Reinforcement Learning), a novel fine-tuning framework that suppresses memory retrieval shortcuts, thereby enhancing genuine reasoning capabilities and improving model robustness and generalization.
Large Reasoning Models (LRMs), such as OpenAI's o-series and Google's Gemini, have shown remarkable abilities in solving complex problems through what's known as Chain-of-Thought (CoT) reasoning. In effect, they "show their work" by generating step-by-step explanations before giving a final answer, which helps users understand and trust their outputs. However, recent observations have highlighted a puzzling issue: sometimes the final answers these models provide don't logically follow from their own reasoning steps.
Researchers from Stony Brook University hypothesized that this inconsistency arises because LRMs use two competing mechanisms to generate answers: deliberate reasoning through the CoT and direct retrieval from internal memory. To investigate, they ran controlled experiments that either injected misleading information into the reasoning process or corrupted the answers stored in the models' memory.
Uncovering the Dual Mechanisms
The study confirmed that both reasoning and retrieval mechanisms operate simultaneously when LRMs generate answers. By introducing perturbations – either injecting misleading cues into the CoT or "poisoning" the model's memory with incorrect answers – the researchers observed how the models' final answers changed. When both reasoning and retrieval cues pointed to the same incorrect answer, the effect was amplified, suggesting that the model's confidence in an answer increases when both pathways agree. Conversely, when the cues pointed to different incorrect answers, a "tug-of-war" occurred, with the final answer gravitating toward one pathway or the other.
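To make the setup concrete, here is a minimal sketch of what the CoT-perturbation arm of such an experiment could look like. It is an illustration under assumptions, not the paper's protocol: `query_model` is a hypothetical wrapper around any chat-completion API, and the exact hint wording is invented. The memory-poisoning arm would instead fine-tune the model on corrupted question-answer pairs before probing.

```python
# Minimal sketch of a CoT-perturbation probe (illustrative, not the paper's
# exact protocol). `query_model` is a hypothetical wrapper around any
# chat-completion API that returns the model's full text output.

def inject_cot_cue(question: str, misleading_answer: str) -> str:
    """Steer the chain of thought with a planted 'hint' while leaving
    the question itself untouched."""
    return (
        f"{question}\n"
        f"Hint from a previous step: the intermediate result is {misleading_answer}.\n"
        "Think step by step, then state the final answer."
    )

def probe(question: str, correct: str, misleading: str, query_model) -> dict:
    """Compare the model's clean answer with its answer under a misleading cue."""
    clean = query_model(question + "\nThink step by step.")
    perturbed = query_model(inject_cot_cue(question, misleading))
    return {
        "clean_is_correct": correct in clean,
        # If the final answer flips to the planted value, the CoT pathway won
        # the tug-of-war; if it stays correct, retrieval (or robust reasoning)
        # held.
        "perturbed_follows_cue": misleading in perturbed,
    }
```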
Factors Influencing Dominance
The research identified several key factors that influence whether reasoning or retrieval dominates:
- Problem Domains: In domains like mathematics and logic, reasoning tends to be stronger. Models were less susceptible to memory poisoning and showed greater confidence in their original CoT reasoning, likely because mathematical problems have a structured, verifiable nature.
- Model Scales: Larger models generally exhibited stronger reasoning dominance. They were more resistant to misleading information in both memory and CoT, and less likely to fabricate justifications for incorrect answers. This suggests that larger models generalize reasoning principles better rather than relying on memorized facts.
- Fine-tuning Approaches: The way a model is trained plays a significant role. Models trained with Reinforcement Learning (RL) showed stronger reasoning dominance. In contrast, models fine-tuned through distillation (learning from a teacher model) were more prone to retrieval-based responses and often engaged in “post-hoc explanation” – fabricating rationales to justify memorized answers.
- Attention Patterns: By analyzing the models' internal activations, the researchers found that specific attention heads in the middle layers of the network act as a critical control point, arbitrating between following the generated reasoning trace and deferring to a retrieved answer (a rough probing sketch follows this list).
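As a rough illustration of this kind of probing, the sketch below pulls per-head attention maps from the middle layers of a small open model with Hugging Face Transformers. The choice of model, the layer range, and the interpretation in the comments are assumptions for illustration; the study's actual analysis pipeline is not reproduced here.

```python
# Illustrative attention probe (not the paper's analysis code).
# pip install torch transformers
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen2.5-0.5B"  # placeholder: any small causal LM works
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, attn_implementation="eager")
model.eval()

# A prompt whose CoT span and final-answer position are easy to locate.
prompt = "Q: What is 17 * 24? Reasoning: 17 * 24 = 340 + 68 = 408. Final answer:"
inputs = tok(prompt, return_tensors="pt")

with torch.no_grad():
    out = model(**inputs, output_attentions=True)

# out.attentions: one (batch, heads, seq, seq) tensor per layer.
n_layers = len(out.attentions)
for layer in range(n_layers // 3, 2 * n_layers // 3):  # the "middle" layers
    heads = out.attentions[layer][0]       # (heads, seq, seq)
    last_token = heads[:, -1, :]           # where each head looks when answering
    # A head attending mostly into the reasoning span vs. the question span
    # hints at which pathway (CoT vs. retrieval) it is tracking.
    print(f"layer {layer}: top attended position per head:",
          last_token.argmax(dim=-1).tolist())
```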
Introducing FARL: Forgetting-Augmented Reinforcement Learning
Based on these insights, the researchers introduced a novel fine-tuning framework called FARL (Forgetting-Augmented Reinforcement Learning). The core idea is to actively suppress retrieval shortcuts during RL training: by compelling the model to "forget" specific memorized answers, FARL forces it to rely on its genuine reasoning capabilities, which purifies the reward signal and strengthens reasoning development.
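The paper's exact training recipe isn't detailed in this summary, so the following is only a plausible sketch of the general shape described above: an unlearning step (gradient ascent on the likelihood of memorized direct answers) interleaved with a standard RL update on CoT rollouts. Every helper here (`answer_nll`, `sample_cot_rollouts`, `policy_gradient_loss`) is a hypothetical stand-in, not the paper's API.

```python
# Heavily simplified sketch of one forgetting-augmented RL step; an assumption
# about the general shape of FARL, not the paper's implementation.
import torch

def farl_step(model, rl_batch, memorized_batch, rl_optimizer, lr_forget=1e-5):
    # 1) Forgetting step: gradient *ascent* on the NLL of memorized direct
    #    answers (question -> answer, no CoT). Raising the NLL lowers the
    #    probability of the memorized answer, weakening the retrieval shortcut.
    params = [p for p in model.parameters() if p.requires_grad]
    loss_mem = answer_nll(model, memorized_batch)  # hypothetical helper
    grads = torch.autograd.grad(loss_mem, params)
    with torch.no_grad():
        for p, g in zip(params, grads):
            p.add_(lr_forget * g)  # ascent on NLL = unlearning the answer

    # 2) Standard RL step (e.g., PPO-style) on CoT rollouts. With retrieval
    #    suppressed, reward credits answers reached through reasoning.
    rollouts = sample_cot_rollouts(model, rl_batch)      # hypothetical helper
    rl_loss = policy_gradient_loss(model, rollouts)      # hypothetical helper
    rl_optimizer.zero_grad()
    rl_loss.backward()
    rl_optimizer.step()
```

On this reading, the ordering is the key design choice: weakening the retrieval shortcut before each policy update means the RL reward can only be earned through the reasoning pathway, which is what the article describes as "purifying" the reward signal.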
FARL demonstrated significant improvements. It reduced the influence of both reasoning and retrieval perturbations, indicating stronger reasoning-dominant behavior and a more robust CoT. It also delivered larger accuracy gains both within the training domain and on out-of-domain tasks compared to standard RL and supervised fine-tuning (SFT). Furthermore, FARL improved the quality of the generated CoTs, yielding more efficient and better-integrated reasoning trajectories.
This study offers a new perspective on how Large Reasoning Models generate answers, highlighting the interplay between deliberate reasoning and direct retrieval. The introduction of FARL provides a promising direction for more effectively eliciting and strengthening genuine reasoning abilities in LRMs. For more details, you can read the full research paper here.