Beyond Solving: How Large AI Models Learn to Seek Information

TLDR: A new study introduces CRITIC-math, a dataset of incomplete math problems, to evaluate if Large Reasoning Models (LRMs) can proactively ask for missing information. It finds that current LRMs often fail to ask questions, instead overthinking or hallucinating answers. However, supervised fine-tuning on this new dataset significantly improves their ability to identify incompleteness and ask for clarification, suggesting a path towards more genuinely intelligent AI.

In the rapidly evolving world of artificial intelligence, Large Reasoning Models (LRMs) have shown impressive capabilities, particularly in solving complex mathematical problems. These models are often evaluated on benchmarks that feature well-defined problems, where all necessary information is provided. However, a recent research paper highlights a critical gap in this evaluation approach: real-world problems are rarely perfectly defined and often lack sufficient information.

The paper, titled “Beyond Solving Math Quiz: Evaluating the Ability of Large Reasoning Models to Ask for Information,” argues that a truly intelligent AI agent should not only be able to solve problems but also possess the crucial ability to ask for additional information when faced with incomplete scenarios. This proactive information-seeking is vital for AI assistants to provide genuinely helpful responses, rather than making assumptions that could lead to meaningless or incorrect answers.

The Challenge of Incompleteness

Imagine asking an AI, “My living room is 6 meters long. How many tiles (with a side length of 60 centimeters) do I need in total?” A human would immediately ask for the width of the room, as without it, any answer would be a guess. Current LRMs, however, often try to “solve” such incomplete problems, leading to behaviors like overthinking or even hallucinating missing information.

To address this, researchers introduced a new dataset called CRITIC-math. This benchmark consists of two types of incomplete mathematical problems: those with a “missing goal” (where the question itself is unclear) and those with “missing premises” (where essential information is absent). The dataset includes 1.3K test problems and 5.3K training problems, created by transforming existing well-defined math problems and undergoing careful manual verification.

What the Study Revealed

The systematic evaluation of state-of-the-art LRMs using CRITIC-math yielded significant insights:

Lack of Proactivity: When given only the problem (an “implicit prompt”), LRMs showed a very low tendency to ask for clarification, with clarification ratios around 25%. Even when explicitly instructed to ask for information if needed, their clarification ratios only reached about 50%, indicating a persistent struggle.
Failure Modes: The study identified specific ways LRMs fail to ask questions:
- Thoughts-to-Answer Unfaithfulness: For problems with missing premises, LRMs sometimes recognized the need for more information in their internal thought processes but still proceeded to generate an answer without asking.
- Overthinking: When premises were missing, LRMs often engaged in extensive internal thinking to try and resolve the incompleteness themselves, leading to significantly longer thought processes and delayed responses, rather than simply asking for the missing piece.
- Hallucinations: For problems with missing goals, LRMs frequently imagined a goal and then solved that imagined problem, demonstrating an inconsistency with the user’s implicit directive.

Training Models to Ask

Despite these limitations, the research also explored the potential of Supervised Fine-Tuning (SFT) to teach LRMs this crucial ability. By fine-tuning a model (CRITIC-Qwen) on the CRITIC-math dataset, the researchers observed a remarkable improvement in its ability to ask for information. This fine-tuned model significantly outperformed existing state-of-the-art LRMs in identifying incompleteness and raising clarification questions.

Interestingly, the study found that learning to ask for information on incomplete problems did not contradict, and in some cases even benefited, the model’s ability to solve well-defined problems. However, a dilemma was also uncovered: the current mode of “deep-thinking” in LRMs, which is optimized for solving problems, might actually hinder their ability to proactively ask for information.

Also Read:

A New Path for AI Intelligence

This research provides new insights into developing genuinely intelligent LRMs. It suggests that focusing solely on problem-solving benchmarks overlooks a fundamental aspect of intelligence: the ability to recognize limitations and seek external information. The CRITIC-math dataset and the findings from this study pave the way for future AI development that aims to create agents capable of navigating the uncertainties inherent in real-world scenarios, moving beyond being mere “math quiz solvers.” You can read the full paper at arXiv.

Financial Sector Fortifies Against Surging AI-Powered Scams

Deloitte’s 2025 Outlook: Navigating Escalating AI Challenges in Human Capital

Salesforce Study Reveals Data Quality is Pivotal for Employee Trust in AI Adoption

Top Executives Sidestep Company AI Guidelines, Fueling Shadow AI Risks

Intel’s Evolving IP Strategy: A Calculated Shift Towards Core AI Innovation

Generative AI Prompts Increased Workforce Surveillance in Indian IT Sector

Financial Sector Fortifies Against Surging AI-Powered Scams

Deloitte’s 2025 Outlook: Navigating Escalating AI Challenges in Human Capital

Salesforce Study Reveals Data Quality is Pivotal for Employee Trust in AI Adoption

Top Executives Sidestep Company AI Guidelines, Fueling Shadow AI Risks

Intel’s Evolving IP Strategy: A Calculated Shift Towards Core AI Innovation

Generative AI Prompts Increased Workforce Surveillance in Indian IT Sector

Beyond Solving: How Large AI Models Learn to Seek Information

The Challenge of Incompleteness

What the Study Revealed

Training Models to Ask

A New Path for AI Intelligence

Gen AI News and Updates

HKU Spearheads AI Integration in Hong Kong’s Digital Education Future

UNESCO’s 43rd General Conference Concludes with New Leadership and Landmark Ethics Frameworks for Technology

BRYGE AI Secures Silver Stevie® Award for Groundbreaking Health Tech Product for Women

Boosting Business Efficiency: A New AI and Big Data Model for Process Optimization

AlphaCast: A New Approach to Time Series Prediction Through Human-AI Collaboration

New Graph Neural Networks Improve Reasoning in Assumption-Based Argumentation

Enhancing AI Reasoning: How Recursive Refinement and Multi-Agent Systems Improve Language Model Performance

ARGUS: A Proactive Framework for Enhancing Autonomous Driving Safety

Generative AI Powers Next-Gen Autonomous Emergency Response

OR-R1: Advancing Automated Optimization with Smart, Data-Efficient AI

Enhancing GUI Agents with Memory: A New Framework for History-Aware Reasoning

ProBench: A Deeper Look into How We Evaluate AI Agents for Mobile Apps

Enhancing Large Language Model Reasoning with Concise Outputs

Ensuring Trust in Autonomous AI: A Two-Layered Monitoring Approach for Agentic Systems

MedFuse: A Multiplicative Approach to Understanding Irregular Clinical Time Series Data

HyperD: A New Framework for More Accurate and Robust Traffic Predictions

Beyond Training: Researchers Propose ‘Model Raising’ for AI with Intrinsic Values

Bridging the Divide: Why AI Needs a Qualitative Revolution

Language Models Enhance Safety Certificate Synthesis for Dynamic Systems

Subscribe to get the latest news and updates