TLDR: Large Reasoning Models (LRMs) often fail to abstain from unanswerable questions, generating incorrect answers or getting stuck, despite internally recognizing the problem. Researchers propose a two-stage method combining cognitive monitoring (detecting unanswerability internally) and inference-time intervention (prompting abstention) to significantly improve LRMs’ ability to say “I don’t know” without harming performance on answerable questions.
Large Reasoning Models (LRMs), such as OpenAI's o1 and DeepSeek-R1, have made major strides in solving complex problems. They can explore multiple lines of reasoning and correct their own mistakes, which makes them valuable where accuracy is critical. However, a new study highlights a significant weakness: these models often struggle with questions that cannot be answered at all, such as math problems that lack crucial information.
The research, titled “Answering the Unanswerable Is to Err Knowingly: Analyzing and Mitigating Abstention Failures in Large Reasoning Models,” reveals that LRMs frequently fail to appropriately abstain from answering these “unanswerable” questions. Instead of admitting they don’t know, they might generate incorrect answers based on made-up details or get stuck in a loop of reasoning without reaching a conclusion. This issue is critical for building trustworthy AI, as reliability is fundamental to user confidence.
The Problem: When AI Doesn’t Know When to Say “I Don’t Know”
The authors, Yi Liu, Xiangyu Liu, Zequn Sun, and Wei Hu from Nanjing University, conducted a detailed analysis of how LRMs respond to unanswerable questions. They found that even though LRMs possess the internal “cognitive” ability to recognize flaws in these questions, they often don’t show this understanding in their final responses. This creates a disconnect: the model internally knows a question is unsolvable but still tries to answer it, leading to what the researchers call “abstention failure.”
The study categorized LRM responses to unanswerable questions into three types:
- Correct Abstention: The model correctly identifies the question as unanswerable and explicitly states, “I don’t know,” often with an explanation. This is the desired behavior.
- Hallucinated Answer: The model invents missing information or makes assumptions to produce a complete, but incorrect, answer. For example, it might assume a shipping charge that was never mentioned.
- Cognitive Fixation: The model gets stuck in a prolonged reasoning process, reformulating the problem or pursuing invalid paths without ever reaching a conclusion or abstaining.
Interestingly, the researchers observed that as model capacity increases, the rate of correct abstention tends to rise, while hallucinated answers and cognitive fixation become less frequent. Even so, a significant share of unanswerable questions still does not receive a correct abstention, which points to a persistent limitation.
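As a rough illustration of the three response types, the sketch below labels a model's final output on a question already known to be unanswerable. The names `Outcome` and `label_response` are hypothetical, and the simple string match is only a stand-in: the article does not describe how the study itself judged responses.

```python
from enum import Enum
from typing import Optional


class Outcome(Enum):
    CORRECT_ABSTENTION = "correct abstention"
    HALLUCINATED_ANSWER = "hallucinated answer"
    COGNITIVE_FIXATION = "cognitive fixation"


def label_response(final_answer: Optional[str]) -> Outcome:
    """Label the response to an UNANSWERABLE question.

    final_answer is None when the model used up its reasoning budget
    without ever committing to a conclusion (an assumption of this sketch).
    """
    if final_answer is None:
        return Outcome.COGNITIVE_FIXATION
    # Normalize curly apostrophes so both "don't" spellings match.
    text = final_answer.lower().replace("\u2019", "'")
    if "i don't know" in text:
        return Outcome.CORRECT_ABSTENTION
    # Any concrete answer to an unanswerable question rests on invented details.
    return Outcome.HALLUCINATED_ANSWER
```

A real evaluation would also need to check whether the stated reason for abstaining is correct, which a keyword match cannot do.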
Bridging the Gap: Internal Awareness vs. External Response
A key finding is that LRMs often show internal signs of recognizing unanswerability while they reason. When examined at intermediate “stopping points” in the reasoning trajectory, the models could often correctly explain why a question was unanswerable, even when the final output was a hallucinated answer or the run ended in cognitive fixation. This suggests that the models have the awareness but struggle to act on it.
Further analysis revealed that when LRMs fail to abstain, they generally have lower confidence in generating an “I don’t know” response. The frequency of abstention responses during the reasoning process was also lower in these failure cases. This indicates that while the LRM might internally recognize an unanswerable question, the signal to abstain isn’t strong enough to override its bias towards providing an answer.
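To make these two signals concrete, here is a minimal sketch of how they could be computed. It assumes that you already have per-token log-probabilities for an “I don’t know” continuation and a list of responses sampled at different points in the reasoning. The function names and the length-normalized confidence measure are illustrative assumptions, not the paper's exact metrics.

```python
import math
from typing import List


def sequence_confidence(token_logprobs: List[float]) -> float:
    """Geometric-mean token probability of a candidate continuation."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(sum(token_logprobs) / len(token_logprobs))


def abstention_frequency(responses: List[str], marker: str = "i don't know") -> float:
    """Fraction of sampled responses that contain the abstention marker."""
    if not responses:
        return 0.0
    hits = sum(marker in r.lower().replace("\u2019", "'") for r in responses)
    return hits / len(responses)
```

Under this reading, a failure case is one where both numbers stay low: the abstention continuation is rarely sampled and carries little probability mass when scored.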
A Two-Stage Solution: Cognitive Monitoring and Inference-Time Intervention
To address this misalignment, the researchers proposed a lightweight, two-stage method:
- Cognitive Monitoring: This stage involves continuously tracking the model’s internal “hidden states” during its reasoning process. A small, specially trained “linear probe” is used to estimate the probability that the question is unanswerable. If this probability crosses a certain threshold, the system moves to the next stage.
- Inference-Time Intervention: Once unanswerability is detected, the model’s reasoning process is interrupted. An “instructional guidance prompt” is introduced, explicitly reminding the model not to make assumptions and to abstain if information is missing. This prompt, combined with an early exit strategy, encourages the model to output “I don’t know” rather than continuing to guess or get stuck.
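The two stages can be sketched in a few lines of plain Python. This is a simplified illustration under stated assumptions, not the authors' implementation: the probe weights, the 0.8 threshold, the function names, and the wording of `GUIDANCE_PROMPT` are all hypothetical, and hidden states are passed in as plain lists of floats rather than read from a real model.

```python
import math
from typing import Optional, Sequence

# Hypothetical wording: the paper's actual guidance prompt is not reproduced here.
GUIDANCE_PROMPT = (
    "Do not make assumptions about missing information. "
    "If the question cannot be answered as stated, say \"I don't know\" and explain why."
)


def probe_probability(hidden_state: Sequence[float],
                      weights: Sequence[float],
                      bias: float) -> float:
    """Linear probe: sigmoid(w . h + b), read as P(question is unanswerable)."""
    logit = sum(w * h for w, h in zip(weights, hidden_state)) + bias
    # Numerically stable sigmoid, so large negative logits do not overflow.
    if logit >= 0:
        return 1.0 / (1.0 + math.exp(-logit))
    e = math.exp(logit)
    return e / (1.0 + e)


def first_trigger_step(hidden_states: Sequence[Sequence[float]],
                       weights: Sequence[float],
                       bias: float,
                       threshold: float = 0.8) -> Optional[int]:
    """Stage 1: return the first reasoning step whose probe probability
    reaches the threshold, or None if monitoring never triggers."""
    for step, h in enumerate(hidden_states):
        if probe_probability(h, weights, bias) >= threshold:
            return step
    return None


def build_intervention(partial_reasoning: str) -> str:
    """Stage 2: cut the reasoning short, append the guidance prompt,
    and ask for the final answer right away (early exit)."""
    return f"{partial_reasoning}\n{GUIDANCE_PROMPT}\nFinal answer:"
```

In use, `first_trigger_step` would run alongside generation. When it returns a step index, the reasoning produced so far is passed to `build_intervention`, and the model continues from the resulting text. If it returns `None`, the model finishes as usual, which is how answerable questions are left untouched.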
Experiments on various LRMs and datasets, including SUM and UMWP, showed significant improvements. The method dramatically increased the abstention rate and the accuracy of reasons provided for unanswerability, all while maintaining the model’s performance on answerable questions. It also reduced the number of tokens used, making the reasoning process more efficient.
The study emphasizes that simply stopping the model early without proper guidance can sometimes lead to more hallucinated answers. The instructional guidance is crucial for steering the model towards correct abstention rather than speculative answers. This research offers a promising path towards making large reasoning models more reliable and trustworthy, ensuring they know when to confidently say, “I don’t know.” You can read the full research paper here: Answering the Unanswerable Is to Err Knowingly.


