TLDR: This research paper investigates whether Retrieval Augmented Language Models (RALMs) effectively “know when they don’t know,” focusing on their ability to refuse to answer questions. The study finds that RALMs often exhibit “over-refusal” when presented with irrelevant information, declining to answer questions they actually know. It evaluates refusal post-training methods, showing that In-Context Fine-tuning mitigates over-refusal, while Refusal-aware Instruction Tuning can worsen it and conflict with answer quality. The paper concludes by proposing a method to improve refusal by first assessing the model’s knowledge state and context utility.
Large Language Models, or LLMs, have shown incredible capabilities in many tasks, but they sometimes make up information, a problem known as hallucination. To tackle this, researchers have primarily used two methods: Retrieval Augmented Language Models (RALMs) and refusal post-training. RALMs use external knowledge to provide accurate answers, especially for questions outside their internal knowledge. Refusal post-training, on the other hand, teaches models to say “I don’t know” when they are uncertain.
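To make the setup concrete, here is a minimal sketch of how retrieval augmentation typically works: retrieved passages are prepended to the question before the model answers. The `retrieve` helper and the prompt wording are illustrative placeholders, not the paper's exact setup.

```python
def retrieve(question: str, k: int = 3) -> list[str]:
    """Placeholder retriever; a real system would query a search or vector index."""
    return ["<passage 1>", "<passage 2>", "<passage 3>"][:k]


def build_ralm_prompt(question: str) -> str:
    """Prepend retrieved passages to the question, with a refusal instruction."""
    passages = retrieve(question)
    context = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return (
        "Answer the question using the context below. If the context is "
        "unhelpful and you are unsure of the answer, say \"I don't know\".\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )


print(build_ralm_prompt("When did the 2022 Olympic Winter Games end?"))
```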
However, a recent study titled “Do Retrieval Augmented Language Models Know When They Don’t Know?” delves into a crucial, often overlooked aspect: how well RALMs understand their own knowledge boundaries, particularly their ability to refuse to answer when appropriate. The paper was authored by Youchao Zhou, Heyan Huang, Yicheng Liu, Rui Dai, Xinglin Wang, Xinchen Zhang, and Shumin Shi of Beijing Institute of Technology, together with Yang Deng of Singapore Management University.
Understanding RALM Knowledge States
The researchers explored whether RALMs are well-calibrated across different internal and external knowledge states. They found that LLMs often exhibit “over-refusal” behavior, meaning they refuse to answer questions they actually know, especially when presented with irrelevant information. This is a significant finding, as it highlights a vulnerability where contextual distractions can impair a model’s ability to distinguish between its internal knowledge and external information.
For instance, imagine asking an RALM “When did the 2022 Olympic Winter Games end?” If the retrieved context contains misinformation, the model might become confused and refuse to answer, even if it internally knows the correct date. This “over-refusal” is a key problem identified in the study.
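One way to picture the problem is as a matrix of knowledge states. The sketch below is an illustration rather than the paper's formal taxonomy: it enumerates the four combinations of internal knowledge and context quality, and the behavior a well-calibrated RALM would ideally show in each.

```python
from enum import Enum


class Internal(Enum):
    KNOWN = "answer is in the model's parametric memory"
    UNKNOWN = "answer is not in the model's parametric memory"


class Context(Enum):
    POSITIVE = "retrieved passage supports the answer"
    NEGATIVE = "retrieved passage is irrelevant or misleading"


# Ideal behavior for each combined knowledge state. A well-calibrated RALM
# should refuse only in the (UNKNOWN, NEGATIVE) state; refusing in the
# (KNOWN, NEGATIVE) state is exactly the over-refusal the paper reports.
IDEAL = {
    (Internal.KNOWN, Context.POSITIVE): "answer (context confirms memory)",
    (Internal.KNOWN, Context.NEGATIVE): "answer from memory, ignore context",
    (Internal.UNKNOWN, Context.POSITIVE): "answer from context",
    (Internal.UNKNOWN, Context.NEGATIVE): 'refuse: "I don\'t know"',
}

for (internal, context), behavior in IDEAL.items():
    print(f"{internal.name:7} + {context.name:8} -> {behavior}")
```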
Impact of Refusal Post-Training
The study also investigated how different refusal post-training methods affect this over-refusal issue. They looked at two main approaches: Refusal-aware Instruction Tuning (R-tuning) and In-Context Fine-tuning (ICFT). The results showed that ICFT helped mitigate the over-refusal problem, while R-tuning actually made it worse. This suggests that while refusal training aims to improve a model’s self-awareness, some methods can inadvertently reduce the quality of answers, especially when positive, helpful context is available.
Specifically, R-tuning, while improving refusal quality in some scenarios, led to an increase in the over-refusal rate and a decrease in answer precision. ICFT, particularly when trained with negative contexts (ICFT(n)), showed better overall accuracy and refusal quality, and was more effective at reducing over-refusal. However, the study also noted that refusal ability can sometimes conflict with answer correctness, especially when positive context is present, due to a degradation of context utility.
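The paper's exact training formats aren't reproduced here, but a rough, hypothetical sketch helps show how the two recipes differ: R-tuning supervises on the bare question with a refusal target when the model lacks the answer, while ICFT supervises on question-plus-context, with ICFT(n) deliberately mixing in negative contexts. All field names and helpers below are assumptions for illustration.

```python
REFUSAL = "I don't know."


def r_tuning_example(question: str, answer: str, model_knows: bool) -> dict:
    """R-tuning sketch: supervise on the bare question; the target is the
    gold answer if the model already knows it, and a refusal otherwise."""
    return {"input": question, "target": answer if model_knows else REFUSAL}


def icft_example(question: str, answer: str, context: str,
                 context_is_positive: bool, model_knows: bool) -> dict:
    """ICFT sketch: supervise on the question *plus* a retrieved context.
    ICFT(n) also includes negative contexts, where the target falls back on
    internal knowledge or refuses, so the model learns not to be derailed
    by bad passages."""
    target = answer if (context_is_positive or model_knows) else REFUSAL
    return {"input": f"Context: {context}\nQuestion: {question}", "target": target}


# Same unknown question with a negative context: both recipes target a
# refusal, but only ICFT exposes the distracting context during training.
print(r_tuning_example("Who won X?", "Alice", model_knows=False))
print(icft_example("Who won X?", "Alice", "<off-topic passage>",
                   context_is_positive=False, model_knows=False))
```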
Improving Refusal Techniques
Finally, the researchers proposed a simple yet effective refusal method for post-trained models to improve their overall answer quality. This technique involves first detecting the internal and external knowledge state of the LLM, then deciding whether to use context or abstain from answering. By doing so, the model can achieve more calibrated confidence and avoid using harmful negative contexts, leading to better overall performance and reduced over-refusal.
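As a rough illustration of that two-step policy, the sketch below first probes internal knowledge (here via a simple self-consistency check) and context utility, then chooses between answering with context, answering from memory, or abstaining. The model methods (`closed_book_answer`, `context_supports`, `answer_with_context`) are hypothetical stand-ins, not the paper's actual probes.

```python
def knows_internally(model, question: str, n: int = 5, thresh: float = 0.6) -> bool:
    """Probe internal knowledge, e.g. by sampling n closed-book answers and
    checking whether a majority agree (a simple self-consistency signal)."""
    answers = [model.closed_book_answer(question) for _ in range(n)]
    top = max(set(answers), key=answers.count)
    return answers.count(top) / n >= thresh


def answer_or_refuse(model, question: str, context: str) -> str:
    """Decide among: use the context, fall back on memory, or abstain."""
    if model.context_supports(question, context):
        return model.answer_with_context(question, context)  # helpful context
    if knows_internally(model, question):
        return model.closed_book_answer(question)  # ignore harmful context
    return "I don't know."  # genuine knowledge gap: abstain


class MockModel:
    """Trivial stand-in so the sketch runs; a real RALM replaces this."""
    def closed_book_answer(self, q): return "February 20, 2022"
    def context_supports(self, q, ctx): return "Olympic" in ctx
    def answer_with_context(self, q, ctx): return "February 20, 2022"


model = MockModel()
# Irrelevant context, but the model knows the answer: it answers from
# memory instead of over-refusing.
print(answer_or_refuse(model, "When did the 2022 Olympic Winter Games end?",
                       "<irrelevant passage>"))
```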
In conclusion, this research provides a deeper understanding of how external contexts influence the calibration of RALMs. It highlights that exclusively negative contexts can significantly harm calibration and lead to over-refusal. While refusal instruction tuning aims to improve self-awareness, its effectiveness varies, with In-Context Fine-tuning showing promise in mitigating over-refusal. The study emphasizes the importance of balancing proper refusal with effective context utilization to build more reliable and practical RALM systems.