TLDR: A new framework called CARE (Context-Aware Retrieval-Enhanced reasoning) helps large language models (LLMs) better use provided information by teaching them to find and integrate relevant evidence directly within their thinking process. This “native retrieval-augmented reasoning” improves answer accuracy and reduces inconsistencies, especially in complex questions, without needing extensive external tools or labeled data. CARE uses a two-phase training process (supervised fine-tuning and reinforcement learning) combined with curriculum learning to achieve significant performance gains over existing methods, particularly in maintaining context fidelity and handling counterfactual information.
Large language models (LLMs) have become incredibly powerful, but they often struggle with a common issue: staying true to the information they’re given. This problem, known as ‘context hallucination,’ means an LLM may give answers that are inconsistent with the provided context, or even fabricate facts outright, especially on tasks that require precise knowledge.
Existing solutions typically fall into two categories. One involves Retrieval-Augmented Generation (RAG), where models retrieve evidence from external sources. While this can help, it often requires a lot of pre-labeled data and adds complexity with extra modules and databases. The other approach uses external search mechanisms, allowing models to look for information beyond what they already know. However, these methods can overlook the valuable context already provided by the user and introduce delays or inconsistencies.
Introducing CARE: A New Way to Reason
A groundbreaking new framework called CARE (Context-Aware Retrieval-Enhanced reasoning) proposes a fundamentally different solution: native retrieval-augmented reasoning. Instead of treating retrieval and reasoning as separate steps, CARE teaches LLMs to actively find and integrate relevant evidence from the input context directly into their thinking process. This approach leverages the LLM’s inherent understanding of language to perform ‘in-context retrieval’ without needing additional indexing or embedding systems.
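To make the idea concrete, here is a minimal sketch of what an evidence-interleaved reasoning trace might look like, along with a check that quoted spans really come from the input context. The `<retrieval>` tag format and function names are illustrative assumptions, not necessarily the paper’s exact markup:

```python
import re

def extract_evidence(reasoning: str) -> list[str]:
    """Pull quoted evidence spans out of a reasoning trace.
    Assumes evidence is wrapped in <retrieval>...</retrieval> tags
    (an illustrative format, not necessarily CARE's exact one)."""
    return re.findall(r"<retrieval>(.*?)</retrieval>", reasoning, re.DOTALL)

def is_faithful(reasoning: str, context: str) -> bool:
    """True when every retrieved span is a verbatim substring of the
    input context, i.e. the model quoted rather than invented it."""
    return all(span.strip() in context for span in extract_evidence(reasoning))

context = "The bridge opened in 1932 and was renovated in 1998."
trace = ("<think>The question asks when the bridge opened. "
         "<retrieval>The bridge opened in 1932</retrieval> "
         "Therefore the answer is 1932.</think>")
print(is_faithful(trace, context))  # True: the quoted span appears in the context
```

The key point is that retrieval here is just quotation from the prompt itself, which is why no separate index or embedding store is needed.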
CARE operates through a two-phase training process. First, a supervised fine-tuning (SFT) phase familiarizes the model with a specific output format that includes retrieved facts within its reasoning. This phase uses a limited amount of labeled data to establish how the model should integrate evidence. Following this, a reinforcement learning (RL) phase refines the model’s self-retrieval abilities. It uses rewards to encourage consistency with evidence and logical coherence, even when only given question-answer pairs without explicit supporting facts.
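As a rough illustration of the RL phase, the sketch below blends answer correctness with an evidence-consistency term. The exact-match correctness signal, the weighting `alpha`, and the tag format are simplifying assumptions for illustration, not the paper’s actual reward design:

```python
import re

def retrieval_reward(reasoning: str, context: str) -> float:
    """Fraction of <retrieval> spans that appear verbatim in the context
    (same illustrative tag format as the sketch above). Quoting real
    evidence scores high; invented spans score zero."""
    spans = re.findall(r"<retrieval>(.*?)</retrieval>", reasoning, re.DOTALL)
    if not spans:
        return 0.0
    return sum(span.strip() in context for span in spans) / len(spans)

def total_reward(answer: str, gold: str, reasoning: str, context: str,
                 alpha: float = 0.5) -> float:
    """Blend answer correctness (exact match, for simplicity) with
    evidence consistency; alpha is an assumed weighting."""
    correctness = float(answer.strip().lower() == gold.strip().lower())
    return (1.0 - alpha) * correctness + alpha * retrieval_reward(reasoning, context)
```

Because both terms can be computed from a question-answer pair plus the model’s own trace, no human-labeled supporting facts are needed at this stage.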
A clever curriculum learning strategy is also employed, allowing the model to gradually adapt from simpler to more complex reasoning tasks. This means CARE can handle a wide range of question-answering scenarios without needing more labeled data beyond the initial training.
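A minimal sketch of such a curriculum schedule, assuming hop count as the difficulty signal (the paper’s actual difficulty ordering may differ):

```python
from dataclasses import dataclass

@dataclass
class QAExample:
    question: str
    context: str
    answer: str
    hops: int  # assumed difficulty proxy: number of evidence pieces needed

def curriculum_stages(dataset: list[QAExample], stages: int = 3):
    """Yield progressively larger, harder slices of the training data.

    Sorts by the assumed difficulty signal and releases examples in
    stages, so training starts on single-hop questions and only later
    mixes in multi-hop ones."""
    ordered = sorted(dataset, key=lambda ex: ex.hops)
    step = max(1, len(ordered) // stages)
    for stage in range(1, stages + 1):
        # keep the earlier (easier) examples and add harder ones;
        # the final stage always covers the full dataset
        cutoff = len(ordered) if stage == stages else step * stage
        yield ordered[:cutoff]
```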
Key Advantages and Performance
The core contributions of CARE are significant:
- It introduces native retrieval-augmented reasoning, seamlessly combining in-context retrieval with structured reasoning to improve context fidelity and reduce hallucinations.
- It provides a specialized dataset for training models in evidence-integrated reasoning, which has been open-sourced for further research.
- It offers a comprehensive implementation that combines this native retrieval with curriculum learning to manage diverse question-answering situations.
Extensive experiments on various real-world and counterfactual question-answering benchmarks show that CARE consistently outperforms traditional methods. For instance, on the LLaMA-3.1 8B model, CARE achieved a 15.29% average F1 improvement over the original model, with even stronger gains in multi-hop tasks. It also demonstrated superior context fidelity in counterfactual scenarios, where models are presented with information that contradicts their pre-trained knowledge. This suggests that CARE helps LLMs stick to the provided context, even when it goes against what they ‘think’ they know.
While CARE does generate longer outputs due to its detailed reasoning chains, it eliminates the need for external API calls and database retrievals, which are often required by other methods. This makes it more efficient in terms of external dependencies.
Looking Ahead
While CARE marks a significant step forward, the researchers acknowledge some limitations. The native retrieval mechanism is excellent for information already in the context but cannot access external knowledge. For such cases, it might need to be combined with external retrieval systems. Also, while it improves context fidelity, it doesn’t completely eliminate hallucinations, especially with ambiguous or contradictory input. Future work aims to address these challenges and expand CARE’s application to a broader range of language tasks.
This research represents a fundamental advancement in making LLMs more accurate, reliable, and efficient for knowledge-intensive tasks, particularly when relevant information is already present in the input context. For more details, you can read the full research paper here: Improving Context Fidelity via Native Retrieval-Augmented Reasoning.