TLDR: CORE is a novel method that uses reinforcement learning to achieve lossless context compression for Retrieval-Augmented Generation (RAG) in Large Language Models (LLMs). By optimizing the compression process based on end-task performance, CORE significantly reduces input length and computational costs while not only preventing performance degradation but also improving answer accuracy, demonstrating strong generalization across various datasets and LLMs.
Large Language Models (LLMs) have transformed how we interact with AI, demonstrating impressive capabilities in understanding and generating human-like text. However, these powerful models often struggle with staying up-to-date with the latest information and maintaining factual accuracy. This is where Retrieval-Augmented Generation (RAG) comes into play, a technique that enhances LLMs by allowing them to retrieve relevant documents from vast knowledge bases and use this information to inform their responses.
While RAG significantly boosts the performance of LLMs on knowledge-intensive tasks, it introduces a new challenge: the sheer volume of retrieved documents can make the input context excessively long. This leads to higher computational costs and can even make it difficult for the LLM to effectively utilize all the information, sometimes overlooking crucial details buried within the lengthy text.
Previous attempts to address this issue have focused on compressing these retrieved documents into shorter texts before feeding them to the LLM. However, many of these methods often compromise the accuracy of the final output. The main hurdle has been the lack of clear targets for what constitutes an “ideal” compressed summary, forcing many approaches to rely on fixed rules that don’t guarantee the compressed content will truly support the LLM’s task.
Introducing CORE: Lossless Compression with Reinforcement Learning
To overcome these limitations, researchers have developed CORE (COmpression via REinforcement learning), a novel method designed to achieve “lossless” context compression for RAG. Lossless here means that the compression doesn’t degrade the end-task performance of the LLM; in fact, CORE often improves it. This innovative approach leverages reinforcement learning (RL) to optimize the compression process without needing predefined compression labels.
At its heart, CORE uses the LLM’s end-task performance, specifically the accuracy of its answers, as the reward signal that guides training of a dedicated compressor model. Training is implemented with Group Relative Policy Optimization (GRPO), which lets the compressor learn to generate summaries that maximize the accuracy of the answers produced by the LLM. This end-to-end framework keeps the compressor goal-oriented: it learns to produce exactly the summaries that best help the LLM answer correctly.
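The sketch below (not the authors' code) illustrates this idea in plain Python: the compressor samples a small group of candidate summaries for a query, the downstream LLM answers from each, exact-match correctness becomes the reward, and GRPO-style group-relative advantages are computed from the group's reward statistics. `compressor_generate` and `reader_answer` are hypothetical stand-ins for the actual models.

```python
# Minimal sketch of the reward signal described above: the frozen reader LLM's
# exact-match accuracy on each candidate summary is the reward, and GRPO uses
# group-relative advantages, so no separate value network is needed.
from typing import Callable, List
import re
import string


def normalize(text: str) -> str:
    """Standard QA normalization: lowercase, drop punctuation/articles, squeeze spaces."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match_reward(prediction: str, gold_answers: List[str]) -> float:
    """Reward = 1.0 if the reader's answer matches any gold answer, else 0.0."""
    return float(any(normalize(prediction) == normalize(g) for g in gold_answers))


def group_relative_advantages(rewards: List[float]) -> List[float]:
    """GRPO-style advantage: center and scale each reward by the group's statistics."""
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    return [(r - mean) / (var ** 0.5 + 1e-6) for r in rewards]


def score_candidate_summaries(
    question: str,
    documents: List[str],
    gold_answers: List[str],
    compressor_generate: Callable[[str, List[str], int], List[str]],  # hypothetical
    reader_answer: Callable[[str, str], str],                          # hypothetical
    group_size: int = 8,
):
    """Sample a group of summaries, reward each by reader exact match, return advantages."""
    summaries = compressor_generate(question, documents, group_size)
    rewards = [
        exact_match_reward(reader_answer(question, s), gold_answers) for s in summaries
    ]
    return summaries, rewards, group_relative_advantages(rewards)
```

Only the compressor is updated with these advantages; the answering LLM stays frozen, so the reward directly measures how useful each summary is to the model that has to answer from it.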
The CORE framework is designed to be efficient. The compressor model itself is intentionally much smaller than the main LLM, ensuring that the computational benefits of compression are not offset by a large, complex compressor. The training process also includes a “distillation warm-up” phase, where a very large language model acts as a teacher to provide an initial strong policy for the smaller compressor, ensuring stable and effective reinforcement learning.
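As a rough illustration of that warm-up, the following sketch fine-tunes the small compressor with ordinary next-token cross-entropy on summaries written by a large teacher model before RL begins. It assumes a Hugging Face-style tokenizer and causal LM interface; `teacher_summarize`, `compressor`, and `tokenizer` are hypothetical placeholders, not the authors' implementation.

```python
# Distillation warm-up sketch: supervise the small compressor on teacher-written
# summaries so that RL starts from a reasonable policy.
from typing import Callable, Iterable, List, Tuple


def build_warmup_examples(
    samples: Iterable[Tuple[str, List[str]]],
    teacher_summarize: Callable[[str, List[str]], str],  # hypothetical teacher LLM call
    tokenizer,                                            # Hugging Face-style tokenizer (assumed)
    max_len: int = 1024,
):
    """Turn (question, documents) pairs into (prompt + teacher summary) training examples."""
    examples = []
    for question, documents in samples:
        prompt = "Question: " + question + "\nDocuments:\n" + "\n".join(documents) + "\nSummary:"
        target = teacher_summarize(question, documents)  # the teacher's reference summary
        encoded = tokenizer(prompt + " " + target, truncation=True,
                            max_length=max_len, return_tensors="pt")
        examples.append(encoded)
    return examples


def warmup_step(compressor, batch, optimizer) -> float:
    """One supervised step: next-token cross-entropy on the teacher-written summary."""
    outputs = compressor(input_ids=batch["input_ids"],
                         attention_mask=batch["attention_mask"],
                         labels=batch["input_ids"])  # causal-LM loss over the full sequence
    # A real setup would typically mask the prompt tokens in the labels so the loss
    # only covers the summary; this sketch keeps things simple.
    outputs.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return outputs.loss.item()
```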
Impressive Results and Generalization
Extensive experiments on four benchmark datasets (Natural Questions, TriviaQA, HotpotQA, and 2WikiMultihopQA) demonstrate CORE's effectiveness. While compressing the retrieved context to as little as 3% of its original length, CORE avoids any performance degradation relative to using the full, uncompressed documents and improves the average Exact Match (EM) score by 3.3 points across the four datasets. On Natural Questions, for instance, CORE compresses the input to 3.6% of the original tokens while improving Exact Match by 3.2 points compared to prepending ten full documents.
Furthermore, CORE exhibits strong generalization abilities. The framework is not dependent on a specific compressor architecture, meaning different models can be used to train the compressor with similar success. Crucially, a compressor trained with CORE can also be effectively transferred to different large language models (e.g., from Qwen2.5-14B-Instruct to LLaMA-3.1-8B-Instruct) without retraining, consistently outperforming baselines that use full documents. This indicates that the summaries generated by CORE are inherently high-quality and contain the essential information needed for accurate answering.
In essence, CORE represents a significant step forward in making RAG systems more efficient and effective. By intelligently compressing retrieved information, it allows LLMs to leverage vast knowledge bases without being overwhelmed by long contexts, ultimately leading to more accurate and timely responses. For more technical details, you can refer to the full research paper here.


