TLDR: This paper introduces Confident RAG, a method that improves Retrieval-Augmented Generation (RAG) by generating multiple answers using different embedding models and then selecting the most confident response. It achieves significant accuracy improvements (up to 10% over vanilla LLMs) by leveraging confidence metrics like Self-Certainty and Distributional Perplexity, proving more effective than simply mixing retrieved documents.
Large Language Models (LLMs) have transformed many fields, but keeping them supplied with up-to-date or external knowledge remains a challenge. Retrieval-Augmented Generation (RAG) offers a cost-effective solution by fetching relevant information to guide the LLM. A key hurdle in RAG, however, is choosing the embedding model: different models perform differently across domains, producing different similarity scores, different retrieved context, and ultimately different response quality.
Addressing the Embedding Challenge in RAG
To tackle this issue, researchers Shiting Chen, Zijian Zhao, and Jinsong Chen propose and examine two novel approaches to enhance RAG by combining the strengths of multiple embedding models. These methods are called Mixture-Embedding RAG and Confident RAG. Their work, detailed in their paper “Each to Their Own: Exploring the Optimal Embedding in RAG,” explores how to get the most out of diverse embedding models.
Mixture-Embedding RAG: A Simple Combination
The first approach, Mixture-Embedding RAG, attempts to improve retrieval by simply sorting and selecting relevant information from multiple embedding models based on a standardized similarity score. The idea is to gather the best retrievals from various sources. However, experiments showed that this method did not consistently outperform standard, or “vanilla,” RAG. This could be due to factors like information overload or the LLM’s inability to fully leverage the combined, potentially noisy, references, especially for models already fine-tuned for specific tasks like mathematics.
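The merging step described above can be sketched as follows. This is an illustrative assumption about the procedure, not the paper's exact implementation: each model's similarity scores are z-normalized so they are comparable across models, and the best-scoring documents overall are kept.

```python
import numpy as np

def mixture_embedding_retrieve(score_lists, doc_id_lists, k=4):
    """Merge retrievals from several embedding models into one ranked list.

    score_lists: per-model similarity scores for candidate documents
    doc_id_lists: per-model document ids, aligned with the scores
    Scores are z-normalized per model so they are comparable across
    models; the top-k documents overall are returned.
    """
    pooled = {}
    for scores, doc_ids in zip(score_lists, doc_id_lists):
        scores = np.asarray(scores, dtype=float)
        z = (scores - scores.mean()) / (scores.std() + 1e-8)  # standardize per model
        for doc_id, s in zip(doc_ids, z):
            # keep a document's best standardized score across models
            pooled[doc_id] = max(pooled.get(doc_id, float("-inf")), float(s))
    return sorted(pooled, key=pooled.get, reverse=True)[:k]
```

For example, with two models whose raw scores favor documents `a` and `b` respectively, `mixture_embedding_retrieve([[0.9, 0.5, 0.1], [0.8, 0.7, 0.2]], [["a", "b", "c"], ["b", "d", "e"]], k=2)` returns `["a", "b"]`. As the paper's results suggest, a merged list like this can still overload the LLM with noisy references.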
Confident RAG: Selecting the Best Response
In contrast, Confident RAG takes a different route. Instead of combining retrievals upfront, it generates responses multiple times, each time using a different embedding model with vanilla RAG. After generating these multiple answers, it then selects the response with the highest confidence level. This method proved to be significantly more effective, demonstrating average improvements of approximately 10% over vanilla LLMs and 5% over vanilla RAG. This consistent improvement across different LLMs and embedding models highlights Confident RAG as an efficient and adaptable solution.
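The selection loop can be sketched as below; `retrieve`, `generate`, and `confidence` are hypothetical callables standing in for whatever retrieval stack, LLM, and confidence metric are used:

```python
def confident_rag(query, embedding_models, retrieve, generate, confidence):
    """Generate one answer per embedding model via vanilla RAG, then
    return the answer with the highest confidence score.

    retrieve(query, model)     -> list of retrieved documents
    generate(query, docs)      -> (answer_text, token_logprobs)
    confidence(token_logprobs) -> scalar score (higher = more confident)
    """
    best_answer, best_conf = None, float("-inf")
    for model in embedding_models:
        docs = retrieve(query, model)           # vanilla RAG with this embedder
        answer, token_logprobs = generate(query, docs)
        score = confidence(token_logprobs)      # e.g. self-certainty
        if score > best_conf:
            best_answer, best_conf = answer, score
    return best_answer, best_conf
```

Note that the N generations are independent, so they can run in parallel; the only sequential step is the final comparison of confidence scores.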
Key Confidence Metrics and Optimal Setup
The study identified two confidence metrics as particularly effective: Self-Certainty and Distributional Perplexity (DP). Both showed average improvements of around 10% compared to vanilla LLMs. These metrics are superior because they directly measure how concentrated and certain the LLM’s probability distribution is for its generated tokens, effectively filtering out less reliable answers. The research also explored the optimal number of embedding models to use in Confident RAG. While using more models might seem better, the study found that increasing the number beyond three (N=3) yielded only marginal benefits, suggesting that N=3 strikes a good balance between performance and computational cost.
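The two metrics can be illustrated with common distribution-based formulations (these are standard renderings of the ideas, not necessarily the paper's exact definitions): self-certainty measured as the KL divergence of each step's token distribution from the uniform distribution (higher means more peaked), and distributional perplexity as the exponential of the average distribution entropy (lower means more confident).

```python
import numpy as np

def self_certainty(dists):
    """Mean KL divergence from the uniform distribution to each step's
    token distribution; higher = the model is more concentrated."""
    dists = [np.asarray(p, dtype=float) for p in dists]
    u = 1.0 / len(dists[0])  # uniform probability over the vocabulary
    return float(np.mean([np.sum(u * np.log(u / p)) for p in dists]))

def distributional_perplexity(dists):
    """Exponential of the average entropy of the per-step token
    distributions; lower = more peaked, hence more confident."""
    dists = [np.asarray(p, dtype=float) for p in dists]
    entropies = [-np.sum(p * np.log(p)) for p in dists]
    return float(np.exp(np.mean(entropies)))
```

On a uniform 4-token distribution, self-certainty is 0 and distributional perplexity is 4 (maximally uncertain); a sharply peaked distribution scores higher on self-certainty and lower on perplexity, which is what lets these metrics filter out less reliable answers.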
A Plug-and-Play Solution for Enhanced LLM Performance
The consistent and stable results from Confident RAG suggest it can serve as a "plug-and-play" method for enhancing LLM performance across domains. By leveraging the strengths of multiple embedding models and selecting answers by the confidence of the generated responses, Confident RAG offers a practical way to improve the accuracy and reliability of Retrieval-Augmented Generation systems. Further details are available in the paper "Each to Their Own: Exploring the Optimal Embedding in RAG."


