TL;DR: DIVER is a multi-stage retrieval pipeline designed for complex, reasoning-intensive information retrieval tasks. It improves search accuracy by cleaning and rechunking documents, iteratively expanding queries with an LLM informed by retrieved results, retrieving with a reasoning-enhanced embedding model fine-tuned on synthetic multi-domain data with hard negatives, and reranking results by interpolating LLM-assigned helpfulness scores with retrieval scores. DIVER achieved state-of-the-art performance on the BRIGHT benchmark, demonstrating its effectiveness at handling abstract relationships and multi-step inference in queries.
In the rapidly evolving world of artificial intelligence, retrieval-augmented generation (RAG) has emerged as a powerful technique for knowledge-intensive tasks, allowing AI systems to pull information from vast datasets to answer queries. However, a significant challenge remains: how do these systems handle queries that require deep reasoning, analogical thinking, or multi-step inference, rather than just direct keyword or semantic matches?
A new research paper introduces DIVER, a sophisticated multi-stage retrieval pipeline specifically designed to tackle these reasoning-intensive information retrieval challenges. Developed by researchers from Sun Yat-sen University and Ant Group, DIVER aims to bridge the gap between simple information retrieval and complex reasoning tasks.
The Core Components of DIVER
DIVER is not a single tool but a comprehensive pipeline, integrating four key components that work in synergy to enhance retrieval performance:
1. Document Processing (DIVER-DChunk): Real-world documents often come with quality issues like excessive blank lines, truncated sentences, or overly long sections. DIVER first cleans these documents and then intelligently rechunks them into smaller, semantically coherent segments. This preprocessing step ensures that the input quality for subsequent stages is optimal, preventing information loss and improving readability for the AI.
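The paper's exact cleaning and rechunking rules are not spelled out here, but the idea can be sketched as two small steps: collapse noisy whitespace, then greedily pack whole sentences into bounded-size chunks so no sentence is truncated mid-way. The `max_chars` limit and sentence-splitting regex below are illustrative assumptions, not DIVER's actual configuration.

```python
import re

def clean_document(text: str) -> str:
    """Collapse runs of blank lines to one and strip trailing whitespace."""
    lines = [ln.rstrip() for ln in text.splitlines()]
    cleaned, prev_blank = [], False
    for ln in lines:
        blank = not ln
        if not (blank and prev_blank):  # keep at most one blank line in a row
            cleaned.append(ln)
        prev_blank = blank
    return "\n".join(cleaned).strip()

def rechunk(text: str, max_chars: int = 200) -> list[str]:
    """Greedily pack sentences into chunks of at most max_chars characters,
    keeping each sentence intact (assumed chunking policy)."""
    sentences = re.split(r"(?<=[.!?])\s+", text)
    chunks, current = [], ""
    for sent in sentences:
        if current and len(current) + len(sent) + 1 > max_chars:
            chunks.append(current)
            current = sent
        else:
            current = f"{current} {sent}".strip()
    if current:
        chunks.append(current)
    return chunks
```

Because chunk boundaries fall only between sentences, downstream stages see semantically coherent segments rather than fragments.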
2. LLM-driven Query Expansion (DIVER-QExpand): User queries, especially those requiring reasoning, can be ambiguous or too concise. DIVER addresses this by using a large language model (LLM) to iteratively expand and refine the original query. Through multiple rounds of interaction with initially retrieved documents, the query is dynamically updated, allowing for more diverse and context-aware interpretations. This feedback loop helps the system better understand the user’s true intent.
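The feedback loop described above can be sketched as a short retrieve-then-rewrite cycle. The `retrieve` and `llm_refine` callables below are hypothetical stand-ins for a real retriever and a real LLM call; DIVER's actual prompts and round count are not specified in this summary.

```python
def expand_query(query, retrieve, llm_refine, rounds=2):
    """Iteratively expand a query: retrieve with the current query,
    then ask an LLM to rewrite the query given the retrieved docs."""
    for _ in range(rounds):
        docs = retrieve(query)           # feedback from current results
        query = llm_refine(query, docs)  # context-aware reformulation
    return query

# Toy stand-ins for a real retriever and LLM (illustrative only):
def toy_retrieve(q):
    return [f"doc about {q}"]

def toy_refine(q, docs):
    return q + " (refined with: " + "; ".join(docs) + ")"
```

Each round conditions the rewrite on what the previous query actually retrieved, which is what lets the system converge toward the user's true intent.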
3. Reasoning-enhanced Retriever (DIVER-Retriever): Traditional retrievers often struggle with complex reasoning tasks because they are typically trained on simpler, fact-based queries. DIVER overcomes this by fine-tuning a powerful embedding model (Qwen3-Embedding-4B) on a specially constructed dataset. This dataset includes synthetic multi-domain data (medical, coding, mathematical) and, crucially, ‘hard negative’ documents—documents that appear superficially relevant but lack actual relevance. Training with these hard negatives forces the retriever to learn to distinguish between surface-level similarity and true semantic relevance, making it adept at complex reasoning. The relevance scores from this specialized retriever are then combined with traditional BM25 scores to capture both deep reasoning and surface-level similarities.
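The summary says dense retriever scores are combined with BM25 scores but not exactly how; a common fusion, sketched below under the assumption of min-max normalization and a mixing weight `alpha`, is a weighted sum of the two normalized score lists:

```python
def minmax(scores):
    """Scale a list of scores to [0, 1] (constant lists map to 0)."""
    lo, hi = min(scores), max(scores)
    return [(s - lo) / (hi - lo) if hi > lo else 0.0 for s in scores]

def fuse_scores(dense, bm25, alpha=0.5):
    """Weighted sum of normalized dense-retriever and BM25 scores.
    alpha is an assumed hyperparameter, not DIVER's published value."""
    d, b = minmax(dense), minmax(bm25)
    return [alpha * x + (1 - alpha) * y for x, y in zip(d, b)]
```

Normalizing first matters because embedding similarities and BM25 scores live on very different scales; without it, one signal would dominate regardless of `alpha`.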
4. Pointwise Reranker (DIVER-Rerank): After the initial retrieval, DIVER employs a reranking stage to further refine the results. An off-the-shelf LLM assigns a helpfulness score (from 0 to 10) to each retrieved document based on its relevance to the query. To break ties and provide more granular rankings, these LLM-assigned scores are interpolated with the initial retrieval scores, leading to a more precise final ranking of documents.
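The interpolation step can be sketched as follows. The LLM's 0-10 helpfulness score is scaled to [0, 1] and blended with the (assumed already-normalized) retrieval score; the weight `beta` and the `llm_score` callable are illustrative assumptions.

```python
def rerank(docs, retrieval_scores, llm_score, beta=0.8):
    """Blend an LLM helpfulness score (0-10, scaled to 0-1) with the
    retrieval score; the retrieval score mainly breaks ties between
    documents the LLM rates equally."""
    final = [beta * (llm_score(d) / 10.0) + (1 - beta) * r
             for d, r in zip(docs, retrieval_scores)]
    order = sorted(range(len(docs)), key=lambda i: final[i], reverse=True)
    return [docs[i] for i in order]
```

With a high `beta`, two documents the LLM scores identically (a frequent occurrence with coarse 0-10 ratings) are ordered by their retrieval scores, which is exactly the tie-breaking behavior described above.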
Performance and Impact
The effectiveness of DIVER was rigorously tested on the BRIGHT benchmark, a challenging dataset of 1,384 real-world queries from diverse domains like economics, psychology, mathematics, and programming, all requiring complex reasoning. DIVER achieved a state-of-the-art nDCG@10 score of 41.6, outperforming previous leading models like XRR2 and other reasoning-aware baselines such as ReasonIR and RaDeR. This demonstrates DIVER’s superior ability to handle complex, real-world information retrieval tasks.
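For readers unfamiliar with the metric: nDCG@10 measures how well the top 10 results are ordered, discounting relevant documents that appear lower in the ranking and normalizing by the best possible ordering. A minimal implementation of the standard formula:

```python
import math

def dcg(relevances):
    """Discounted cumulative gain: relevance discounted by log2 of rank."""
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances))

def ndcg_at_k(ranked_rels, k=10):
    """nDCG@k: DCG of the top-k results over the DCG of an ideal ordering."""
    ideal = sorted(ranked_rels, reverse=True)
    denom = dcg(ideal[:k])
    return dcg(ranked_rels[:k]) / denom if denom else 0.0
```

A perfect ranking scores 1.0, so BRIGHT's state of the art at 41.6 (i.e. 0.416) underscores how hard reasoning-intensive retrieval remains.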
The research highlights that DIVER achieves this high performance with significantly lower computational costs compared to some commercial models, making it a highly efficient solution. The ablation studies in the paper further confirm the individual contributions of each component, showing how document cleaning, query expansion, and the specialized retriever each play a vital role in the overall success.
The DIVER pipeline represents a significant step forward in making AI systems more capable of understanding and responding to complex, reasoning-intensive queries. By focusing on iterative query refinement and training a retriever on high-quality, challenging data, DIVER sets a new standard for information retrieval in scenarios where relevance goes beyond simple keyword matching. For more technical details, refer to the full research paper.


