TLDR: DecoupleSearch is a novel Agentic RAG framework that improves Large Language Model (LLM) performance by decoupling planning and search processes using dual value models. It employs Monte Carlo Tree Search (MCTS) for evaluating reasoning steps and Hierarchical Beam Search for efficient exploration and pruning of candidate plans and search results. This approach leads to more accurate and reliable answers, especially for complex, multi-step reasoning tasks, and allows smaller LLMs to achieve competitive performance.
Large Language Models (LLMs) have shown incredible capabilities across many tasks, but they sometimes struggle to stay factual, producing confident yet incorrect statements known as ‘hallucinations’. To combat this, Retrieval-Augmented Generation (RAG) systems integrate external knowledge, allowing LLMs to pull in verifiable information and improve accuracy.
A more advanced approach, Agentic RAG, introduces autonomous AI agents into this process. These agents plan their reasoning steps and then search for relevant information iteratively until they reach a final answer. However, Agentic RAG faces its own hurdles: the quality of each step depends on both sound planning and accurate searching; intermediate reasoning steps rarely receive direct feedback; and the sheer number of possible plans and searches creates an overwhelmingly large space to explore.
Introducing DecoupleSearch
To tackle these challenges, researchers have proposed a new framework called DecoupleSearch. This innovative approach separates the planning and search processes by using two distinct ‘value models’. This separation allows for independent optimization of how the agent thinks (planning) and how it finds information (searching).
DecoupleSearch builds a ‘reasoning tree’ in which each node represents a planning or search action. To evaluate the quality of each step, it uses a technique called Monte Carlo Tree Search (MCTS). At inference time, a method called Hierarchical Beam Search iteratively refines candidate plans and search results, guided by the dual value models, to find the best path to an answer.
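To make the structure concrete, here is a minimal sketch of such a reasoning tree, where plan and search nodes alternate along each path. The class and field names are illustrative assumptions, not the paper's actual code:

```python
# A minimal sketch of the alternating plan/search reasoning tree; the
# class and field names are illustrative, not the paper's actual code.
from dataclasses import dataclass, field


@dataclass
class Node:
    kind: str                  # "plan" or "search"
    content: str               # a plan text, or a query plus retrieved documents
    value: float = 0.0         # value estimate refined during MCTS annotation
    visits: int = 0            # visit count used by the MCTS selection rule
    children: list["Node"] = field(default_factory=list)


# Plan and search nodes alternate along each path from the root:
root = Node(kind="plan", content="Decompose the question into sub-questions")
root.children.append(Node(kind="search", content="sub-question 1 -> retrieved docs"))
```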
How DecoupleSearch Works
The core idea is to enhance the probability of success at each reasoning step. DecoupleSearch introduces phases for ‘planning exploration’ and ‘search exploration’. The agent first generates several possible plans, which are then evaluated by a ‘planning value model’ to pick the most promising ones. Based on these selected plans, the agent generates multiple search queries to retrieve documents. These search results are then ranked by a ‘search value model’ to ensure the retrieved information is reliable.
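The sketch below shows what a single decoupled step might look like. Every callable here (`generate_plans`, `plan_value`, `generate_queries`, `retrieve`, `search_value`) is a hypothetical stand-in for the policy LLM, the two value models, and the retriever; the paper's actual interfaces may differ:

```python
# A hedged sketch of one decoupled reasoning step; all callables are
# hypothetical stand-ins, not the paper's actual interfaces.
def expand_step(state, generate_plans, plan_value, generate_queries,
                retrieve, search_value, k_plan=3, k_search=5, keep=2):
    # Planning exploration: sample candidate plans, score them with the
    # planning value model, and keep only the most promising ones.
    plans = sorted(generate_plans(state, n=k_plan),
                   key=plan_value, reverse=True)[:keep]

    # Search exploration: for each surviving plan, issue queries, retrieve
    # documents, and score the results with the search value model.
    continuations = []
    for plan in plans:
        for query in generate_queries(state, plan, n=k_search):
            docs = retrieve(query)
            new_state = state + [(plan, query, docs)]
            continuations.append((new_state, search_value(docs)))
    return continuations  # the caller keeps the best-scored continuations
```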
MCTS plays a crucial role in efficiently assessing the quality of each reasoning step. During simulations, the LLM itself acts as a judge, evaluating the quality of both planning and search results separately. Rewards from the correctness of the final answer are then used to refine the LLM’s internal scores, correcting any potential inaccuracies. To manage the vast number of possibilities, the planning and search spaces are pruned using these value models, which are trained on signals derived from the MCTS annotation process.
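Conceptually, the value target for each node can be thought of as a blend of the LLM's own judgment and the Monte Carlo estimate backed up from rollouts. The weighting below is an illustrative assumption, not the paper's exact formula:

```python
# Illustrative calibration of the LLM-judge score with rollout rewards.
# The paper combines the two during MCTS annotation; this particular
# weighting (alpha) is an assumption, not its exact formula.
def value_target(llm_judge_score, rollout_rewards, alpha=0.5):
    """Blend the LLM's self-assessed step quality (in [0, 1]) with the
    average final-answer correctness of rollouts through this node."""
    mc_estimate = sum(rollout_rewards) / max(len(rollout_rewards), 1)
    return alpha * llm_judge_score + (1 - alpha) * mc_estimate
```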
During inference, Hierarchical Beam Search ensures thorough exploration of the joint planning and search space. At each step, multiple plans are generated and filtered by the planning value model. Then, based on the best plans, search queries are created, and the retrieved documents are evaluated by the search value model so that only the most valuable ones are kept. This iterative process continues until a final answer is reached.
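Putting it together, here is a hedged sketch of that inference loop, reusing the `expand_step` idea above. The helper names (`is_final`, `extract_answer`) and the beam width are assumptions:

```python
# Sketch of the Hierarchical Beam Search inference loop; helper names
# and beam width are illustrative assumptions.
def hierarchical_beam_search(question, step_fn, is_final, extract_answer,
                             beam_width=2, max_steps=8):
    beam = [[question]]  # each entry is a partial trajectory (list of steps)
    for _ in range(max_steps):
        candidates = []  # (extended_trajectory, search-value score) pairs
        for state in beam:
            if is_final(state):
                return extract_answer(state)
            candidates.extend(step_fn(state))  # e.g., expand_step(state, ...)
        # Keep the highest-scoring trajectories for the next iteration.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beam = [state for state, _ in candidates[:beam_width]]
    return extract_answer(beam[0])
```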
Also Read:
- FAIR-RAG: Enhancing LLM Accuracy with Evidence-Driven Iterative Refinement
- RaCoT: Enhancing LLM Reasoning Reliability with Pre-Retrieval Contrastive Thinking
Key Advantages and Findings
The DecoupleSearch framework offers several significant contributions. It decouples planning and search with dual value models, allowing for independent optimization. It also improves the success rate of each step by fully exploring planning and search spaces, using MCTS for accurate assessment and Hierarchical Beam Search for efficient pruning.
Extensive experiments across various question-answering datasets and different LLM sizes (such as Qwen2.5-7B-Instruct and Qwen2.5-14B-Instruct) demonstrate the method's effectiveness. DecoupleSearch consistently outperformed existing baselines by a notable average margin. Interestingly, the 7B model with Hierarchical Beam Search became comparable to the larger 14B model, suggesting that inference-time scaling techniques can help smaller models achieve competitive results.
An ablation study revealed that both planning and search exploration are vital, with planning exploration having a more significant impact. This is because a good plan sets the stage for effective searching. The study also explored the impact of ‘planning expansion size’ and ‘search expansion size’ hyperparameters. While a planning expansion size of around 3 seemed optimal, larger search expansion sizes generally led to better performance, as evaluating search results is often more straightforward than evaluating abstract plans.
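In configuration terms, the ablation suggests something like the following shape. The exact values below are assumptions for illustration, not the paper's reported settings:

```python
# Illustrative hyperparameter shape suggested by the ablation: a modest
# planning expansion size (around 3) and a larger search expansion size.
config = {
    "planning_expansion_size": 3,  # candidate plans sampled per step
    "search_expansion_size": 8,    # queries/documents explored per plan
    "beam_width": 2,               # trajectories kept by Hierarchical Beam Search
}
```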
The effectiveness of the value models was also confirmed; using them to rank plans and search results consistently outperformed random selection. This highlights their ability to accurately gauge the quality of intermediate steps.
For a deeper dive into the technical details, you can read the full research paper here.