PRISM: A New Agentic Approach to Smarter Information Retrieval for Complex Questions

TLDR: PRISM is an agentic retrieval framework that uses Large Language Models (LLMs) to improve multi-hop question answering. It features three specialized agents: a Question Analyzer to break down complex questions, a Selector to filter for precision, and an Adder to ensure recall by recovering missing evidence. This iterative process creates compact and comprehensive evidence sets, leading to significantly higher retrieval accuracy and improved end-to-end QA performance across various benchmarks, outperforming existing methods and mitigating LLM limitations like ‘lost-in-the-middle’ and hallucination.

In the rapidly evolving field of artificial intelligence, answering complex questions accurately remains a significant challenge, especially when information needs to be gathered from multiple sources. This is known as multi-hop question answering (QA). A new research paper introduces an innovative solution called PRISM, which stands for Precision–Recall Iterative Selection Mechanism. This framework aims to enhance how large language models (LLMs) retrieve information, making the process more precise and comprehensive.

The core idea behind PRISM is to use a system of specialized AI agents that work together in a structured loop. This agentic retrieval system is designed to overcome common limitations of LLMs, such as the ‘lost-in-the-middle’ phenomenon, where crucial information in long texts is overlooked, and the tendency to ‘hallucinate’ or generate incorrect information when context is incomplete or noisy.

How PRISM Works: The Three Agents

PRISM employs three distinct LLM-based agents, each with a specific role:

1. Question Analyzer Agent: This agent runs first. It takes a complex multi-hop question and breaks it down into smaller, more manageable sub-questions. For example, if asked, “Which painter who shared a house with Vincent van Gogh was married to a Danish ceramist?”, the Analyzer would decompose it into sub-questions like “Who shared a house with van Gogh?” and “Who was that person married to?”. This decomposition focuses the search on one hop at a time and makes it less likely that a critical piece of information is missed.

2. Selector Agent: After initial retrieval, many passages might seem relevant but are actually distractors. The Selector agent acts as a precision-focused filter. Its job is to meticulously review the candidate evidence and remove any passages that are definitely irrelevant to the sub-questions. This ensures that the downstream QA model receives a clean, compact set of highly relevant information, reducing noise and mitigating the risk of hallucinations.

3. Adder Agent: While the Selector focuses on precision, overly strict filtering can sometimes lead to missing crucial, complementary facts. The Adder agent is designed to address this by prioritizing recall. It re-examines the evidence that the Selector left behind and adds any missing pieces that are essential for completing the reasoning chain. This could include bridging facts that connect entities across different documents or filling logical gaps. The Selector and Adder agents work in an iterative loop, refining the evidence set until it is both compact and complete.

This iterative refinement loop, where the Selector prunes for precision and the Adder expands for recall, is a key innovation. It ensures that the final set of supporting passages is not only highly relevant but also comprehensive enough to answer multi-hop questions accurately.
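The loop described above can be sketched in a few lines of Python. This is a minimal illustration rather than the paper's implementation: the function names, the `max_rounds` cap, and the stopping rule (stop when the evidence set no longer changes) are assumptions, and the two agents are replaced by simple keyword and entity heuristics so the example runs without an LLM.

```python
from typing import Callable, List, Set

# Hypothetical agent signatures; in PRISM each role is played by an LLM.
Selector = Callable[[List[str], Set[str]], Set[str]]
Adder = Callable[[List[str], Set[str], Set[str]], Set[str]]


def tokens(text: str) -> List[str]:
    """Split on whitespace and strip surrounding punctuation."""
    return [w.strip("?.,") for w in text.split()]


def keyword_selector(sub_questions: List[str], passages: Set[str]) -> Set[str]:
    """Stand-in Selector: keep passages sharing a longer word with a sub-question."""
    keywords = {t.lower() for q in sub_questions for t in tokens(q) if len(t) >= 5}
    return {p for p in passages if keywords & {t.lower() for t in tokens(p)}}


def entity_adder(sub_questions: List[str], kept: Set[str], left_behind: Set[str]) -> Set[str]:
    """Stand-in Adder: recover passages sharing a capitalized token with kept ones."""
    def caps(passage: str) -> Set[str]:
        return {t for t in tokens(passage) if t[:1].isupper()}

    kept_entities = set().union(*(caps(p) for p in kept))
    return {p for p in left_behind if caps(p) & kept_entities}


def refine_evidence(sub_questions: List[str], candidates: Set[str],
                    selector: Selector, adder: Adder, max_rounds: int = 3) -> Set[str]:
    """Alternate pruning (precision) and re-adding (recall) until the set is stable."""
    evidence = set(candidates)
    for _ in range(max_rounds):
        kept = selector(sub_questions, evidence)            # prune distractors
        left_behind = set(candidates) - kept                # what the Selector dropped
        updated = kept | adder(sub_questions, kept, left_behind)  # recover bridges
        if updated == evidence:                             # assumed stopping rule
            break
        evidence = updated
    return evidence


# Toy passages with placeholder entities (not real facts).
PASSAGE_A = "Painter A shared a house with van Gogh."
PASSAGE_B = "Painter A had a spouse named B, a ceramist."
PASSAGE_C = "The city hosts a yearly film festival."

sub_questions = ["Who shared a house with van Gogh?", "Who was that person married to?"]
evidence = refine_evidence(sub_questions, {PASSAGE_A, PASSAGE_B, PASSAGE_C},
                           keyword_selector, entity_adder)
print(sorted(evidence))
```

In this toy run the stand-in Selector keeps only the first passage, the stand-in Adder recovers the second because it mentions the same entity, and the unrelated third passage stays out. That mirrors the bridging-fact case described for the Adder.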

Answering the Question: The Answer Generator

Once the Question Analyzer, Selector, and Adder agents have collaboratively constructed a compact and comprehensive set of supporting evidence, this refined context is passed to an Answer Generator agent. This agent, also an LLM, then uses the provided evidence to generate the final answer to the original complex question. The researchers implemented this in a zero-shot setting, meaning the LLM was not specifically fine-tuned for the task, allowing for a direct assessment of how improved retrieval quality impacts the final answer accuracy.
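As a rough illustration of the zero-shot setup, the refined evidence can simply be placed in a prompt ahead of the original question. The helper name `build_answer_prompt` and the prompt wording below are assumptions made for this sketch; the article does not give the exact prompt the researchers used.

```python
from typing import Iterable


def build_answer_prompt(question: str, evidence: Iterable[str]) -> str:
    """Assemble a zero-shot prompt: numbered evidence passages, then the question."""
    numbered = [f"[{i}] {passage}" for i, passage in enumerate(evidence, start=1)]
    return (
        "Answer the question using only the evidence below.\n\n"
        "Evidence:\n" + "\n".join(numbered) + "\n\n"
        f"Question: {question}\nAnswer:"
    )


# Toy evidence with placeholder entities (not real facts).
prompt = build_answer_prompt(
    "Which painter who shared a house with Vincent van Gogh was married to a Danish ceramist?",
    ["Painter A shared a house with van Gogh.",
     "Painter A had a spouse named B, a ceramist."],
)
print(prompt)
```

The resulting string would then be sent to whichever LLM plays the Answer Generator role, with no task-specific fine-tuning or in-context examples.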

Performance and Impact

Experiments conducted on several multi-hop QA benchmarks, including HotpotQA, 2WikiMultiHopQA, MuSiQue, and MultiHopRAG, demonstrated that PRISM consistently outperforms strong baseline methods. For instance, on HotpotQA, PRISM achieved a recall of 90.9% compared to 61.5% for a single-pass retriever and 72.8% for IRCoT, another advanced retrieval method. Similar significant gains were observed across other datasets, particularly on the challenging MuSiQue benchmark.
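For readers unfamiliar with retrieval metrics, the sketch below shows the standard set-based definition of precision and recall for a single question. It is assumed here that the reported recall follows this usual definition, averaged over questions; the passage IDs are made up for illustration.

```python
from typing import Iterable, Tuple


def retrieval_precision_recall(retrieved: Iterable[str], gold: Iterable[str]) -> Tuple[float, float]:
    """Set-based precision and recall of retrieved passages against gold evidence."""
    retrieved_set, gold_set = set(retrieved), set(gold)
    hits = len(retrieved_set & gold_set)
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    recall = hits / len(gold_set) if gold_set else 0.0
    return precision, recall


# Hypothetical passage IDs: four retrieved, three gold, two in common.
precision, recall = retrieval_precision_recall(
    retrieved=["p1", "p2", "p5", "p7"],
    gold=["p1", "p2", "p3"],
)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Here two of the four retrieved passages are gold (precision 0.50) and two of the three gold passages were found (recall 0.67). A Selector-style pruning step aims to raise the first number, and an Adder-style recovery step aims to raise the second.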

The improved retrieval quality directly translated into stronger end-to-end question answering performance. PRISM achieved state-of-the-art accuracy on HotpotQA, MuSiQue, and MultiHopRAG, and remained highly competitive on 2WikiMultiHopQA. This highlights that providing LLMs with compact, comprehensive, and noise-free evidence is crucial for their reasoning capabilities.

The framework also showed robustness across different LLMs, including GPT-4o, Gemini-2.5-Flash-Lite, and DeepSeek. While absolute scores varied, the precision-recall balancing mechanism consistently delivered high recall and competitive QA accuracy, indicating that PRISM’s design is not tied to a specific LLM architecture.

In conclusion, PRISM represents a significant step forward in multi-hop question answering. By treating retrieval as an active, agent-driven process that collaborates with the QA model, it provides a principled way to build more reliable and reasoning-centric retrieval systems. For more technical details, you can refer to the original research paper.
