TLDR: This research introduces a novel framework for financial question-answering that combines Agentic AI with Multi-HyDE, a system generating multiple, distinct queries for enhanced knowledge retrieval. Optimized for token efficiency and multi-step financial reasoning, the approach improves accuracy by 11.2% and reduces hallucinations by 15% on financial QA benchmarks. It integrates dense and sparse retrieval strategies with a dynamic agent pipeline capable of query clarification, iterative refinement, and tool calling, offering a modular and reliable solution for high-stakes financial applications.
In the complex and high-stakes world of finance, getting accurate and reliable information is crucial. Financial decisions, whether for investment, regulatory compliance, or market analysis, depend on precise data from constantly updated sources like regulatory filings, market reports, and multi-year financial statements. Traditional systems often struggle with the sheer volume and intricate nature of this data, leading to inaccuracies or ‘hallucinations’ from AI models.
A new research paper, titled “Enhancing Financial RAG with Agentic AI and Multi-HyDE: A Novel Approach to Knowledge Retrieval and Hallucination Reduction,” introduces an innovative framework designed to significantly improve how AI systems retrieve and generate answers for financial questions. Authored by Akshay Govind Srinivasan, Ryan Jacob George, Jayden Koshy Joe, Hrushikesh Kant, Harshith M R, Sachin Sundar, Sudharshan Suresh, Rahul Vimalkanth, and Vijayavallabh from the Indian Institute of Technology Madras, this work addresses the critical need for precision and reliability in financial AI applications.
The Challenge of Financial Information Retrieval
Large Language Models (LLMs) have made remarkable progress in understanding and generating human-like text. However, a major hurdle for their use in finance is their tendency to hallucinate—producing factually incorrect information. In a domain where even small errors can have significant consequences, this is unacceptable. Retrieval-Augmented Generation (RAG) frameworks aim to solve this by grounding LLM outputs in external knowledge. Yet, conventional RAG systems, which often rely on a single database and retriever, fall short when faced with the semantic complexities and vastness of financial documents.
Financial reports often contain semantically similar passages that differ only in crucial numerical or temporal details. A standard RAG system might confuse these, leading to incorrect answers. Furthermore, complex financial queries often require multi-step reasoning and the ability to access various types of data, which a static ‘retrieve-then-generate’ approach cannot handle effectively.
Introducing Multi-HyDE and Agentic AI
The researchers propose a two-pronged solution: Multi-HyDE and an Agentic AI pipeline. Multi-HyDE is a retrieval mechanism that, rather than generating a single hypothetical answer or a set of near-identical query paraphrases, generates multiple distinct yet contextually related queries. For example, if a user asks about a company, Multi-HyDE might generate separate queries about its fraud investigations and its criminal cases, even if both could be answered from the same document. Retrieving from several perspectives in this way significantly boosts the effectiveness and coverage of retrieval over large financial datasets.
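To make the multi-query idea concrete, here is a minimal sketch, not the authors' implementation. It assumes a hypothetical `generate_queries` stub in place of the LLM call and a toy bag-of-words `embed` function in place of a real dense encoder. Each generated query retrieves its own top results, and the results are merged without duplicates.

```python
import math
from collections import Counter

def embed(text):
    """Toy bag-of-words vector; an assumed stand-in for a dense encoder."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def generate_queries(question):
    """Hypothetical stub: a real system would ask an LLM for distinct queries."""
    return [
        f"{question} fraud investigations",
        f"{question} criminal cases",
    ]

def multi_query_retrieve(question, corpus, k=1):
    """Retrieve top-k passages per generated query and merge without duplicates."""
    doc_vecs = [embed(doc) for doc in corpus]
    seen, merged = set(), []
    for query in generate_queries(question):
        q_vec = embed(query)
        ranked = sorted(range(len(corpus)),
                        key=lambda i: cosine(q_vec, doc_vecs[i]),
                        reverse=True)
        for i in ranked[:k]:
            if i not in seen:
                seen.add(i)
                merged.append(corpus[i])
    return merged

corpus = [
    "acme fraud investigations opened in 2021",
    "acme criminal cases filed in court",
    "acme quarterly revenue grew",
]
results = multi_query_retrieve("acme", corpus)
```

In this toy corpus, each generated query surfaces a different passage, so the merged result covers both topics, which a single query might not.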
This system also integrates keyword-based retrieval, like BM25, which is particularly effective for structured data such as tables and for distinguishing between semantically similar documents from different years. This hybrid strategy ensures that both the semantic meaning and exact keywords are considered, leading to more precise results.
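The sketch below illustrates why adding a keyword score can separate passages that differ only in the year. The article does not say how the dense and sparse results are combined, so the weighted sum of normalized scores used here is an assumption, as are the `hybrid_rank` name, the `alpha` weight, and the hard-coded dense scores (which stand in for the output of a real encoder). The BM25 function is a compact implementation of the standard formula.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document against the query with the standard BM25 formula."""
    tokenized = [doc.lower().split() for doc in docs]
    n = len(tokenized)
    avgdl = sum(len(toks) for toks in tokenized) / n
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log((n - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            denom = tf[term] + k1 * (1 - b + b * len(toks) / avgdl)
            score += idf * tf[term] * (k1 + 1) / denom
        scores.append(score)
    return scores

def hybrid_rank(query, docs, dense_scores, alpha=0.5):
    """Assumed fusion rule: alpha * dense + (1 - alpha) * max-normalized BM25."""
    sparse = bm25_scores(query, docs)
    top = max(sparse)
    sparse_norm = [s / top if top else 0.0 for s in sparse]
    fused = [alpha * d + (1 - alpha) * s
             for d, s in zip(dense_scores, sparse_norm)]
    return sorted(range(len(docs)), key=lambda i: fused[i], reverse=True)

docs = [
    "net revenue for fiscal 2021 was 10 million",
    "net revenue for fiscal 2022 was 12 million",
    "the board approved a share buyback",
]
# Hypothetical dense similarities: the two revenue passages look nearly identical.
dense_scores = [0.91, 0.90, 0.10]
ranking = hybrid_rank("net revenue fiscal 2022", docs, dense_scores)
```

With these made-up dense scores, the dense signal alone would slightly prefer the 2021 passage, while the exact match on "2022" in the BM25 score moves the 2022 passage to the top of the fused ranking.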
The Agentic Pipeline: Dynamic Reasoning and Tool Use
The second core component is an Agentic AI pipeline. This means the LLM acts as an intelligent orchestrator, capable of dynamic decision-making. It doesn’t just retrieve and generate; it plans, adapts, and verifies. The process involves several stages:
- Query Clarification: The system first seeks to understand the user’s question fully, even using web search if needed.
- Initial Retrieval: It uses Multi-HyDE and keyword-based methods to fetch relevant content.
- Iterative Refinement: If the initial results aren’t enough, the system formulates a plan, which might involve breaking down the query into sub-queries, performing multi-hop searches (looking for information across several steps), or invoking external tools.
- Tool Calling: The agent can dynamically use a suite of tools, including financial data APIs (like EDGAR Tool, Alpha Vantage), web search (SERP API, Bing, DuckDuckGo), and mathematical tools (WolframAlpha API, Python calculator). This allows it to access real-time data, perform calculations, and gather information beyond its initial knowledge base.
- Final Response: Once sufficient evidence is gathered and verified, the system synthesizes and delivers the final, accurate answer.
This iterative, evidence-driven process significantly reduces hallucinations and ensures that answers are grounded in verifiable sources, making the system highly reliable for diverse and complex financial queries.
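The stages above can be sketched as a small control loop. This is an illustrative outline under stated assumptions, not the paper's pipeline: the `plan` function is a hypothetical rule-based stand-in for the LLM planner, `retrieve` is a dictionary lookup instead of Multi-HyDE plus BM25, and `pct_change` stands in for a calculator tool. All names and the example figures are invented for illustration.

```python
def clarify(question):
    """Query clarification stub: a real agent might ask an LLM or search the web."""
    return question.strip().rstrip("?")

def retrieve(query, knowledge_base):
    """Retrieval stub: return facts whose key appears verbatim in the query."""
    return {k: v for k, v in knowledge_base.items() if k in query.lower()}

def pct_change(old, new):
    """Calculator tool stub: percentage change from old to new."""
    return (new - old) / old * 100.0

def plan(query, evidence):
    """Hypothetical planner: choose the next action from the evidence so far."""
    for key in ("revenue 2021", "revenue 2022"):
        if key not in evidence:
            return {"type": "sub_query", "query": key}
    if "growth_pct" not in evidence:
        return {"type": "tool", "name": "pct_change",
                "args": (evidence["revenue 2021"], evidence["revenue 2022"]),
                "store_as": "growth_pct"}
    return {"type": "answer",
            "text": f"Revenue grew {evidence['growth_pct']:.1f}%"}

def run_agent(question, knowledge_base, tools, max_steps=5):
    """Clarify, retrieve, then iterate: sub-queries and tool calls until an answer."""
    query = clarify(question)
    evidence = retrieve(query, knowledge_base)  # initial retrieval
    trace = []
    for _ in range(max_steps):
        action = plan(query, evidence)
        trace.append(action["type"])
        if action["type"] == "sub_query":
            evidence.update(retrieve(action["query"], knowledge_base))
        elif action["type"] == "tool":
            result = tools[action["name"]](*action["args"])
            evidence[action["store_as"]] = result
        else:  # final response
            return action["text"], trace
    return "insufficient evidence", trace

# Made-up figures for illustration only.
knowledge_base = {"revenue 2021": 100.0, "revenue 2022": 120.0}
answer, trace = run_agent("What was the revenue growth from 2021 to 2022?",
                          knowledge_base, {"pct_change": pct_change})
```

Here the initial retrieval finds nothing for the full question, so the planner issues two sub-queries, calls the calculator tool, and only then produces an answer. The `max_steps` cap is one simple way to keep such a loop from running indefinitely.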
Demonstrated Improvements and Future Outlook
The framework was evaluated on standard financial QA benchmarks, including subsets of the FinanceBench and ConvFinQA datasets. The results are promising: the combined approach improved accuracy by 11.2% and reduced hallucinations by 15% compared to traditional methods. The authors emphasized human evaluation to obtain a more accurate assessment, especially for numerical questions where automated metrics often fall short.
The research highlights that integrating domain-specific retrieval mechanisms like Multi-HyDE with robust toolsets significantly enhances both the accuracy and reliability of answers. This modular and adaptable framework not only offers a pathway for more trustworthy AI deployment in high-stakes financial applications but also suggests that optimizing retrieval can yield greater benefits than solely developing domain-specific language models.
While the current evaluation was conducted on a relatively small dataset due to resource constraints, and human oversight is still required for complex cases, this work represents a significant step forward. Future work includes fine-tuning smaller language models for specific financial tasks and developing more nuanced evaluation metrics. For more details, see the full research paper.


