Boosting Financial AI Accuracy with Multi-Perspective Retrieval and Intelligent Agents

TLDR: This research introduces a novel framework for financial question-answering that combines Agentic AI with Multi-HyDE, a system generating multiple, distinct queries for enhanced knowledge retrieval. Optimized for token efficiency and multi-step financial reasoning, the approach improves accuracy by 11.2% and reduces hallucinations by 15% on financial QA benchmarks. It integrates dense and sparse retrieval strategies with a dynamic agent pipeline capable of query clarification, iterative refinement, and tool calling, offering a modular and reliable solution for high-stakes financial applications.

In the complex and high-stakes world of finance, getting accurate and reliable information is crucial. Financial decisions, whether for investment, regulatory compliance, or market analysis, depend on precise data from constantly updated sources like regulatory filings, market reports, and multi-year financial statements. Traditional systems often struggle with the sheer volume and intricate nature of this data, leading to inaccuracies or ‘hallucinations’ from AI models.

A new research paper, titled “Enhancing Financial RAG with Agentic AI and Multi-HyDE: A Novel Approach to Knowledge Retrieval and Hallucination Reduction,” introduces an innovative framework designed to significantly improve how AI systems retrieve and generate answers for financial questions. Authored by Akshay Govind Srinivasan, Ryan Jacob George, Jayden Koshy Joe, Hrushikesh Kant, Harshith M R, Sachin Sundar, Sudharshan Suresh, Rahul Vimalkanth, and Vijayavallabh from the Indian Institute of Technology Madras, this work addresses the critical need for precision and reliability in financial AI applications.

The Challenge of Financial Information Retrieval

Large Language Models (LLMs) have made remarkable progress in understanding and generating human-like text. However, a major hurdle for their use in finance is their tendency to hallucinate, that is, to produce factually incorrect information. In a domain where even small errors can have significant consequences, this is unacceptable. Retrieval-Augmented Generation (RAG) frameworks aim to solve this by grounding LLM outputs in external knowledge. Yet conventional RAG systems, which often rely on a single database and a single retriever, fall short when faced with the semantic complexity and sheer volume of financial documents.

Financial reports often contain semantically similar passages that differ only in crucial numerical or temporal details. A standard RAG system might confuse these, leading to incorrect answers. Furthermore, complex financial queries often require multi-step reasoning and the ability to access various types of data, which a static ‘retrieve-then-generate’ approach cannot handle effectively.

Introducing Multi-HyDE and Agentic AI

The researchers propose a two-pronged solution: Multi-HyDE and an Agentic AI pipeline. Multi-HyDE builds on HyDE (Hypothetical Document Embeddings), a retrieval technique in which an LLM drafts a hypothetical answer and the system searches with that draft rather than with the raw question. Instead of generating a single hypothetical answer, or several near-paraphrases of the same query, Multi-HyDE generates multiple distinct yet contextually related queries. For example, if a user asks about a company, Multi-HyDE might generate separate queries about its fraud investigations and its criminal cases, even if both could be answered from the same document. This multi-perspective approach broadens the coverage of what is retrieved from large financial datasets and makes retrieval more effective.
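To make the idea concrete, here is a minimal Python sketch of multi-query retrieval in the spirit of Multi-HyDE. It is an illustration under stated assumptions, not the authors' implementation: `generate_queries` is a hypothetical stand-in for the LLM call that writes the distinct queries, `embed` is a toy bag-of-words vectorizer standing in for a dense encoder, and the three passages are made up.

```python
import math
import re
from collections import Counter
from typing import Callable, Dict, List


def embed(text: str) -> Counter:
    """Toy bag-of-words vector; a real system would use a dense encoder."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def multi_query_retrieve(
    question: str,
    generate_queries: Callable[[str], List[str]],  # assumption: wraps an LLM
    corpus: Dict[str, str],
    k: int = 2,
) -> List[str]:
    """Retrieve top-k documents per generated query, then merge the results."""
    doc_vecs = {doc_id: embed(text) for doc_id, text in corpus.items()}
    best: Dict[str, float] = {}
    for query in generate_queries(question):
        q_vec = embed(query)
        scores = {d: cosine(q_vec, vec) for d, vec in doc_vecs.items()}
        for doc_id in sorted(scores, key=scores.get, reverse=True)[:k]:
            # Keep each document once, with its best score across queries.
            best[doc_id] = max(best.get(doc_id, 0.0), scores[doc_id])
    return sorted(best, key=best.get, reverse=True)


# Made-up passages, for illustration only.
toy_corpus = {
    "fraud": "The company disclosed a fraud investigation by the regulator in 2022.",
    "criminal": "Two criminal cases were filed against former executives.",
    "revenue": "Revenue grew eight percent on higher subscription sales.",
}


def fake_llm(question: str) -> List[str]:
    """Hypothetical LLM output: two distinct angles on one question."""
    return ["fraud investigation regulator", "criminal cases executives"]


hits = multi_query_retrieve("Any legal trouble at the company?", fake_llm, toy_corpus, k=1)
```

Because each query retrieves independently and the results are merged afterwards, a passage that matches only one perspective still reaches the final candidate set.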

The system also integrates keyword-based retrieval, such as BM25, which is particularly effective for structured data such as tables and for telling apart semantically similar documents from different years. This hybrid strategy takes both semantic meaning and exact keywords into account, leading to more precise results.
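The following Python sketch shows one way a keyword ranking and a dense ranking could be combined. Several parts are assumptions: the article does not say how the two result lists are merged, so reciprocal rank fusion (RRF) is used here as one common choice; the BM25 scorer is a compact version of the standard Okapi formula rather than the authors' code; the dense ranking is hard-coded to mimic an encoder that confuses two fiscal years; and the passages are made up.

```python
import math
import re
from collections import Counter
from typing import Dict, List


def tokenize(text: str) -> List[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_rank(query: str, corpus: Dict[str, str],
              k1: float = 1.5, b: float = 0.75) -> List[str]:
    """Rank documents with Okapi BM25; documents with no match are dropped."""
    docs = {doc_id: tokenize(text) for doc_id, text in corpus.items()}
    avg_len = sum(len(toks) for toks in docs.values()) / len(docs)
    doc_freq = Counter(term for toks in docs.values() for term in set(toks))
    scores: Dict[str, float] = {}
    for doc_id, toks in docs.items():
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            n = doc_freq[term]
            idf = math.log((len(docs) - n + 0.5) / (n + 0.5) + 1)
            length_norm = k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scores[doc_id] = score
    ranked = sorted(scores, key=scores.get, reverse=True)
    return [doc_id for doc_id in ranked if scores[doc_id] > 0]


def rrf_fuse(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Reciprocal rank fusion: each list contributes 1 / (k + rank)."""
    fused: Counter = Counter()
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] += 1.0 / (k + rank)
    return sorted(fused, key=fused.get, reverse=True)


# Made-up passages that differ only in the year and the figure.
filings = {
    "fy2021": "Net revenue for fiscal year 2021 was 4.1 billion dollars.",
    "fy2022": "Net revenue for fiscal year 2022 was 4.6 billion dollars.",
    "risk": "Risk factors include currency fluctuations and supply constraints.",
}

# Assumption: a dense encoder found the revenue passages but mixed up the years.
dense_ranking = ["fy2021", "fy2022", "risk"]
keyword_ranking = bm25_rank("sales in 2022", filings)
fused_ranking = rrf_fuse([dense_ranking, keyword_ranking])
```

In this toy case the query never uses the word "revenue", so the keyword side matches only on the exact token `2022`, while the dense side supplies the semantic match; after fusion, the 2022 passage ranks first.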

The Agentic Pipeline: Dynamic Reasoning and Tool Use

The second core component is an Agentic AI pipeline. This means the LLM acts as an intelligent orchestrator, capable of dynamic decision-making. It doesn’t just retrieve and generate; it plans, adapts, and verifies. The process involves several stages:

Query Clarification: The system first seeks to understand the user’s question fully, even using web search if needed.
Initial Retrieval: It uses Multi-HyDE and keyword-based methods to fetch relevant content.
Iterative Refinement: If the initial results aren’t enough, the system formulates a plan, which might involve breaking down the query into sub-queries, performing multi-hop searches (looking for information across several steps), or invoking external tools.
Tool Calling: The agent can dynamically use a suite of tools, including financial data APIs (like EDGAR Tool, Alpha Vantage), web search (SERP API, Bing, DuckDuckGo), and mathematical tools (WolframAlpha API, Python calculator). This allows it to access real-time data, perform calculations, and gather information beyond its initial knowledge base.
Final Response: Once sufficient evidence is gathered and verified, the system synthesizes and delivers the final, accurate answer.

This iterative, evidence-driven process significantly reduces hallucinations and ensures that answers are grounded in verifiable sources, making the system highly reliable for diverse and complex financial queries.
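The stages above can be sketched as a small control loop in Python. This is a simplified illustration, not the paper's pipeline: `clarify`, `retrieve`, and `decide` are hypothetical callables that would wrap LLM and retriever calls in a real system, the action dictionary format is invented for this example, `scripted_decide` fakes the model's decisions, and the figures in the toy evidence are made up. The only working tool is a small arithmetic calculator standing in for the mathematical tools mentioned earlier.

```python
import ast
import operator
from typing import Callable, Dict, List

_OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
        ast.Mult: operator.mul, ast.Div: operator.truediv}


def calculator(expression: str) -> float:
    """Evaluate +, -, *, / arithmetic on numbers without calling eval()."""
    def _eval(node: ast.AST) -> float:
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.left), _eval(node.right))
        raise ValueError("unsupported expression")
    return _eval(ast.parse(expression, mode="eval").body)


def run_agent(question: str,
              clarify: Callable[[str], str],
              retrieve: Callable[[str], List[str]],
              decide: Callable[[str, List[str]], dict],
              tools: Dict[str, Callable[[str], object]],
              max_steps: int = 4) -> str:
    """Clarify, retrieve, then loop: answer, call a tool, or retrieve again."""
    query = clarify(question)                 # query clarification
    evidence = list(retrieve(query))          # initial retrieval
    for _ in range(max_steps):                # iterative refinement
        action = decide(query, evidence)
        if action["type"] == "answer":        # final response
            return action["text"]
        if action["type"] == "tool":          # tool calling
            result = tools[action["name"]](action["input"])
            evidence.append(f"{action['name']}({action['input']}) = {result}")
        elif action["type"] == "retrieve":    # follow-up (multi-hop) search
            evidence.extend(retrieve(action["query"]))
    return "Insufficient evidence to answer."


def scripted_decide(query: str, evidence: List[str]) -> dict:
    """Hypothetical stand-in for the LLM planner: compute first, then answer."""
    calc = [e for e in evidence if e.startswith("calculator(")]
    if not calc:
        return {"type": "tool", "name": "calculator",
                "input": "(4.6 - 4.1) / 4.1 * 100"}
    growth = float(calc[-1].split(" = ")[1])
    return {"type": "answer", "text": f"Revenue growth was about {growth:.1f}%"}


answer = run_agent(
    "How fast did revenue grow?",
    clarify=lambda q: q + " (fiscal 2021 to fiscal 2022)",
    retrieve=lambda q: ["FY2021 revenue: 4.1 billion", "FY2022 revenue: 4.6 billion"],
    decide=scripted_decide,
    tools={"calculator": calculator},
)
```

Capping the loop with `max_steps` is a design choice made for this sketch: it bounds cost and lets the agent say it lacks evidence instead of guessing.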

Demonstrated Improvements and Future Outlook

The framework was evaluated on standard financial QA benchmarks, including subsets of the FinanceBench and ConvFinQA datasets. The results are promising: the combined approach improved accuracy by 11.2% and reduced hallucinations by 15% compared to traditional methods. The authors relied on human evaluation to obtain a more accurate assessment, especially for numerical questions, where automated metrics often fall short.

The research highlights that integrating domain-specific retrieval mechanisms like Multi-HyDE with robust toolsets significantly enhances both the accuracy and reliability of answers. This modular and adaptable framework not only offers a pathway for more trustworthy AI deployment in high-stakes financial applications but also suggests that optimizing retrieval can yield greater benefits than solely developing domain-specific language models.

Although the current evaluation was conducted on a relatively small dataset due to resource constraints, and human oversight is still required for complex cases, this work represents a significant step forward. Future work includes fine-tuning smaller language models for specific financial tasks and developing more nuanced evaluation metrics. Further details are available in the full research paper.
