
KAHAN: A Framework for Intelligent Financial Data Narration

TLDR: KAHAN is a knowledge-augmented hierarchical framework that uses Large Language Models (LLMs) to systematically extract insights from financial data at multiple levels (entity, pairwise, group, system) and generate coherent, high-quality narratives. It significantly outperforms existing methods in narrative quality and factuality, demonstrating practical utility for investors. The framework’s effectiveness is influenced by knowledge quality and market complexity, and it has shown successful transferability to other specialized domains like healthcare.

Financial markets are complex, and understanding them often requires transforming vast amounts of structured data into clear, natural language reports. These reports, which interpret trends, compare performances, and contextualize market movements, are crucial for investment decision-making. However, their manual creation demands significant expertise and time, highlighting a need for automated solutions.

Existing data narration systems face two main challenges: the need for multi-level analysis (extracting insights at various granularities and connecting them) and the necessity of augmenting narratives with deep domain knowledge. Traditional methods often flatten data, missing hierarchical relationships, while even advanced Large Language Models (LLMs) struggle to systematically extract multi-level insights or consistently apply relevant domain expertise.

Introducing KAHAN: A New Approach to Financial Data Narration

Researchers Yajing Yang, Tony Deng, and Min-Yen Kan have introduced KAHAN (Knowledge-Augmented Hierarchical Analysis and Narration), a novel framework designed to overcome these limitations. KAHAN systematically extracts insights from raw tabular data across entity, pairwise, group, and system levels, uniquely leveraging LLMs as domain experts to drive the analysis. The core idea is to guide LLMs through a structured analytical process, rather than simply using them as text generators.

How KAHAN Works: A Three-Stage Process

The KAHAN framework operates in three distinct stages:

1. Entity-level Analysis: This foundational stage begins with LLMs generating domain-specific analytical questions. These questions guide the creation of executable code to compute relevant metrics. After execution, the numerical results are interpreted to produce entity-level insights, each with a significance score. Because every insight traces back to a computed metric and an explicit question, this approach keeps the analysis contextualized and helps avoid common issues like hallucinations or analytical gaps.

2. Multi-level Insight Synthesis: KAHAN then synthesizes these entity-level observations into a comprehensive understanding of the dataset, progressively moving through higher levels of abstraction. Domain knowledge is integrated at each step:

  • Pairwise Analysis: Identifies relationships between entities, such as contrasting performance between technology and healthcare sectors.
  • Group Analysis: Clusters entities into conceptually related groups (e.g., index groups, sector groups) and analyzes aggregate patterns within them.
  • System-level Analysis: Synthesizes all previous insights to identify dataset-wide patterns, like overall market sector rotation or the impact of monetary policy.

3. Narrative Generation: In the final stage, the hierarchical insights are transformed into coherent narratives using domain-appropriate structures and language. This involves adhering to specific financial reporting requirements, such as section ordering and audience-appropriate terminology. The generation algorithm ensures a natural flow, balancing detailed entity-level information with broader relationship patterns and system-level observations. A significant advantage here is the reusability of cached domain knowledge for subsequent reports, enhancing efficiency.
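
The three stages above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the LLM steps (question generation, code generation, interpretation) are replaced by fixed deterministic functions, and the `Insight` class, the `score` formula, the `min_significance` threshold, the broad-to-specific ordering, and the sample tickers and prices are all hypothetical.

```python
from dataclasses import dataclass
from itertools import combinations
from statistics import mean


@dataclass
class Insight:
    level: str           # "entity", "pairwise", "group", or "system"
    subject: str
    value: float         # the computed metric, in percent
    text: str
    significance: float  # illustrative score in [0, 1]


def score(value, scale=10.0):
    """Toy significance score: larger absolute moves score higher, capped at 1."""
    return min(abs(value) / scale, 1.0)


def entity_level(prices):
    """Stage 1: compute one metric per entity and turn it into an insight."""
    insights = []
    for name, series in prices.items():
        change = (series[-1] - series[0]) / series[0] * 100.0
        text = f"{name} moved {change:+.1f}% over the period."
        insights.append(Insight("entity", name, change, text, score(change)))
    return insights


def pairwise_level(entities):
    """Stage 2a: contrast every pair of entities by their performance gap."""
    insights = []
    for a, b in combinations(entities, 2):
        gap = a.value - b.value
        leader, laggard = (a, b) if gap >= 0 else (b, a)
        text = f"{leader.subject} outperformed {laggard.subject} by {abs(gap):.1f} points."
        insights.append(Insight("pairwise", f"{a.subject}/{b.subject}", gap, text, score(gap)))
    return insights


def group_level(entities, groups):
    """Stage 2b: aggregate entity metrics within each named group."""
    by_name = {e.subject: e for e in entities}
    insights = []
    for group, members in groups.items():
        avg = mean(by_name[m].value for m in members)
        text = f"{group} averaged {avg:+.1f}% across {len(members)} members."
        insights.append(Insight("group", group, avg, text, score(avg)))
    return insights


def system_level(entities):
    """Stage 2c: summarize the whole dataset."""
    avg = mean(e.value for e in entities)
    rising = sum(1 for e in entities if e.value > 0)
    text = f"{rising} of {len(entities)} entities rose; the average move was {avg:+.1f}%."
    return [Insight("system", "market", avg, text, score(avg))]


def narrate(insights, min_significance=0.25):
    """Stage 3: drop minor insights, then order the rest from broad to specific."""
    order = {"system": 0, "group": 1, "pairwise": 2, "entity": 3}
    kept = [i for i in insights if i.significance >= min_significance]
    kept.sort(key=lambda i: (order[i.level], -i.significance))
    return " ".join(i.text for i in kept)


# Made-up sample data: first and last price for three hypothetical tickers.
prices = {"TECH": [100.0, 108.0], "HEALTH": [50.0, 49.0], "ENERGY": [80.0, 84.0]}
groups = {"Cyclicals": ["TECH", "ENERGY"], "Defensives": ["HEALTH"]}

entities = entity_level(prices)
all_insights = (entities + pairwise_level(entities)
                + group_level(entities, groups) + system_level(entities))
report = narrate(all_insights)
print(report)
```

In this sketch each higher level consumes the entity-level insights rather than the raw table, which mirrors the bottom-up synthesis described above. In the framework as described, domain knowledge supplied by an LLM would shape the questions, the groupings, and the report structure; here those choices are hard-coded for illustration.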

Performance and Practical Utility

KAHAN was rigorously evaluated on the DataTales financial reporting benchmark, which includes 460 samples across 11 financial markets. It was tested against baseline approaches like Direct Prompting (DP) and Chain of Thought (CoT), using various LLMs including Llama3.1-8B-instruct, Qwen2.5-7B-instruct, and GPT-4o.

The results were compelling: KAHAN consistently outperformed both DP and CoT in narrative quality, particularly in descriptive richness and insight generation. For instance, with GPT-4o, KAHAN achieved a quality score of 8.26, a 20% improvement over DP and 25% over CoT, while maintaining an impressive 98.2% factuality. This high factuality is crucial in financial reporting, where accuracy is paramount.

Human evaluations further validated KAHAN’s practical utility. Financial traders, who require comprehensive analytical depth for decision-making, found KAHAN’s outputs most useful in 80% of cases. While analysts, who prioritize conciseness for report consolidation, showed a preference for CoT, the strong trader preference confirms KAHAN’s effectiveness for its target audience of investors seeking actionable market analysis.

Key Insights from KAHAN’s Development

The research also yielded valuable insights into factors influencing KAHAN’s effectiveness:

  • Knowledge Quality: The quality of the domain knowledge significantly impacts narrative generation. Knowledge generated by more powerful models like GPT-4o enabled smaller models like Llama3.1 to achieve higher performance, suggesting a pathway for efficient, high-quality systems through knowledge distillation.
  • Market Complexity: Hierarchical analysis proved more beneficial for simpler markets (e.g., energy markets with fewer entities) than for complex ones (e.g., equity markets with numerous entities). This suggests that the depth of hierarchical analysis should be adapted based on market complexity.
  • Cross-domain Applicability: KAHAN successfully transferred to a non-financial domain, Parkinson’s Disease gait analysis in healthcare, demonstrating its generalizability to other specialized, knowledge-intensive data narration tasks.
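
The knowledge-quality finding, together with the reuse of cached domain knowledge mentioned earlier, suggests a simple pattern: generate knowledge once with a stronger model, store it, and let later runs (possibly with a smaller model) load it. The sketch below is one way this could look, under assumptions: the JSON file layout, the `load_or_build_knowledge` helper, and the `strong_model_knowledge` stand-in are hypothetical and do not come from the paper.

```python
import json
import tempfile
from pathlib import Path


def load_or_build_knowledge(domain, cache_dir, build_fn):
    """Return (knowledge, was_cached); build and save the knowledge on first use."""
    path = Path(cache_dir) / f"{domain}.json"
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8")), True
    knowledge = build_fn(domain)
    path.write_text(json.dumps(knowledge, indent=2), encoding="utf-8")
    return knowledge, False


calls = []  # records how often the expensive builder actually runs


def strong_model_knowledge(domain):
    """Stand-in for an expensive call to a stronger model (hypothetical)."""
    calls.append(domain)
    return {
        "domain": domain,
        "questions": [
            "What was the net price change?",
            "Which entities diverged from their group?",
        ],
    }


with tempfile.TemporaryDirectory() as cache_dir:
    # First report: knowledge is built and written to disk.
    first, first_cached = load_or_build_knowledge("energy", cache_dir, strong_model_knowledge)
    # Later report: the same knowledge is loaded without calling the builder again.
    second, second_cached = load_or_build_knowledge("energy", cache_dir, strong_model_knowledge)

print(first_cached, second_cached, len(calls))
```

The builder runs only on the first call; a smaller model producing the second report would receive the same stored questions without paying for the stronger model again.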

The Future of Automated Financial Reporting

KAHAN represents a significant step forward in automated data narration, offering a robust framework for transforming complex financial data into coherent, insightful, and factual narratives. Its ability to leverage LLMs as domain experts and systematically extract multi-level insights holds promise for financial analysts and investors alike. For more details, see the full research paper.

Meera Iyer
https://blogs.edgentiq.com
Meera Iyer is an AI news editor who blends journalistic rigor with storytelling elegance. Formerly a content strategist in a leading tech firm, Meera now tracks the pulse of India's Generative AI scene, from policy updates to academic breakthroughs. She's particularly focused on bringing nuanced, balanced perspectives to the fast-evolving world of AI-powered tools and media. You can reach her at: [email protected]
