Enhancing Financial Sentiment Analysis Through Probabilistic LLM Fusion

TLDR: The paper introduces the Bayesian Network LLM Fusion (BNLF) framework, which combines predictions from multiple large language models (FinBERT, RoBERTa, BERTweet) using a Bayesian Network for financial sentiment analysis. BNLF addresses challenges like LLM transparency, cost, and inconsistency by providing an interpretable, lightweight, and robust solution. It achieves significant accuracy improvements (up to 6%) across diverse financial datasets and offers insights into how different LLMs and data contexts influence sentiment predictions through probabilistic reasoning.

Large Language Models (LLMs) have revolutionized many areas, including sentiment analysis, which involves identifying and interpreting opinions in text. However, these powerful models come with their own set of challenges. They can often be opaque, making it difficult to understand how they arrive at their conclusions. Fine-tuning them for specific tasks can be costly and computationally intensive, and their performance can be inconsistent across different domains. To address these issues, researchers have proposed a novel framework called the Bayesian Network LLM Fusion (BNLF).

The BNLF framework offers a sophisticated approach to integrating predictions from multiple LLMs for sentiment analysis, particularly in the complex financial domain. Instead of relying on a single LLM, BNLF combines the strengths of three distinct models: FinBERT, RoBERTa, and BERTweet. This fusion is achieved through a probabilistic mechanism known as a Bayesian Network.

Understanding the BNLF Framework

At its core, BNLF operates as a late fusion strategy. This means it takes the individual sentiment predictions from each LLM and then combines them using a Bayesian Network. A Bayesian Network is a type of probabilistic graphical model that excels at representing systems with uncertainty and interdependence. Unlike simpler methods that might just average predictions or use majority voting, a Bayesian Network explicitly models the probabilistic relationships between the LLM predictions and the final sentiment outcome. This provides a more principled and interpretable way to fuse information.
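To make the contrast with majority voting concrete, here is a minimal Python sketch. It is not the paper's implementation: the function names (majority_vote, fit_cpt), the toy training rows, and the Laplace smoothing constant are all assumptions made for illustration. The sketch estimates a single conditional probability table, P(true sentiment | three model predictions), from labeled counts, which is one simple way a network node with three parent nodes could be parameterized.

```python
from collections import Counter, defaultdict

LABELS = ("negative", "neutral", "positive")


def majority_vote(preds):
    """Return the most frequent label, or None when the top labels tie."""
    ranked = Counter(preds).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None
    return ranked[0][0]


def fit_cpt(rows, alpha=1.0):
    """Estimate P(true label | model predictions) from (preds, truth) rows.

    alpha is a Laplace smoothing constant (an assumption of this sketch).
    """
    counts = defaultdict(Counter)
    for preds, truth in rows:
        counts[tuple(preds)][truth] += 1

    def posterior(preds):
        seen = counts.get(tuple(preds), Counter())
        total = sum(seen.values()) + alpha * len(LABELS)
        return {label: (seen[label] + alpha) / total for label in LABELS}

    return posterior


# Toy labeled data: (FinBERT, RoBERTa, BERTweet) predictions and the human label.
split = ("negative", "neutral", "positive")
rows = [(split, "negative")] * 3 + [(split, "neutral")]

fused = fit_cpt(rows)
print(majority_vote(split))  # no strict winner, so this prints None
print(fused(split))          # a full distribution over the three labels
```

When the three models disagree completely, voting has nothing to say, while the learned table still returns a distribution that reflects how such disagreements were resolved in the labeled data.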

The framework works in four main steps. First, input texts are drawn from various sources, including formal financial documents and informal social media content. Second, each text is processed by the three chosen LLMs (FinBERT, RoBERTa, and BERTweet), and each model generates its own sentiment prediction. FinBERT is specialized for financial language, RoBERTa is a strong general-purpose model, and BERTweet is trained on Twitter data, which makes it adept at handling informal social media language. The models were chosen for their complementary coverage and for their efficiency: all three are medium-sized and practical to deploy without extensive GPU resources. Third, the individual predictions are fed into the Bayesian Network, which performs probabilistic inference. Finally, the network outputs a posterior sentiment distribution, which is mapped to a discrete sentiment label (negative, neutral, or positive).
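The four steps can be sketched as a short Python data flow. Everything here is a stand-in: the keyword classifiers replace the real FinBERT, RoBERTa, and BERTweet models, toy_fuse replaces Bayesian Network inference with smoothed vote shares, and the names run_pipeline, keyword_classifier, and toy_fuse are hypothetical. Only the shape of the pipeline (texts in, per-model labels, a posterior, then a discrete label) follows the description above.

```python
LABELS = ("negative", "neutral", "positive")


def keyword_classifier(neg_words, pos_words):
    """Build a toy classifier; a real system would call an LLM here."""
    def classify(text):
        words = set(text.lower().split())
        if words & neg_words:
            return "negative"
        if words & pos_words:
            return "positive"
        return "neutral"
    return classify


def toy_fuse(preds):
    """Placeholder for network inference: smoothed share of votes per label."""
    total = len(preds) + len(LABELS)
    return {lab: (preds.count(lab) + 1) / total for lab in LABELS}


def run_pipeline(texts, classifiers, fuse):
    results = []
    for text in texts:                                   # step 1: input text
        preds = tuple(clf(text) for clf in classifiers)  # step 2: per-model labels
        posterior = fuse(preds)                          # step 3: fusion / inference
        label = max(LABELS, key=lambda lab: posterior[lab])  # step 4: discrete label
        results.append((label, posterior))
    return results


classifiers = [
    keyword_classifier({"plunge", "loss"}, {"surge", "beat"}),
    keyword_classifier({"warning"}, {"record"}),
    keyword_classifier({"bearish"}, {"bullish"}),
]
results = run_pipeline(["Shares plunge after profit warning"], classifiers, toy_fuse)
label, posterior = results[0]
print(label, posterior)
```

Because the fusion step is passed in as a function, the placeholder can be swapped for real probabilistic inference without touching the rest of the flow.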

Key Advantages and Performance

The BNLF framework was evaluated on three diverse, human-annotated financial datasets: Financial PhraseBank (news-based), Twitter Financial News Sentiment (TFNS, tweets), and FiQA (financial question-answering). BNLF achieved an accuracy of 78.6% on the combined test set, outperforming a strong external baseline (DistilRoBERTa) by approximately 5.3%. It also showed consistent gains in macro- and weighted-F1 scores, indicating balanced performance across the sentiment classes.
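For readers less familiar with the metrics, the following Python sketch shows how accuracy, macro-F1, and weighted-F1 are computed for a three-class problem. The labels below are made-up toy data, not results from the paper, and the helper names f1_for_class and evaluate are hypothetical. Macro-F1 averages the per-class F1 scores equally, while weighted-F1 weights each class by how often it appears in the true labels.

```python
LABELS = ("negative", "neutral", "positive")


def f1_for_class(y_true, y_pred, label):
    """F1 for one class, computed as 2*TP / (2*TP + FP + FN)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def evaluate(y_true, y_pred, labels=LABELS):
    n = len(y_true)
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / n
    f1s = {lab: f1_for_class(y_true, y_pred, lab) for lab in labels}
    macro_f1 = sum(f1s.values()) / len(labels)
    # Weight each class by its support (count in the true labels).
    weighted_f1 = sum(f1s[lab] * y_true.count(lab) for lab in labels) / n
    return accuracy, macro_f1, weighted_f1


# Toy labels for illustration only.
y_true = ["negative", "negative", "neutral", "neutral", "neutral", "positive"]
y_pred = ["negative", "neutral", "neutral", "neutral", "neutral", "positive"]

accuracy, macro_f1, weighted_f1 = evaluate(y_true, y_pred)
print(accuracy, macro_f1, weighted_f1)
```

A model that does well only on the majority class can score high on accuracy and weighted-F1 yet low on macro-F1, which is why gains on both F1 variants are read as a sign of balanced performance.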

One of the most compelling aspects of BNLF is its enhanced interpretability and ability to perform causal reasoning. Through inference analysis, the researchers showed how the framework dynamically adjusts its sentiment predictions based on the type of corpus, even when individual LLMs provide identical inputs. For instance, with all LLMs predicting ‘negative’, the BNLF’s certainty and the balance between sentiment classes varied considerably depending on whether the text came from Financial PhraseBank or TFNS. Similarly, when LLMs disagreed, BNLF’s output shifted significantly based on the corpus type, highlighting its ability to resolve conflicting evidence in a context-aware manner.
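The corpus-dependent behavior can be illustrated with a small Python sketch of exact inference by enumeration. The network structure and every number below are assumptions chosen for illustration, not the structure or parameters learned in the paper: the true sentiment and the corpus type are treated as parents of each model's prediction, each model has a hypothetical per-corpus reliability, and the sentiment prior depends on the corpus.

```python
from math import prod

LABELS = ("negative", "neutral", "positive")

# Hypothetical P(sentiment | corpus).
PRIOR = {
    "news":   {"negative": 0.2, "neutral": 0.6, "positive": 0.2},
    "tweets": {"negative": 0.3, "neutral": 0.4, "positive": 0.3},
}

# Hypothetical probability that a model's label is correct, per corpus.
RELIABILITY = {
    "finbert":  {"news": 0.90, "tweets": 0.70},
    "roberta":  {"news": 0.80, "tweets": 0.75},
    "bertweet": {"news": 0.60, "tweets": 0.85},
}


def likelihood(model, pred, sentiment, corpus):
    """P(pred | sentiment, corpus); errors are split evenly over wrong labels."""
    r = RELIABILITY[model][corpus]
    return r if pred == sentiment else (1 - r) / (len(LABELS) - 1)


def posterior(preds, corpus):
    """P(sentiment | all model predictions, corpus) by enumeration."""
    unnorm = {
        s: PRIOR[corpus][s]
        * prod(likelihood(m, p, s, corpus) for m, p in preds.items())
        for s in LABELS
    }
    z = sum(unnorm.values())
    return {s: v / z for s, v in unnorm.items()}


def top_label(dist):
    return max(dist, key=dist.get)


agree = {"finbert": "negative", "roberta": "negative", "bertweet": "negative"}
disagree = {"finbert": "negative", "roberta": "neutral", "bertweet": "positive"}

for corpus in ("news", "tweets"):
    print(corpus, posterior(agree, corpus), top_label(posterior(disagree, corpus)))
```

With these made-up parameters, unanimous "negative" predictions yield slightly different certainty on the two corpora, and the fully conflicting case resolves to a different label depending on the corpus, because the network trusts different models in different contexts.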

Furthermore, an influence strength analysis revealed that FinBERT and RoBERTa had the strongest direct influence on BNLF’s final predictions, with BERTweet providing complementary signals. The corpus type also played a significant role, influencing the LLMs and indirectly affecting the BNLF’s output. This level of transparency helps users understand which models and contextual factors are most plausibly contributing to a given sentiment outcome, a critical feature for trustworthy AI systems.

In conclusion, the Bayesian Network LLM Fusion framework addresses critical challenges in applying LLMs for financial sentiment analysis. It provides a robust, interpretable, and scalable solution that leverages the complementary strengths of multiple LLMs through probabilistic reasoning. This approach not only enhances predictive performance but also offers a clearer understanding of the decision-making process, moving towards more transparent and explicable AI systems. For more details, you can refer to the full research paper.


