TLDR: The FinCARE framework is a new hybrid AI approach that combines statistical causal discovery algorithms, financial knowledge graphs from SEC 10-K filings, and large language model reasoning to identify true cause-and-effect relationships in finance. It significantly improves causal graph recovery (e.g., +366% F1-score for NOTEARS) and enables reliable counterfactual predictions for scenario analysis, offering a powerful tool for proactive risk management and strategic decision-making.
In the complex world of finance, understanding what truly drives market performance and portfolio outcomes is a constant challenge. Portfolio managers and risk analysts often rely on methods that identify correlations, but correlation doesn’t always mean causation. This can lead to decisions based on misleading relationships, making proactive risk management and strategic planning difficult in fast-changing markets.
A new research paper, FinCARE: Financial Causal Analysis with Reasoning and Evidence, introduces a hybrid framework designed to uncover genuine cause-and-effect relationships in financial data. Developed by Alejandro Michel, Abhinav Arun, Bhaskarjit Sarmah, and Stefano Pasquali, the approach combines statistical causal discovery algorithms with domain knowledge from financial knowledge graphs and the conceptual reasoning of large language models (LLMs).
The Problem with Traditional Methods
Current financial analysis often falls short because it struggles to distinguish between factors that merely move together (correlation) and those that actually influence each other (causation). Traditional factor models, for instance, are good at describing what happened but lack the scientific basis to identify true causal drivers. While statistical methods can be powerful, they sometimes miss established financial relationships. On the other hand, knowledge graphs provide structured domain expertise but might lack empirical validation, and LLMs, despite their advanced reasoning, can sometimes generate plausible but not truly causal relationships – a phenomenon dubbed the “causal parrot” problem.
Introducing FinCARE: A Hybrid Solution
The FinCARE framework tackles these limitations by integrating three complementary sources of knowledge:
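- Statistical Causal Discovery Algorithms: These algorithms infer a causal graph from observational data. FinCARE enhances three main types: constraint-based (like the PC algorithm, which relies on conditional independence tests), score-based (like GES, Greedy Equivalence Search), and continuous optimization (like NOTEARS, which treats structure learning as a smooth optimization problem with an acyclicity constraint).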
- Financial Knowledge Graph (KG): This is a structured network of financial information extracted from SEC 10-K filings – detailed annual reports companies submit. This KG contains explicit causal relationships, such as ‘Positively_Impacts’, ‘Negatively_Impacts’, and ‘Affects_Stock’, providing a deep well of domain-specific expertise.
- Large Language Model (LLM) Reasoning: FinCARE leverages advanced LLMs, specifically the Qwen3-235B-A22B model, to generate hypotheses about causal relationships based on its vast understanding of financial mechanisms. This acts as a conceptual reasoning engine.
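To make the knowledge graph component concrete, here is a minimal sketch of how KG triples might be turned into signed, directed edges that a discovery algorithm can consume. The three relation labels come from the description above; the `Triple` type, the sign mapping, the variable names, and the example triples are all hypothetical and are not taken from the paper.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One (head, relation, tail) fact, as a KG extractor might emit it."""
    head: str
    relation: str
    tail: str


# Relation labels named in the article. The sign mapping is an assumption:
# 0 means "direction known, sign unspecified".
RELATION_SIGN = {
    "Positively_Impacts": +1,
    "Negatively_Impacts": -1,
    "Affects_Stock": 0,
}


def triples_to_edges(triples):
    """Map KG triples to {(cause, effect): sign}, skipping non-causal relations."""
    edges = {}
    for t in triples:
        if t.relation in RELATION_SIGN:
            edges[(t.head, t.tail)] = RELATION_SIGN[t.relation]
    return edges


# Hypothetical triples, for illustration only.
triples = [
    Triple("regulatory_risk", "Negatively_Impacts", "revenue_growth"),
    Triple("revenue_growth", "Affects_Stock", "stock_return"),
    Triple("rd_spend", "Mentions", "patents"),  # not a causal relation, dropped
]
kg_edges = triples_to_edges(triples)
```

Keeping the sign alongside the direction is a design choice in this sketch; a real pipeline might store confidence scores or source filings as well.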
How the Hybrid Framework Works
The core innovation of FinCARE lies in how it integrates these three components. Instead of using them in isolation, the framework encodes knowledge graph constraints directly into the statistical algorithms. Established financial relationships from the KG act either as ‘soft priors’, which bias the search toward an edge without forcing it, or as ‘hard constraints’, which require or forbid an edge outright. Both guide the algorithms to prioritize theoretically sound connections and avoid implausible ones.
Similarly, the LLM’s conceptual reasoning generates proposals for new causal edges. These LLM-generated hypotheses are then fed into the statistical discovery process, acting as additional ‘soft prior beliefs’ about potential causal links. This unified approach allows for a robust comparison between knowledge derived from documents (KG), knowledge internalized by models (LLM), and their powerful combination.
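The idea of soft priors and hard constraints can be illustrated with a toy score-based edge selection. This is a simplified sketch and not the paper's algorithm: it assumes each candidate edge already has a statistical score, adds a fixed bonus when the KG or the LLM supports the edge, drops forbidden edges outright, and keeps the graph acyclic. The function names, bonus values, threshold, and scores are all hypothetical.

```python
def creates_cycle(edges, new_edge):
    """Return True if adding new_edge (u, v) to the directed edge set closes a cycle."""
    u, v = new_edge
    # Adding u -> v closes a cycle exactly when u is already reachable from v.
    stack, seen = [v], set()
    while stack:
        node = stack.pop()
        if node == u:
            return True
        if node in seen:
            continue
        seen.add(node)
        stack.extend(b for (a, b) in edges if a == node)
    return False


def select_edges(stat_scores, kg_edges, llm_edges, forbidden,
                 kg_bonus=0.5, llm_bonus=0.25, threshold=1.0):
    """Greedy edge selection with soft priors (bonuses) and hard constraints."""
    adjusted = {}
    for edge, score in stat_scores.items():
        if edge in forbidden:  # hard constraint: never consider this edge
            continue
        bonus = (kg_bonus if edge in kg_edges else 0.0) \
              + (llm_bonus if edge in llm_edges else 0.0)
        adjusted[edge] = score + bonus  # soft prior: shifts the score only
    kept = set()
    for edge in sorted(adjusted, key=adjusted.get, reverse=True):
        if adjusted[edge] >= threshold and not creates_cycle(kept, edge):
            kept.add(edge)
    return kept


# Hypothetical statistical scores for candidate edges.
stat_scores = {
    ("reg_risk", "rev_growth"): 0.7,  # weak alone, supported by the KG
    ("rev_growth", "return"): 1.4,    # strong on its own
    ("return", "reg_risk"): 1.1,      # strong, but forbidden
    ("ebitda", "return"): 0.8,        # weak alone, proposed by the LLM
    ("return", "rev_growth"): 0.9,    # no prior support
}
chosen = select_edges(
    stat_scores,
    kg_edges={("reg_risk", "rev_growth")},
    llm_edges={("ebitda", "return")},
    forbidden={("return", "reg_risk")},
)
```

In this toy run, the two edges backed by the KG and the LLM clear the threshold only because of their bonuses, while the forbidden edge is excluded despite its high score.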
Impressive Results and Reliable Predictions
The FinCARE framework was rigorously evaluated on a synthetic financial dataset representing 500 firms across 18 variables, with a known ‘ground truth’ of 29 causal edges. The results were striking:
- Enhanced Graph Recovery: The KG+LLM-enhanced methods consistently showed significant improvements across all three statistical algorithms. For instance, the F1-score (the harmonic mean of precision and recall over recovered edges) for the PC algorithm improved by 36%, GES by 100%, and NOTEARS by a remarkable 366% compared to their unenhanced baselines. The KG+LLM-NOTEARS method achieved near-complete recovery, correctly identifying 26 out of 29 ground truth causal edges.
- Reliable Counterfactual Analysis: Beyond identifying causal structures, FinCARE enables reliable ‘what-if’ scenario analysis. For example, it can predict the impact of a major regulatory change. If a regulatory event causes a 0.4 increase in regulatory risk, FinCARE can trace its effects through various channels – depressing revenue growth, improving EBITDA margins (perhaps due to cost-cutting), and directly reducing returns due to investor aversion. The framework achieved a mean absolute error of just 0.003610 for counterfactual predictions and perfect directional accuracy for intervention effects.
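The graph-recovery figures above rest on standard edge-level metrics, which the following sketch computes for a toy graph. The edge sets and baseline numbers are hypothetical. The only value derived from the article is the recall implied by 26 of 29 recovered edges; precision cannot be derived, because the number of false-positive edges is not stated here.

```python
def edge_metrics(predicted, truth):
    """Precision, recall, and F1 over directed edges (a reversed edge counts as wrong)."""
    tp = len(predicted & truth)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(truth) if truth else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def relative_improvement(baseline, enhanced):
    """Percentage change of a metric relative to its baseline value."""
    return 100.0 * (enhanced - baseline) / baseline


# Toy example: 4 true edges, 4 predicted edges, one of them reversed.
truth = {("a", "b"), ("b", "c"), ("c", "d"), ("a", "d")}
predicted = {("a", "b"), ("b", "c"), ("c", "d"), ("d", "a")}
precision, recall, f1 = edge_metrics(predicted, truth)

# Recall implied by the article's 26-of-29 figure (roughly 0.897).
reported_recall = 26 / 29

# A +100% improvement means the metric doubled (hypothetical values).
doubling = relative_improvement(0.2, 0.4)
```

Read this way, a +366% improvement means the enhanced F1-score is about 4.66 times the baseline, which says as much about a weak baseline as about the enhanced method.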
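The regulatory-shock example above can be sketched as a shift propagated through a linear structural causal model. Everything numeric here is an assumption made for illustration: the coefficients, the variable ordering, and the "true" effects used to score the prediction are hypothetical and are not values from the paper. Only the 0.4 shock size and the direction of each channel follow the description above.

```python
# Hypothetical linear structural coefficients: child -> {parent: weight}.
COEFS = {
    "revenue_growth": {"regulatory_risk": -0.30},
    "ebitda_margin": {"regulatory_risk": 0.10},
    "stock_return": {
        "regulatory_risk": -0.20,   # direct channel (investor aversion)
        "revenue_growth": 0.50,
        "ebitda_margin": 0.25,
    },
}
# Topological order of the assumed graph (parents before children).
ORDER = ["regulatory_risk", "revenue_growth", "ebitda_margin", "stock_return"]


def propagate(shock_var, shock, coefs, order):
    """Change in each variable when shock_var is shifted by `shock`, noise held fixed."""
    delta = {v: 0.0 for v in order}
    delta[shock_var] = shock
    for v in order:
        if v == shock_var:
            continue  # the intervened variable keeps its imposed shift
        delta[v] = sum(w * delta[p] for p, w in coefs.get(v, {}).items())
    return delta


def mean_absolute_error(pred, true):
    """Average absolute gap between predicted and true effects."""
    return sum(abs(pred[k] - true[k]) for k in true) / len(true)


def directional_accuracy(pred, true):
    """Share of variables whose predicted effect has the same sign as the true one."""
    return sum((pred[k] > 0) == (true[k] > 0) for k in true) / len(true)


effects = propagate("regulatory_risk", 0.4, COEFS, ORDER)

# Hypothetical "true" effects from a known data-generating process.
true_effects = {
    "revenue_growth": -0.125,
    "ebitda_margin": 0.038,
    "stock_return": -0.128,
}
mae = mean_absolute_error(effects, true_effects)
direction = directional_accuracy(effects, true_effects)
```

With these made-up coefficients, the total effect on returns is about -0.13: the direct channel and the revenue channel both push returns down, and the margin channel offsets only a small part of that.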
Key Insights for System Design
The research also provided valuable insights into building effective AI systems for causal analysis. Ablation studies revealed that a single, focused LLM agent (like the ‘MissingEdgeDiscoverer’ module) was often as effective as more complex multi-agent approaches for causal graph recovery. Crucially, the studies showed that encoding knowledge graph information as algorithmic constraints is more effective than simply listing KG facts directly in LLM prompts.
A New Era for Financial Decision-Making
The FinCARE framework represents a significant leap forward in financial causal analysis. By combining statistical rigor with domain expertise from knowledge graphs and the conceptual power of LLMs, it provides portfolio managers and risk analysts with a robust foundation for understanding the true drivers of financial performance. This enables more proactive risk management, more informed strategic decision-making, and a deeper understanding of dynamic market environments.
Future work aims to expand the framework by integrating alternative data sources beyond 10-K filings, such as earnings calls and news sentiment, and extending it to temporal causal discovery to capture dynamic relationships and lag structures in financial data.


