TLDR: A new research paper investigates representation bias in open-source Qwen Large Language Models (LLMs) when used for financial investment decisions. The study reveals that LLM confidence is significantly influenced by firm size, valuation, and free cash flow, leading to a bias towards larger, more visible companies. Confidence also varies across sectors, with Technology showing the most variability. While LLM preferences align with fundamental and technical metrics when specifically prompted, the findings emphasize the need for bias calibration and careful evaluation protocols for safe and fair financial AI deployment.
Large Language Models (LLMs) are becoming increasingly common in the financial world, helping with everything from investment research to portfolio management. However, a new study highlights a critical issue: these powerful AI models can carry significant biases that might skew investment decisions.
The research, titled Uncovering Representation Bias for Investment Decisions in Open-Source Large Language Models, delves into how open-source Qwen models reflect biases related to factors like firm size, sector, and financial characteristics. This is crucial because misrepresenting companies or industries can lead to distorted risk assessments and inefficient capital allocation.
What is Representation Bias?
Representation bias occurs when the data used to train LLMs doesn’t adequately capture the full diversity of entities. In finance, this often means models might favor large, well-known companies while overlooking or undervaluing smaller, less-publicized ones. This can have serious implications when these models are used to support high-stakes financial decisions.
The study, conducted by Fabrizio Dimino, Krati Saxena, Bhaskarjit Sarmah, and Stefano Pasquali from Domyn, is the first to systematically evaluate this type of bias in Qwen LLMs within a financial context. Their work makes several key contributions, including identifying specific financial features that drive LLM confidence biases and revealing sector-specific patterns in model preferences.
How the Study Was Conducted
To investigate these biases, the researchers analyzed approximately 150 U.S. equities from 2017 to 2024 using a “balanced round-robin prompting method”: the LLMs were presented with pairs of companies and asked to choose which was the better investment based on various criteria. Two different prompt variations were used to check consistency, and firm-level confidence scores were derived from the models’ responses.
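The paper does not publish its scoring code, but a balanced round-robin over pairwise choices can be sketched roughly as follows. Here `ask_llm` is a hypothetical callable standing in for a prompted model; presenting each pair in both orders is one way to balance position effects in the prompt.

```python
from itertools import combinations
from collections import defaultdict

def round_robin_confidence(firms, ask_llm):
    """Derive firm-level confidence scores from pairwise LLM choices.

    ask_llm(a, b) is a hypothetical stand-in that prompts the model with
    a pair of firms and returns the one it prefers. Each pair is asked
    in both orders so neither position is systematically favored.
    """
    wins = defaultdict(int)
    comparisons = defaultdict(int)
    for a, b in combinations(firms, 2):
        for first, second in ((a, b), (b, a)):  # balanced orderings
            winner = ask_llm(first, second)
            wins[winner] += 1
            comparisons[a] += 1
            comparisons[b] += 1
    # Confidence = share of a firm's pairwise comparisons that it won.
    return {f: wins[f] / comparisons[f] for f in firms}
```

With roughly 150 equities this is about 150 × 149 prompts per model and prompt variation, which is why a fixed universe and cached responses matter in practice.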
The study focused on three main research questions:
- Which firm-level characteristics most influence LLM confidence?
- Are observed LLM preferences stable across different financial contexts?
- Do high-confidence LLM outputs align with superior empirical financial performance?
Key Findings on LLM Confidence
The results showed clear patterns of bias. For the first question, the study found that firm scale and valuation proxies were the most robust drivers of LLM confidence. Features like market capitalization, enterprise value, shares outstanding, float shares, and free cash flow consistently increased model confidence. This suggests a representation bias favoring larger, more visible firms – attributes likely overrepresented in the models’ training data – rather than classical profitability or technical signals.
When examining cross-context stability, the researchers observed “pervasive anchoring” across sectors. The Technology sector, in particular, exhibited the highest variability in LLM confidence scores, indicating less stability across different financial contexts. Interestingly, smaller Qwen models (e.g., Qwen3-8B, Qwen2.5-7B) tended to anchor more tightly, showing less context sensitivity than their larger counterparts.
Finally, regarding alignment with empirical financial performance, the study found that when models were prompted for specific financial categories, their confidence rankings aligned most strongly with fundamental data, especially free cash flow. Technical signals, such as average trading volume, also showed moderate positive associations. Conversely, risk factors like beta tended to decrease model confidence, indicating a preference for lower-risk firms.
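The kind of feature-level analysis described above can be illustrated with a simple regression of confidence scores on standardized firm characteristics. This is a toy sketch on synthetic data whose signs merely mirror the reported pattern (scale and free cash flow up, beta down); none of the numbers come from the paper.

```python
import numpy as np

# Synthetic illustration only: regress pairwise-derived confidence
# scores on standardized firm features to see which ones drive them.
rng = np.random.default_rng(0)
n = 150  # roughly the size of the study's equity universe
features = ["log_market_cap", "free_cash_flow", "beta"]
X = rng.normal(size=(n, len(features)))

# Assumed signs for illustration: scale and FCF increase confidence,
# beta decreases it, plus a little noise.
confidence = 0.5 + 0.2 * X[:, 0] + 0.1 * X[:, 1] - 0.1 * X[:, 2] \
    + rng.normal(0, 0.05, n)

X1 = np.column_stack([np.ones(n), X])  # prepend an intercept column
coefs, *_ = np.linalg.lstsq(X1, confidence, rcond=None)
for name, c in zip(["intercept", *features], coefs):
    print(f"{name:>15s}: {c:+.3f}")
```

The recovered coefficients track the planted signs; on real model outputs the same setup would reveal which characteristics the LLM's confidence actually loads on.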
Implications for Financial AI Governance
These findings highlight that while open-source Qwen LLMs can partially internalize meaningful financial structures, they are also significantly shaped by representation biases. The paper concludes with important recommendations for model governance in financial applications:
- **Calibration:** Outputs should be adjusted to reduce biases related to firm size and sector.
- **Category-Specific Prompts:** When using LLM predictions for portfolio or risk decisions, employing category-specific prompts and performing post-hoc consistency checks is crucial.
- **Stability Diagnostics:** Incorporating stability diagnostics, such as dispersion measures, alongside performance metrics can help monitor reliability.
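A stability diagnostic along these lines is straightforward to sketch. Assuming confidence scores for each firm are collected under several prompting contexts (the inputs below are made up), simple dispersion measures flag firms whose scores swing with context.

```python
import statistics

def dispersion_diagnostics(context_scores):
    """Stability diagnostic: dispersion of each firm's confidence
    across prompting contexts. context_scores maps a firm to its
    per-context confidence scores (hypothetical inputs here)."""
    report = {}
    for firm, scores in context_scores.items():
        report[firm] = {
            "mean": statistics.fmean(scores),
            "stdev": statistics.pstdev(scores),
            "range": max(scores) - min(scores),
        }
    return report

# Illustrative only: a firm whose score swings across contexts
# warrants extra scrutiny before its ranking feeds a decision.
scores = {
    "STABLE_CO": [0.62, 0.60, 0.63, 0.61],
    "SWINGY_CO": [0.20, 0.75, 0.40, 0.90],
}
for firm, d in dispersion_diagnostics(scores).items():
    print(firm, round(d["stdev"], 3), round(d["range"], 2))
```

Reporting such dispersion alongside performance metrics, as the authors suggest, makes unstable model preferences visible instead of averaging them away.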
The research underscores the need for continued vigilance and targeted strategies to mitigate bias, ensuring safe and fair deployment of LLMs in the high-stakes world of finance. Future work will explore debiasing pipelines and the interplay of model scale versus architecture under controlled data curation.


