TLDR: A new research paper investigates representation bias in open-source Qwen Large Language Models (LLMs) when used for financial investment decisions. The study reveals that LLM confidence is significantly influenced by firm size, valuation, and free cash flow, leading to a bias towards larger, more visible companies. Confidence also varies across sectors, with Technology showing the most variability. While LLM preferences align with fundamental and technical metrics when specifically prompted, the findings emphasize the need for bias calibration and careful evaluation protocols for safe and fair financial AI deployment.
Large Language Models (LLMs) are becoming increasingly common in the financial world, helping with everything from investment research to portfolio management. However, a new study highlights a critical issue: these powerful AI models can carry significant biases that might skew investment decisions.
The research, titled Uncovering Representation Bias for Investment Decisions in Open-Source Large Language Models, delves into how open-source Qwen models reflect biases related to factors like firm size, sector, and financial characteristics. This is crucial because misrepresenting companies or industries can lead to distorted risk assessments and inefficient capital allocation.
What is Representation Bias?
Representation bias occurs when the data used to train LLMs doesn’t adequately capture the full diversity of entities. In finance, this often means models might favor large, well-known companies while overlooking or undervaluing smaller, less-publicized ones. This can have serious implications when these models are used to support high-stakes financial decisions.
The study, conducted by Fabrizio Dimino, Krati Saxena, Bhaskarjit Sarmah, and Stefano Pasquali from Domyn, is the first to systematically evaluate this type of bias in Qwen LLMs within a financial context. Their work makes several key contributions, including identifying specific financial features that drive LLM confidence biases and revealing sector-specific patterns in model preferences.
How the Study Was Conducted
To investigate these biases, the researchers analyzed approximately 150 U.S. equities from 2017 to 2024 using a “balanced round-robin prompting method”: the LLMs were presented with pairs of companies and asked to choose which was the better investment based on various criteria. Two different prompt variations were used to check consistency, and firm-level confidence scores were derived from the models’ responses.
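The paper does not publish its scoring code, but a balanced round-robin over pairwise choices can be sketched roughly as follows. Here `ask_llm` is a hypothetical callable standing in for a prompted model; presenting each pair in both orders is one way to balance position effects in the prompt.

```python
from itertools import combinations
from collections import defaultdict

def round_robin_confidence(firms, ask_llm):
    """Derive firm-level confidence scores from pairwise LLM choices.

    ask_llm(a, b) is a hypothetical stand-in that prompts the model with
    a pair of firms and returns the one it prefers. Each pair is asked
    in both orders so neither position is systematically favored.
    """
    wins = defaultdict(int)
    comparisons = defaultdict(int)
    for a, b in combinations(firms, 2):
        for first, second in ((a, b), (b, a)):  # balanced orderings
            winner = ask_llm(first, second)
            wins[winner] += 1
            comparisons[a] += 1
            comparisons[b] += 1
    # Confidence = share of a firm's pairwise comparisons that it won.
    return {f: wins[f] / comparisons[f] for f in firms}
```

With roughly 150 equities this is about 150 × 149 prompts per model and prompt variation, which is why a fixed universe and cached responses matter in practice.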
The study focused on three main research questions:
- Which firm-level characteristics most influence LLM confidence?
- Are observed LLM preferences stable across different financial contexts?
- Do high-confidence LLM outputs align with superior empirical financial performance?
Key Findings on LLM Confidence
The results showed clear patterns of bias. For the first question, the study found that firm scale and valuation proxies were the most robust drivers of LLM confidence. Features like market capitalization, enterprise value, shares outstanding, float shares, and free cash flow consistently increased model confidence. This suggests a representation bias favoring larger, more visible firms – attributes likely overrepresented in the models’ training data – rather than classical profitability or technical signals.
When examining cross-context stability, the researchers observed “pervasive anchoring” across sectors. The Technology sector, in particular, exhibited the highest variability in LLM confidence scores, indicating less stability across different financial contexts. Interestingly, smaller Qwen models (e.g., Qwen3-8B, Qwen2.5-7B) tended to anchor more tightly, showing less context sensitivity than their larger counterparts.
Finally, regarding alignment with empirical financial performance, the study found that when models were prompted for specific financial categories, their confidence rankings aligned most strongly with fundamental data, especially free cash flow. Technical signals, such as average trading volume, also showed moderate positive associations. Conversely, risk factors like beta tended to decrease model confidence, indicating a preference for lower-risk firms.
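The kind of feature-level analysis described above can be illustrated with a simple regression of confidence scores on standardized firm characteristics. This is a toy sketch on synthetic data whose signs merely mirror the reported pattern (scale and free cash flow up, beta down); none of the numbers come from the paper.

```python
import numpy as np

# Synthetic illustration only: regress pairwise-derived confidence
# scores on standardized firm features to see which ones drive them.
rng = np.random.default_rng(0)
n = 150  # roughly the size of the study's equity universe
features = ["log_market_cap", "free_cash_flow", "beta"]
X = rng.normal(size=(n, len(features)))

# Assumed signs for illustration: scale and FCF increase confidence,
# beta decreases it, plus a little noise.
confidence = 0.5 + 0.2 * X[:, 0] + 0.1 * X[:, 1] - 0.1 * X[:, 2] \
    + rng.normal(0, 0.05, n)

X1 = np.column_stack([np.ones(n), X])  # prepend an intercept column
coefs, *_ = np.linalg.lstsq(X1, confidence, rcond=None)
for name, c in zip(["intercept", *features], coefs):
    print(f"{name:>15s}: {c:+.3f}")
```

The recovered coefficients track the planted signs; on real model outputs the same setup would reveal which characteristics the LLM's confidence actually loads on.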
Implications for Financial AI Governance
These findings highlight that while open-source Qwen LLMs can partially internalize meaningful financial structures, they are also significantly shaped by representation biases. The paper concludes with important recommendations for model governance in financial applications:
- **Calibration:** Outputs should be adjusted to reduce biases related to firm size and sector.
- **Category-Specific Prompts:** When using LLM predictions for portfolio or risk decisions, employing category-specific prompts and performing post-hoc consistency checks is crucial.
- **Stability Diagnostics:** Incorporating stability diagnostics, such as dispersion measures, alongside performance metrics can help monitor reliability.
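A stability diagnostic along these lines is straightforward to sketch. Assuming confidence scores for each firm are collected under several prompting contexts (the inputs below are made up), simple dispersion measures flag firms whose scores swing with context.

```python
import statistics

def dispersion_diagnostics(context_scores):
    """Stability diagnostic: dispersion of each firm's confidence
    across prompting contexts. context_scores maps a firm to its
    per-context confidence scores (hypothetical inputs here)."""
    report = {}
    for firm, scores in context_scores.items():
        report[firm] = {
            "mean": statistics.fmean(scores),
            "stdev": statistics.pstdev(scores),
            "range": max(scores) - min(scores),
        }
    return report

# Illustrative only: a firm whose score swings across contexts
# warrants extra scrutiny before its ranking feeds a decision.
scores = {
    "STABLE_CO": [0.62, 0.60, 0.63, 0.61],
    "SWINGY_CO": [0.20, 0.75, 0.40, 0.90],
}
for firm, d in dispersion_diagnostics(scores).items():
    print(firm, round(d["stdev"], 3), round(d["range"], 2))
```

Reporting such dispersion alongside performance metrics, as the authors suggest, makes unstable model preferences visible instead of averaging them away.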
The research underscores the need for continued vigilance and targeted strategies to mitigate bias, ensuring safe and fair deployment of LLMs in the high-stakes world of finance. Future work will explore debiasing pipelines and the interplay of model scale versus architecture under controlled data curation.


