TLDR: FinMR is a new, high-quality, knowledge-intensive multimodal dataset designed to evaluate advanced financial reasoning capabilities of Multimodal Large Language Models (MLLMs) at a professional analyst’s standard. It comprises over 3,200 expertly annotated question-answer pairs across 15 financial topics, integrating mathematical reasoning, financial knowledge, and diverse visual interpretation tasks. Benchmarking reveals a significant performance gap between current MLLMs and human financial analysts, highlighting areas for improvement in image analysis, formula application, and contextual understanding, especially for open-source models.
In the rapidly evolving landscape of artificial intelligence, Multimodal Large Language Models (MLLMs) are making significant strides, combining the power of language understanding with visual interpretation. However, evaluating these advanced models in highly specialized fields like finance has been a considerable challenge due to the lack of suitable datasets. This is where FinMR comes in: a new benchmark designed to rigorously assess MLLMs’ capabilities in expert-level financial reasoning.
Developed by a team of researchers from the University of Auckland and Nanyang Technological University, FinMR addresses a critical gap in the AI research community. Existing datasets often fall short in at least one respect: the professional depth of their financial knowledge, the complexity of their reasoning tasks, or the diversity of visual content needed to test models against the standards of a professional financial analyst.
FinMR contains over 3,200 meticulously curated and expertly annotated question-answer pairs. These questions span 15 diverse financial topics, giving broad coverage of the domain. What sets FinMR apart is its integration of sophisticated mathematical reasoning, advanced financial knowledge, and nuanced visual interpretation tasks across various image types. Those image types range from statistical charts and time series graphs to financial tables, specialized diagrams, and even geographical maps, mirroring the complex data financial analysts encounter daily.
The creation of FinMR involved a rigorous quality control protocol, including a three-stage data curation pipeline with six annotators. This process ensured the accuracy, completeness, and clarity of the dataset, which draws its questions from college-level courses and professional certification programs such as the Chartered Financial Analyst (CFA) and Financial Risk Manager (FRM) programs. The dataset is split between expertise-based questions (67%) and math-focused questions (33%), and each question is categorized by difficulty level (easy, medium, hard) to provide a comprehensive evaluation framework.
Initial benchmarking with leading closed-source and open-source MLLMs has already revealed significant performance disparities between these models and human financial analysts. For instance, models like Gemini-2.5-Pro and Claude-3.7-Sonnet showed promising results, with Gemini-2.5-Pro achieving the best overall performance among MLLMs. However, the results also highlighted key areas for model advancement, such as precise image analysis, accurate application of complex financial formulas, and deeper contextual financial understanding. Open-source models, in particular, demonstrated a substantial performance gap compared to their closed-source counterparts, struggling with the sophisticated multimodal reasoning tasks presented in FinMR.
The research also shed light on specific challenges. Models generally scored lower on financial math reasoning than on expertise-based tasks, indicating a need for stronger logical rigor and multi-step calculation capabilities. An error analysis further identified common issues, with image recognition failures accounting for a significant portion of errors, especially when dealing with domain-specific visuals that require implicit information extraction. Question misunderstanding and incorrect formula application were also prevalent, particularly in harder questions requiring cross-domain knowledge integration.
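The comparisons described above, by question type and by difficulty level, amount to grouped accuracy scores. The following is a minimal Python sketch of that kind of breakdown, not the authors' evaluation code. The record fields (`question_type`, `difficulty`, `answer`, `predicted`), the helper `accuracy_by_group`, and the toy records are all hypothetical and are not taken from FinMR.

```python
from collections import defaultdict


def accuracy_by_group(records, key):
    """Return {group: fraction of correct answers} for the given record field."""
    totals = defaultdict(int)
    correct = defaultdict(int)
    for rec in records:
        group = rec[key]
        totals[group] += 1
        # Exact-match scoring; a real benchmark may use a more lenient comparison.
        if rec["predicted"] == rec["answer"]:
            correct[group] += 1
    return {group: correct[group] / totals[group] for group in totals}


# Toy records invented for illustration only (not FinMR data).
toy_records = [
    {"question_type": "expertise", "difficulty": "easy", "answer": "A", "predicted": "A"},
    {"question_type": "expertise", "difficulty": "hard", "answer": "C", "predicted": "B"},
    {"question_type": "math", "difficulty": "medium", "answer": "B", "predicted": "B"},
    {"question_type": "math", "difficulty": "hard", "answer": "D", "predicted": "A"},
]

by_type = accuracy_by_group(toy_records, "question_type")
by_difficulty = accuracy_by_group(toy_records, "difficulty")
print(by_type)
print(by_difficulty)
```

Reporting accuracy per group rather than as a single number is what makes gaps such as the one between math-focused and expertise-based questions visible.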
FinMR is poised to be an essential benchmark tool, pushing the boundaries of multimodal financial reasoning toward professional analyst-level competence. The dataset and code are available for researchers to explore and contribute to the advancement of MLLMs in finance. More details are available in the research paper, "FinMR: A Knowledge-Intensive Multimodal Benchmark for Advanced Financial Reasoning".
The authors of this significant work are Shuangyan Deng, Haizhou Peng, Jiachen Xu, Ciprian Doru Giurcăneanu, and Jiamou Liu from the University of Auckland, and Rui Mao from Nanyang Technological University.


