TLDR: Chart-R1 is a vision-language model that combines a programmatic data generation method with a two-stage training strategy (supervised fine-tuning followed by reinforcement learning) to improve complex reasoning on charts. It synthesizes step-by-step reasoning data from plotting code and uses rewards tailored to the answer type, reaching state-of-the-art results among small-scale models on chart understanding benchmarks and results comparable to large proprietary models.
A new research paper introduces Chart-R1, an innovative vision-language model designed to tackle complex reasoning challenges within chart data. Inspired by recent advancements in reinforcement learning fine-tuning, Chart-R1 extends these powerful techniques beyond traditional text-based domains like mathematical reasoning and code intelligence, bringing them to the rich, multimodal world of charts.
Addressing the Chart Reasoning Gap
Charts are dense with information, yet extracting deep insights often requires more than simple data retrieval; it demands complex reasoning. Previous models, while capable of visual perception, have largely fallen short in tasks requiring multi-step thought processes to interpret chart information. Chart-R1 aims to bridge this gap by enabling advanced reasoning capabilities for chart analysis.
A Novel Approach to Data and Training
The success of Chart-R1 hinges on two key innovations: a unique programmatic data synthesis technology and a sophisticated two-stage training strategy.
Programmatic Data Synthesis: Building a Rich Dataset
One of the biggest hurdles in developing advanced chart reasoning models is the scarcity of high-quality, step-by-step reasoning data. Chart-R1 addresses this by generating data programmatically. Instead of relying on existing, often limited, datasets or on lossy chart-parsing processes, this approach starts with code. Powerful language models are prompted to generate Matplotlib plotting code, and because that code contains the exact data behind each chart, it serves as a high-fidelity foundation. From this code, the system synthesizes complex questions, their corresponding answers, and detailed, multi-step chain-of-thought reasoning paths. To ensure diversity and realism, the data generation process incorporates real-world tables from arXiv papers. The result is ChartRQA, a dataset of 258,000 multi-step reasoning samples covering both single- and multi-subchart scenarios, together with a human-verified benchmark of 1,702 high-quality samples.
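The code-first idea can be illustrated with a minimal sketch. It is not the paper's pipeline: in Chart-R1 a language model writes the plotting code, questions, and reasoning, whereas here a fixed template stands in for the model. The function name `build_sample`, the record fields, and the example table are all hypothetical. The point it shows is that when the data lives in the plotting code, the answer and each reasoning step can be computed exactly rather than parsed from an image.

```python
import json


def build_sample(title, categories, values):
    """Return one code-first training record (hypothetical schema)."""
    # The plotting code is emitted as text. The data literals inside it are
    # the ground truth, so nothing has to be recovered from pixels.
    plot_code = (
        "import matplotlib.pyplot as plt\n"
        f"categories = {categories!r}\n"
        f"values = {values!r}\n"
        "fig, ax = plt.subplots()\n"
        "ax.bar(categories, values)\n"
        f"ax.set_title({title!r})\n"
        "fig.savefig('chart.png')\n"
    )

    # Derive a multi-step question directly from the same data.
    hi = max(range(len(values)), key=values.__getitem__)
    lo = min(range(len(values)), key=values.__getitem__)
    diff = values[hi] - values[lo]
    reasoning = [
        f"Step 1: The tallest bar is {categories[hi]} at {values[hi]}.",
        f"Step 2: The shortest bar is {categories[lo]} at {values[lo]}.",
        f"Step 3: The difference is {values[hi]} - {values[lo]} = {diff}.",
    ]
    return {
        "plot_code": plot_code,
        "question": "What is the difference between the highest and lowest bars?",
        "reasoning": reasoning,
        "answer": str(diff),
    }


# Made-up table used only for illustration.
sample = build_sample("Accuracy by model", ["A", "B", "C"], [72, 85, 64])
print(json.dumps(sample, indent=2))
```

Running the string in `plot_code` with Matplotlib installed would render the chart image that pairs with the question, reasoning, and answer.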
Two-Stage Training: Chart-COT and Chart-RFT
Chart-R1 employs a two-stage training strategy to build and refine its reasoning abilities:
- Chart-COT (Chain-of-Thought Supervision): In the initial phase, the model undergoes supervised fine-tuning using the step-by-step reasoning data from ChartRQA-SFT. This stage is crucial for equipping the model with the fundamental ability to break down complex chart reasoning tasks into smaller, understandable subtasks. It acts as a “cold start” to lay a strong foundation for subsequent learning.
- Chart-RFT (Reinforcement Fine-Tuning): Following Chart-COT, the model enters a reinforcement fine-tuning stage. This phase uses Group Relative Policy Optimization (GRPO), a method that improves reasoning capacity without requiring a separate critic model. A key aspect of Chart-RFT is its numerically sensitive reward design. It uses distinct reward functions tailored to the answer type: soft matching with a relative error tolerance for numerical answers, and edit distance for string-based answers. Numerical answers are therefore accepted when they fall within the tolerance, while string answers are scored by how closely they match the reference. Importantly, distinct datasets are used for the SFT and RL stages to prevent overfitting and to encourage exploration.
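The answer-type-specific reward and GRPO's critic-free advantage can be sketched as follows. This is an illustration under stated assumptions, not the paper's implementation: the 5% tolerance (`rel_tol`), the binary score for numbers, the normalization of edit distance into a 0 to 1 similarity, and all function names are assumptions made here. The group-relative step follows the general GRPO recipe of standardizing each reward against the other responses sampled for the same prompt.

```python
import statistics


def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def to_number(text):
    """Parse a numeric answer, ignoring thousands separators and a trailing %."""
    try:
        return float(text.replace(",", "").rstrip("%"))
    except ValueError:
        return None


def accuracy_reward(pred, target, rel_tol=0.05):
    """Soft match for numbers, edit-distance similarity for strings.

    rel_tol=0.05 is an assumed value, not one taken from the paper.
    """
    p, t = to_number(pred.strip()), to_number(target.strip())
    if t is not None:  # numerical reference answer
        if p is None:
            return 0.0
        if t == 0:
            return 1.0 if p == 0 else 0.0
        return 1.0 if abs(p - t) / abs(t) <= rel_tol else 0.0
    # String reference answer: 1.0 for identical text, lower as edits grow.
    a, b = pred.strip().lower(), target.strip().lower()
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def group_relative_advantages(rewards, eps=1e-6):
    """Standardize rewards within one group of sampled responses."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


# Four hypothetical responses sampled for one question whose answer is "100".
group = ["104", "110", "100", "about ten"]
rewards = [accuracy_reward(g, "100") for g in group]
print(rewards)
print(group_relative_advantages(rewards))
```

Because each advantage is measured against the group mean, responses that beat their siblings are reinforced and the rest are penalized, which is what lets GRPO drop the learned critic.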
Impressive Performance
Experiments on several open-source benchmarks, including ChartQA, CharXiv-RQ, ChartQAPro, and the newly introduced ChartRQA, demonstrate Chart-R1’s advantages. The model sets a new state of the art among small-scale vision-language models (under 20 billion parameters) and achieves results comparable to large-scale proprietary models such as GPT-4o and Claude-3.5. This strong performance, particularly on complex reasoning benchmarks, highlights the effectiveness of Chart-R1’s data generation and training methodologies.
The code and dataset for Chart-R1 are planned to be made publicly available, fostering further research and development in the field of chart reasoning. For more details, you can refer to the full research paper: Chart-R1: Chain-of-Thought Supervision and Reinforcement for Advanced Chart Reasoner.


