ChartGen: A New Approach to Understanding and Generating Data Visualizations

TLDR: ChartGen is an automated pipeline that creates a massive dataset of chart images and their corresponding Python plotting code. It starts with existing chart images, uses a vision-language model to convert them into code, and then uses a large language model to iteratively augment and diversify this code. This process generated over 222,000 unique chart-code pairs, covering 27 chart types and 11 plotting libraries. The project also includes a benchmark for evaluating models on chart-to-code reconstruction, revealing that current models still have significant room for improvement in accurately reproducing charts from images.

Understanding and interpreting data visualizations, like charts, is crucial in many fields, from scientific research to business analysis. While artificial intelligence models have made strides in answering questions about charts or summarizing them, a more challenging task remains largely unexplored: chart-to-code reconstruction. This involves taking a chart image and accurately recreating the executable plotting script that generated it. This capability is vital for evaluating how well AI models can truly understand and ground visual data in a precise, machine-readable format.

To address this gap, researchers have introduced ChartGen, a fully automated pipeline designed for code-guided synthetic chart generation. ChartGen aims to significantly scale and diversify the available resources for chart understanding research.

How ChartGen Works: A Two-Stage Process

The ChartGen pipeline operates in two main stages:

1. VLM-based Chart Image Redrawing: It begins with a collection of existing chart images, referred to as ‘seed’ images. A vision-language model (VLM), specifically phi-3.5-vision-instruct, is prompted to analyze each seed image and reconstruct it into a Python plotting script. For their work, the researchers used 13,000 unique chart images from the ChartQA dataset as their initial seeds. The primary goal here isn’t perfect replication, but rather to get an initial structured code representation of the chart’s content.

2. LLM-based Chart Code Augmentation: The Python scripts generated in the first stage are then fed into a code-focused large language model (LLM), Codestral-22B-v0.1. This LLM iteratively refines and diversifies the plotting code. Instead of just altering the visual appearance of the chart, ChartGen transforms the underlying code itself. This allows for the creation of new plotting scripts and charts with varied types, styles, data distributions, and complexities. This iterative augmentation process dramatically expands the initial dataset.
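To make the two stages concrete, here is a minimal sketch of the control flow, not the researchers' implementation. The function name `generate_chart_scripts`, the `rounds` parameter, and the order-preserving deduplication step are assumptions made for illustration. The two models are passed in as plain callables, because the article does not describe how phi-3.5-vision-instruct or Codestral-22B-v0.1 are actually invoked.

```python
from typing import Callable, Iterable, List


def generate_chart_scripts(
    seed_images: Iterable[str],
    redraw: Callable[[str], str],
    augment: Callable[[str], str],
    rounds: int = 2,  # assumed value; the article does not state the number of rounds
) -> List[str]:
    """Stage 1 turns each seed image into code; stage 2 rewrites that code repeatedly."""
    scripts: List[str] = []
    for image_path in seed_images:
        script = redraw(image_path)  # stage 1: VLM, image -> plotting script
        scripts.append(script)
        for _ in range(rounds):  # stage 2: LLM transforms the code itself
            script = augment(script)
            scripts.append(script)
    # Keep only unique scripts, preserving first-seen order (an assumption).
    return list(dict.fromkeys(scripts))


# Hypothetical stand-ins for the two models, so the sketch runs without any model.
def fake_redraw(image_path: str) -> str:
    return f"# redrawn from {image_path}\nplt.bar(x, y)"


def fake_augment(script: str) -> str:
    return script + "\nplt.title('variant')"


scripts = generate_chart_scripts(
    ["seed_001.png", "seed_002.png"], fake_redraw, fake_augment
)
```

With two seeds and two augmentation rounds, the sketch yields six scripts: one redrawn script and two variants per seed. In a real run, each script would also be executed to render the chart image that pairs with it.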

The ChartGen-200K Dataset: A Comprehensive Resource

By applying this pipeline, the ChartGen project created a synthetic dataset called ChartGen-200K. It comprises 222,500 unique chart image-code pairs, roughly 17 times the initial 13,000 seed images. It covers 27 distinct chart types, ranging from common bar and line charts to more specialized visualizations like 3D plots, heatmaps, and sunburst diagrams. It also incorporates 11 different Python visualization libraries, including popular ones like matplotlib, seaborn, and plotly, which broadens stylistic and layout diversity.

Beyond just image and code pairs, ChartGen-200K is enriched with additional multimodal data components. Each entry includes extracted CSV tabular data, DocTags (a compact representation for semantic and structural attributes), natural language summaries, and automatically generated question-answer (QA) pairs. This makes it a comprehensive resource for various chart understanding tasks.
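One way to picture a single dataset entry is as a record with one field per component. The sketch below is an assumption about shape only: the class name `ChartRecord`, the field names, and the sample values are all made up for illustration and do not reflect the released dataset's actual schema. The DocTags field is left empty because the article does not show its syntax.

```python
import csv
import io
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class ChartRecord:
    """Hypothetical container for one chart entry; field names are illustrative."""
    image_path: str
    code: str
    csv_table: str
    doctags: str
    summary: str
    qa_pairs: List[Tuple[str, str]] = field(default_factory=list)


def table_rows(record: ChartRecord) -> List[Dict[str, str]]:
    """Parse the record's CSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(record.csv_table)))


# Made-up example values, not taken from ChartGen-200K.
record = ChartRecord(
    image_path="charts/000001.png",
    code="import matplotlib.pyplot as plt\nplt.bar(['A', 'B'], [3, 5])",
    csv_table="category,value\nA,3\nB,5\n",
    doctags="",  # syntax not described in the article, so left empty
    summary="A bar chart in which category B is larger than category A.",
    qa_pairs=[("Which category has the larger value?", "B")],
)
rows = table_rows(record)
```

Keeping the table as CSV text next to the code means a model's answer to a QA pair can be checked against the same values the plotting script draws.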

Compared to previous datasets for chart-to-code research, ChartGen-200K is significantly larger and more diverse, supporting a greater number of chart types and plotting back-ends. This scale and breadth are crucial for training robust multimodal AI models.

Evaluating Chart Redrawing Capabilities

To assess how well vision-language models can perform chart redrawing, the researchers curated a dedicated evaluation set of 4,300 chart image-code pairs from the larger ChartGen-200K corpus. The task involves a model taking a chart image and producing a Python plotting script that closely matches the original, faithfully reconstructing its visual content and style.

The evaluation employs a two-pronged strategy using GPT-4o as an automated judge. It compares both the predicted code and the resulting rendered images. For code comparison, scores are given for ‘data fidelity’ (how well the underlying data values match) and ‘semantic/style consistency’ (how well chart types, orientations, labels, and colors are preserved). For image comparison, the model-generated chart is visually compared to the ground-truth image for overall similarity.
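The scoring flow described above can be sketched as a loop that calls a code judge and an image judge for each sample and averages the results. This is an illustrative sketch, not the benchmark's code: `Sample`, `score_benchmark`, and the judge signatures are hypothetical, and the judges are plain callables standing in for the GPT-4o prompts, which the article does not reproduce. The score ranges in the stubs (0 to 1 for code, 0 to 10 for images) follow the figures the article reports; the range for style consistency is an assumption.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Dict, Iterable, Tuple


@dataclass
class Sample:
    true_code: str
    pred_code: str
    true_image: str  # path to the ground-truth chart image
    pred_image: str  # path to the chart rendered from the predicted code


def score_benchmark(
    samples: Iterable[Sample],
    judge_code: Callable[[str, str], Tuple[float, float]],
    judge_image: Callable[[str, str], float],
) -> Dict[str, float]:
    """Average the judge scores over all samples."""
    fidelity, style, similarity = [], [], []
    for s in samples:
        # Code judge returns (data fidelity, semantic/style consistency).
        f, st = judge_code(s.true_code, s.pred_code)
        fidelity.append(f)
        style.append(st)
        # Image judge returns an overall visual similarity score.
        similarity.append(judge_image(s.true_image, s.pred_image))
    return {
        "data_fidelity": mean(fidelity),
        "style_consistency": mean(style),
        "image_similarity": mean(similarity),
    }


# Stub judges with fixed scores, so the sketch runs without calling any model.
def stub_code_judge(true_code: str, pred_code: str) -> Tuple[float, float]:
    return (1.0, 1.0) if true_code == pred_code else (0.5, 0.5)


def stub_image_judge(true_image: str, pred_image: str) -> float:
    return 8.0


samples = [
    Sample("plt.bar(x, y)", "plt.bar(x, y)", "gt_1.png", "pred_1.png"),
    Sample("plt.plot(x, y)", "plt.scatter(x, y)", "gt_2.png", "pred_2.png"),
]
report = score_benchmark(samples, stub_code_judge, stub_image_judge)
```

Scoring code and images separately is useful because a script can plot the right data values and still look different when rendered, or the reverse.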

Current Performance and Future Outlook

The evaluation of six open-weight vision-language models (ranging from 3 billion to 26 billion parameters) on the ChartGen benchmark revealed that while models can produce syntactically valid code (indicated by moderate execution rates), accurately capturing numerical values, relationships, and stylistic elements remains a significant challenge. The best model achieved a data fidelity score of 0.58 out of 1 and an image similarity score of 7.48 out of 10, highlighting substantial room for improvement in chart-to-code reconstruction and vision-conditioned code generation.
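As a rough illustration of an execution rate, the sketch below runs each generated script in a separate Python process and reports the fraction that exit without an error. It assumes that "execution rate" means the share of predicted scripts that run cleanly; the article does not give the exact definition. The helper names `runs_cleanly` and `execution_rate` and the timeout value are hypothetical.

```python
import subprocess
import sys
from typing import Iterable


def runs_cleanly(script: str, timeout_s: float = 30.0) -> bool:
    """Return True if the script exits with status 0 within the timeout."""
    try:
        result = subprocess.run(
            [sys.executable, "-c", script],  # run in a fresh interpreter
            capture_output=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return False
    return result.returncode == 0


def execution_rate(scripts: Iterable[str]) -> float:
    """Fraction of scripts that run without raising an error."""
    outcomes = [runs_cleanly(s) for s in scripts]
    return sum(outcomes) / len(outcomes) if outcomes else 0.0


# One script that succeeds and one that raises ZeroDivisionError.
rate = execution_rate(["print('ok')", "1 / 0"])
```

A separate process keeps a crashing or hanging script from taking down the evaluator. Model-generated code is untrusted, so a real harness would also run it inside a sandbox.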

ChartGen represents a major step forward in creating large-scale, multimodal datasets for chart understanding. By releasing the pipeline, prompts, and the dataset under an open license, the researchers aim to accelerate progress towards more robust automated chart understanding. They acknowledge that the pipeline may inherit biases from its underlying AI models, and they point to future work on addressing these biases and on further expanding the dataset’s scale and its support for reasoning tasks. For more technical details, you can refer to the full research paper: ChartGen: Scaling Chart Understanding Via Code-Guided Synthetic Chart Generation.
