VisCoder2: Advancing Multi-Language Visualization Code Generation

TLDR: VisCoder2 is a framework for building visualization coding agents. It has three parts: VisCode-Multi-679K, a dataset of 679K executable visualization code samples across 12 languages that also includes multi-round correction dialogues; VisPlotBench, a benchmark spanning 8 languages that evaluates both initial generation and multi-round self-debug; and the VisCoder2 models, which significantly outperform open-source baselines and approach proprietary models like GPT-4.1. With iterative self-debugging, the 32B model reaches an 82.4% overall execution pass rate.

Large language models (LLMs) have shown great promise in generating code, including code for data visualizations. However, current systems often struggle with practical challenges such as supporting multiple programming languages, ensuring reliable code execution, and iteratively correcting errors. These limitations stem from datasets and benchmarks that typically focus on single-round code generation and a limited number of languages.

A new research paper introduces a comprehensive framework called VisCoder2, designed to overcome these hurdles. The framework consists of three key components: a vast dataset, a robust benchmark, and a family of advanced coding agents. You can read the full paper here: VisCoder2: Building Multi-Language Visualization Coding Agents.

VisCode-Multi-679K: A Dataset for Multi-Language Visualization

At the heart of VisCoder2 is VisCode-Multi-679K, a large-scale dataset containing 679,000 validated and executable visualization code samples. What makes this dataset unique is its multi-language coverage, spanning 12 programming languages, and its inclusion of multi-turn correction dialogues. This means the dataset not only provides examples of correct code but also shows how models can learn to revise faulty code based on execution feedback.

The dataset was built by combining code from diverse open-source sources such as the-stack-v2, svg-diagrams, and CoSyn-400K, which together provide a mix of real-world and synthetically generated visualization code. A pipeline of filtering, code block extraction, and runtime validation ensures that all samples are executable and produce valid visual outputs. Additionally, 66,000 multi-turn dialogues from the Code-Feedback dataset were integrated to train models in iterative debugging, a crucial skill for real-world coding agents.
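To make the runtime validation step concrete, here is a minimal sketch of how one candidate sample could be checked. It is not the authors' pipeline: the function name `validate_snippet`, the expected output file `out.svg`, and the timeout are assumptions for illustration, and only Python snippets are handled. A sample is kept only if it exits cleanly and leaves a non-empty output file behind.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def validate_snippet(code: str, output_name: str = "out.svg",
                     timeout: float = 30.0) -> bool:
    """Return True if `code` runs without error and writes a non-empty file.

    Hypothetical helper: the output file name and timeout are assumptions,
    not values taken from the paper.
    """
    with tempfile.TemporaryDirectory() as workdir:
        script = Path(workdir) / "snippet.py"
        script.write_text(code, encoding="utf-8")
        try:
            # Run in a separate interpreter so a crash cannot affect the caller.
            result = subprocess.run(
                [sys.executable, str(script)],
                cwd=workdir,
                capture_output=True,
                timeout=timeout,
            )
        except subprocess.TimeoutExpired:
            return False
        output = Path(workdir) / output_name
        return (
            result.returncode == 0
            and output.is_file()
            and output.stat().st_size > 0
        )


# Toy samples: one writes an SVG, one crashes, one runs but renders nothing.
GOOD = "open('out.svg', 'w').write('<svg xmlns=\"http://www.w3.org/2000/svg\"/>')"
CRASHES = "raise RuntimeError('boom')"
NO_OUTPUT = "x = 1 + 1"
```

A real pipeline would also need per-language runners (for example a LaTeX or LilyPond compiler) and a check that the rendered image is not blank; both are left out here.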

VisPlotBench: A Benchmark for Comprehensive Evaluation

To systematically evaluate visualization coding agents, the researchers developed VisPlotBench. This benchmark covers eight programming languages and features 888 diverse visualization tasks. Unlike previous benchmarks that often focus on a single language or a narrow range of chart types, VisPlotBench includes imperative libraries, declarative grammars, markup-based formats, and symbolic notations across 13 visual categories, from common bars and lines to more specialized music notation and network diagrams.

VisPlotBench uses a standardized protocol: execute, render, and score. It assesses not only the initial code generation but also the model’s ability to self-debug through multiple rounds of feedback. This multi-round evaluation is vital for understanding how agents perform in iterative development workflows.
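The multi-round evaluation can be pictured as a loop in which failed code is sent back to the model together with its error message. The sketch below illustrates that idea under stated assumptions: `generate` stands in for a model call and is hypothetical, `max_rounds` is an arbitrary choice, and execution happens in-process with `exec` purely to keep the example short (a real harness would sandbox execution and also render and score the output).

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Attempt:
    round: int            # 0 = initial generation, 1+ = self-debug rounds
    code: str
    error: Optional[str]  # None means the code executed successfully


def execute(code: str) -> Optional[str]:
    """Run candidate code; return None on success or a short error message."""
    try:
        exec(compile(code, "<candidate>", "exec"), {"__name__": "__candidate__"})
        return None
    except Exception as exc:  # includes SyntaxError raised by compile()
        return f"{type(exc).__name__}: {exc}"


def self_debug(task: str,
               generate: Callable[[str, Optional[str], Optional[str]], str],
               max_rounds: int = 3) -> List[Attempt]:
    """Generate code, then retry with execution feedback until it runs."""
    history: List[Attempt] = []
    code = generate(task, None, None)  # initial generation, no feedback yet
    for round_idx in range(max_rounds + 1):
        error = execute(code)
        history.append(Attempt(round_idx, code, error))
        if error is None:
            break
        if round_idx < max_rounds:
            # Feed the failing code and its error back to the model.
            code = generate(task, code, error)
    return history


def stub_model(task: str, prev_code: Optional[str], error: Optional[str]) -> str:
    """Hypothetical stand-in for an LLM: fails once, then returns a fix."""
    if error is None:
        return "values = [1, 2, 3]\ntotal = sum(valuez)"  # typo -> NameError
    return "values = [1, 2, 3]\ntotal = sum(values)"


history = self_debug("sum a list", stub_model)
```

With the stub above, `history` holds two attempts: the initial generation fails with a `NameError`, and the first self-debug round succeeds.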

VisCoder2: A Family of Visualization Coding Agents

The researchers trained a family of multi-language visualization models, also named VisCoder2, using the VisCode-Multi-679K dataset. These models, built on Qwen2.5-Coder-Instruct backbones at various scales (up to 32B parameters), demonstrate significant improvements over existing open-source baselines. Notably, VisCoder2 approaches the performance of proprietary models like GPT-4.1.

Experiments show that VisCoder2 achieves an impressive 82.4% overall execution pass rate at the 32B scale when iterative self-debug is enabled. This iterative correction mechanism proved particularly beneficial for symbolic and compiler-dependent languages such as LilyPond, LaTeX, and Asymptote, where syntax and compilation errors are common. The ability to self-debug allows the models to resolve frequent failures and produce valid outputs, highlighting that feedback-driven refinement is a critical component for reliable multi-language visualization.
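For readers unfamiliar with the metric, an execution pass rate is simply the share of tasks whose generated code runs successfully. The sketch below computes it per language and overall from a list of outcomes. The data is invented for illustration and is not from the paper, and pooling all tasks for the overall figure is an assumption about how the aggregate is formed.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def pass_rates(results: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Compute execution pass rate per language plus a pooled 'overall' rate.

    `results` is a sequence of (language, passed) pairs, one per task.
    """
    counts = defaultdict(lambda: [0, 0])  # language -> [passed, total]
    for language, passed in results:
        counts[language][1] += 1
        if passed:
            counts[language][0] += 1

    rates = {lang: p / t for lang, (p, t) in counts.items()}
    total = sum(t for _, t in counts.values())
    passed_total = sum(p for p, _ in counts.values())
    # Assumption: overall rate pools all tasks rather than averaging languages.
    rates["overall"] = passed_total / total if total else 0.0
    return rates


# Invented outcomes, for illustration only (not results from the paper).
toy_results = [
    ("python", True),
    ("python", False),
    ("lilypond", True),
    ("latex", False),
]
rates = pass_rates(toy_results)
```

Comparing these rates before and after the self-debug rounds, language by language, is one way to see where feedback-driven refinement helps most.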

Key Insights and Future Directions

The research highlights two main insights: the necessity of broad multi-language coverage, especially for challenging symbolic languages, and the indispensable role of iterative refinement. Self-debug consistently delivers substantial gains across models, particularly for languages prone to structural and semantic errors.

While VisCoder2 represents a significant leap forward, the researchers acknowledge that the dataset still has some imbalances, with common ecosystems like Python and Vega-Lite being better represented than symbolic or domain-specific languages. Expanding benchmark coverage to an even broader set of visualization frameworks is also a future goal. This work lays a strong foundation for building more robust and reliable visualization coding agents that can assist in real-world data analysis and reporting workflows.
