TLDR: POLYCHART QA is the first large-scale multilingual benchmark for evaluating how well AI models understand charts across different languages. It features over 22,000 charts and 26,000 question-answer pairs in 10 languages. The benchmark was created using a unique pipeline that separates chart data from rendering code, enabling flexible multilingual chart generation with rigorous quality control. Experiments revealed a significant performance disparity between English and other languages, especially low-resource ones, highlighting the need for more robust multilingual vision-language models.
Charts are a fundamental way we interpret and share data across various fields, from science to daily life. With the rise of large vision-language models (LVLMs), there has been significant progress in how machines understand and reason about these visual data representations. These advanced models can answer complex questions, summarize content, and even recreate chart images based on their data.
However, a major challenge exists: most current chart understanding benchmarks and datasets are primarily focused on English. This creates a significant barrier for global audiences and limits the applicability of these models for speakers of other languages. Leading LVLMs, for instance, might perform well on an English chart question but fail when presented with the same question in Chinese, as highlighted by the researchers.
Existing multilingual and multimodal benchmarks often focus on natural images rather than structured information like charts. While some datasets include charts, they typically involve simpler tasks like character recognition, lacking the depth required for comprehensive chart reasoning across diverse languages.
To address this critical gap, researchers Yichen Xu, Liangyu Chen, Liang Zhang, Wenxuan Wang, and Qin Jin from Renmin University of China have introduced a groundbreaking new benchmark called POLYCHART QA. This is the first large-scale multilingual chart question answering benchmark, featuring 22,606 charts and 26,151 question-answer pairs across 10 different languages.
The creation of POLYCHART QA relied on a clever, decoupled pipeline that separates a chart's underlying data from the code used to render it. Because only the data needs to be translated while the rendering code is reused as-is, the same chart can be generated flexibly in any target language. The team used state-of-the-art large language models for translation and implemented rigorous quality control to ensure that the generated multilingual charts maintain linguistic and semantic consistency.
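The decoupling idea can be sketched in a few lines. This is an illustrative toy, not the authors' actual pipeline: the `render_bar_chart` and `translate_data` functions are hypothetical names, the text rendering stands in for real plotting code, and the dictionary lookup stands in for the LLM translation step the paper describes.

```python
# Toy sketch of a decoupled chart pipeline (illustrative assumption, not
# the paper's implementation): chart data and rendering logic are kept
# separate, so generating a multilingual variant only means swapping the
# data table and rerunning the unchanged rendering code.

def render_bar_chart(data):
    """Render a chart spec as a simple text bar chart.

    A real pipeline would emit matplotlib/plotly output; plain text keeps
    this sketch dependency-free.
    """
    lines = [data["title"]]
    for label, value in zip(data["labels"], data["values"]):
        lines.append(f"{label}: {'#' * value}")
    return "\n".join(lines)

def translate_data(data, translations):
    """Swap text fields via a translation table (stand-in for an LLM)."""
    return {
        "title": translations.get(data["title"], data["title"]),
        "labels": [translations.get(l, l) for l in data["labels"]],
        "values": data["values"],  # numeric values are reused untouched
    }

# English source spec
chart_en = {"title": "Sales", "labels": ["Apples", "Pears"], "values": [3, 5]}

# Hypothetical translation table; the paper uses LLMs for this step
zh = {"Sales": "销售额", "Apples": "苹果", "Pears": "梨"}

print(render_bar_chart(chart_en))                    # English chart
print(render_bar_chart(translate_data(chart_en, zh)))  # Chinese variant
```

The key property this sketch captures is that `render_bar_chart` never changes: adding a new language costs only a translated data table, which is what makes scaling to ten languages tractable.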
The benchmark covers a wide range of languages, including English, Chinese, Hindi, Spanish, French, Arabic, Bengali, Russian, Urdu, and Japanese. Collectively, these languages are spoken by over 65% of the global population. POLYCHART QA includes both real-world and synthetically generated charts, providing a diverse and carefully curated resource for evaluating and advancing multilingual chart understanding.
Experiments conducted using POLYCHART QA on various LVLMs, including both open-source and closed-source models, revealed important insights. A significant performance gap was observed between English and other languages, particularly those with fewer resources and non-Latin scripts. For example, models that performed well in English often saw their accuracy drop significantly for languages like Bengali and Urdu. This highlights persistent challenges in cross-lingual alignment and visual reasoning that existing multimodal benchmarks haven’t fully captured.
The research also explored few-shot evaluation, where models are given a small number of examples to learn from. Interestingly, few-shot prompting did not consistently improve multilingual performance, suggesting that simply providing more examples might not be enough to bridge the multilingual transfer gap in current LVLMs. Furthermore, cross-lingual inference tests showed that maintaining language consistency on the question side is more crucial than on the visual side for better performance.
In conclusion, POLYCHART QA lays a crucial foundation for developing more globally inclusive vision-language models. While the benchmark currently covers ten major languages and focuses on question answering, its flexible data pipeline allows future expansion to more languages and to other chart understanding tasks such as summarization and fact-checking. This work aims to promote language inclusivity and accessibility in AI technologies, helping to reduce English dominance in AI systems and supporting global communities in accessing AI tools in their native languages.