Explaining AI's Chart Analysis: A Framework for Visual Reasoning Attribution

TL;DR: The RADAR framework makes Multimodal Large Language Models (MLLMs) more transparent when they analyze data visualizations such as charts. It attributes the MLLM’s reasoning to specific regions within a chart, using bounding boxes to highlight the visual data that supports both final answers and intermediate mathematical reasoning steps. RADAR includes a new dataset and a method that, according to the authors, significantly improves attribution accuracy and leads to stronger answer generation, a step toward more trustworthy and interpretable AI systems for visual data analysis.

As data visualizations like charts become central to quantitative analysis and decision-making, the ability to accurately interpret them is more crucial than ever. Multimodal Large Language Models (MLLMs) have shown great promise in automating visual data analysis, from answering questions about charts to generating summaries. However, a significant challenge remains: these models often operate as “black boxes,” providing conclusions without revealing which parts of the visual data informed their decisions. This lack of transparency can hinder trust and adoption in real-world applications, especially in sensitive fields like business, medicine, and education.

A new research paper introduces RADAR, a Reasoning-Guided Attribution Framework for Explainable Visual Data Analysis, which takes a significant step towards addressing this issue. The framework aims to evaluate and enhance MLLMs’ capabilities to attribute their reasoning process by highlighting specific regions in charts and graphs that justify their answers. This makes the reasoning process transparent and verifiable for users.

The core idea behind RADAR is to identify and highlight key regions within charts using bounding boxes. This approach not only explains the final decision but also provides visibility into the intermediate mathematical reasoning steps. Previous research on attribution has largely focused on text-based or general visual question-answering, which often falls short when applied to complex mathematical chart analysis. Existing methods struggle to pinpoint relevant chart regions for intricate mathematical questions, such as comparing differences between lines across different years.

How RADAR Works

RADAR operates through a two-stage pipeline. First, given a chart, a question, and an answer, the system generates step-by-step reasoning using the InternLM-XComposer2 model. This model processes visual tokens from the chart and textual inputs, adapting to chart-specific features while maintaining strong language capabilities. Second, these generated reasoning steps, along with the original chart, question, and answer, are used to produce attribution bounding boxes. These boxes highlight the specific visual elements that correspond to both the final answer and each intermediate reasoning step.
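The two stages described above can be sketched as a small piece of glue code. This is only an illustration of the data flow, not the authors' implementation: the function name `attribute_chart_answer`, the stub models, the dictionary layout, and the pixel coordinates are all hypothetical, and a real system would call an MLLM where the stubs return fixed values.

```python
from typing import Callable, Dict, List, Tuple

# A bounding box as (x_min, y_min, x_max, y_max) in pixels (assumed convention).
Box = Tuple[int, int, int, int]


def attribute_chart_answer(
    chart: str,
    question: str,
    answer: str,
    reasoner: Callable[[str, str, str], List[str]],
    attributor: Callable[[str, str, str, List[str]], Dict[str, list]],
) -> Dict[str, list]:
    """Run the two stages in order and check that every step received boxes."""
    # Stage 1: chart + question + answer -> step-by-step reasoning.
    steps = reasoner(chart, question, answer)

    # Stage 2: chart + question + answer + reasoning -> bounding boxes.
    boxes = attributor(chart, question, answer, steps)

    if len(boxes["steps"]) != len(steps):
        raise ValueError("expected one list of boxes per reasoning step")

    return {
        "reasoning_steps": steps,
        "answer_boxes": boxes["answer"],
        "step_boxes": boxes["steps"],
    }


# Hypothetical stand-ins for the two model calls.
def stub_reasoner(chart: str, question: str, answer: str) -> List[str]:
    return [
        "Read the 2019 value from the line.",
        "Read the 2020 value from the line.",
        "Subtract the 2019 value from the 2020 value.",
    ]


def stub_attributor(
    chart: str, question: str, answer: str, steps: List[str]
) -> Dict[str, list]:
    point_2019: Box = (40, 80, 52, 92)   # made-up coordinates
    point_2020: Box = (90, 50, 102, 62)  # made-up coordinates
    return {
        "answer": [point_2019, point_2020],
        "steps": [[point_2019], [point_2020], [point_2019, point_2020]],
    }


result = attribute_chart_answer(
    "chart.png",
    "How much did the value rise from 2019 to 2020?",
    "3",
    stub_reasoner,
    stub_attributor,
)
for step, step_boxes in zip(result["reasoning_steps"], result["step_boxes"]):
    print(step, step_boxes)
```

Passing the two models in as callables keeps the ordering constraint explicit: the attribution stage cannot run until the reasoning stage has produced its steps.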

The framework offers two distinct levels of attribution. One is Answer-Level Attribution, which involves visually linking chart elements to the final answer using bounding boxes. For example, if the answer is a calculated product, this level would highlight all the data bars contributing to that calculation. The other is Reasoning-Level Attribution, which is more granular. For mathematical questions, the path to the answer often involves multiple steps. RADAR attributes each reasoning step to relevant chart regions, creating a traceable connection between the reasoning process and the visual elements. This means that for each calculation or comparison step, the specific data points or lines used are highlighted.
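To make the difference between the two levels concrete, here is a minimal sketch of how one annotated example might be represented. The class names, field names, question, values, and coordinates are invented for illustration; the article does not describe RADAR's actual annotation schema.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# A bounding box as (x_min, y_min, x_max, y_max) in pixels (assumed convention).
Box = Tuple[int, int, int, int]


@dataclass
class ReasoningStep:
    text: str          # one intermediate calculation or comparison
    boxes: List[Box]   # reasoning-level attribution for this step only


@dataclass
class ChartAttribution:
    question: str
    answer: str
    answer_boxes: List[Box]                      # answer-level attribution
    steps: List[ReasoningStep] = field(default_factory=list)

    def all_step_boxes(self) -> List[Box]:
        """Distinct boxes used by any step, in first-seen order."""
        seen: List[Box] = []
        for step in self.steps:
            for box in step.boxes:
                if box not in seen:
                    seen.append(box)
        return seen


# Hypothetical bar-chart example: the answer is a product of two bars.
bar_2019: Box = (30, 120, 60, 200)
bar_2020: Box = (80, 100, 110, 200)

example = ChartAttribution(
    question="What is the product of the 2019 and 2020 values?",
    answer="20",
    answer_boxes=[bar_2019, bar_2020],
    steps=[
        ReasoningStep("The 2019 bar reads 4.", [bar_2019]),
        ReasoningStep("The 2020 bar reads 5.", [bar_2020]),
        ReasoningStep("4 x 5 = 20.", [bar_2019, bar_2020]),
    ],
)

# Answer level: every bar that contributes to the final product.
print(example.answer_boxes)
# Reasoning level: only the region each individual step relies on.
for step in example.steps:
    print(step.text, step.boxes)
```

In this toy case the answer-level boxes equal the union of the step-level boxes, but the step-level view additionally shows which bar was read at which point in the calculation.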

A New Dataset for Explainable Chart Analysis

To enable this research, the authors contributed a semi-automatic approach to create a benchmark dataset. This dataset comprises 17,819 diverse samples, including various charts, questions, detailed reasoning steps, and attribution annotations. Derived from the ChartQA dataset, it covers line and bar chart types and a range of mathematical operations. The data curation strategy combined MLLM-generated reasoning and attribution annotations with human corrections, ensuring high quality.

The dataset includes 1,000 charts (500 line, 500 bar), leading to 2,000 question-answer pairs. Human annotators identified 3,599 reasoning steps and attributed 4,092 regions for answer-based questions and 7,128 regions for reasoning-based steps, demonstrating the complexity and detail captured.
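The article does not say how the 17,819-sample total is composed. One observation, offered as a reading rather than a confirmed breakdown, is that the individual counts quoted above add up to exactly that figure. The dictionary keys below are descriptive labels chosen for this check, not names from the dataset.

```python
# Counts quoted in the article; the key names are illustrative labels.
counts = {
    "charts": 500 + 500,            # 500 line charts + 500 bar charts
    "question_answer_pairs": 2000,
    "reasoning_steps": 3599,
    "answer_level_regions": 4092,
    "reasoning_level_regions": 7128,
}

total = sum(counts.values())
print(total)  # 17819
```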

Performance and Impact

Experimental results show that RADAR significantly improves attribution accuracy. Compared to baseline methods like GPT-4o, GPT-4V, and Claude 3.5 Sonnet, RADAR’s reasoning-guided approach improves attribution accuracy by an average of 15%. Specifically, it showed substantial relative improvements in Multi Box IoU scores for both answer-based (VQA) and reasoning-based (VQR) attribution tasks. For instance, automated reasoning improved scores on VQA tasks by 446% to 504% and on VQR tasks by 110% to 230% over baselines. When human-validated reasoning was incorporated, VQR task improvements rose to 268% for line charts and 405% for bar charts.
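For readers unfamiliar with the metric, the sketch below shows one plausible way to compute an intersection-over-union score for sets of boxes, together with the arithmetic behind a relative-improvement percentage. It assumes integer pixel boxes and scores the area covered by all predicted boxes against the area covered by all ground-truth boxes; whether the paper defines its multi-box score exactly this way is not stated in this article. The scores 0.10 and 0.60 are made-up values, not results from the paper.

```python
from typing import List, Set, Tuple

# A bounding box as (x_min, y_min, x_max, y_max) in integer pixels,
# with x_max and y_max exclusive (assumed convention).
Box = Tuple[int, int, int, int]


def covered_pixels(boxes: List[Box]) -> Set[Tuple[int, int]]:
    """Set of pixels covered by at least one box (overlaps counted once)."""
    pixels: Set[Tuple[int, int]] = set()
    for x0, y0, x1, y1 in boxes:
        for x in range(x0, x1):
            for y in range(y0, y1):
                pixels.add((x, y))
    return pixels


def multi_box_iou(predicted: List[Box], ground_truth: List[Box]) -> float:
    """Area of overlap between the two box sets divided by their joint area."""
    pred = covered_pixels(predicted)
    truth = covered_pixels(ground_truth)
    union = pred | truth
    if not union:
        return 0.0
    return len(pred & truth) / len(union)


def relative_improvement_pct(baseline: float, improved: float) -> float:
    """Percentage gain of `improved` over `baseline`."""
    return (improved - baseline) / baseline * 100.0


# Two 10x10 boxes offset by 5 pixels: 50 shared pixels / 150 covered pixels.
print(multi_box_iou([(0, 0, 10, 10)], [(5, 0, 15, 10)]))

# Illustrative scores only: going from 0.10 to 0.60 is roughly a 500% gain,
# meaning the new score is about six times the baseline.
print(relative_improvement_pct(0.10, 0.60))
```

The pixel-set approach is slow for large images but keeps the example dependency-free; an array-based mask would be the usual choice in practice. Note that a relative gain of several hundred percent is consistent with a low absolute baseline score.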

Furthermore, the authors report that these enhanced attribution capabilities carry over to answer generation. The system achieved an average BERTScore of approximately 0.90, indicating high semantic similarity between generated answers and ground-truth responses. This suggests a complementary relationship in which better attribution goes hand in hand with more accurate answers.

The framework also proved its ability to generalize and scale. When extended to pie charts, the fully automated approach achieved an average BERTScore of around 0.9 and an average Semantic Textual Similarity (STS) of approximately 0.5 for generated answers, confirming its robustness across different visualization formats.

Looking Ahead

While RADAR represents a significant advancement, the researchers acknowledge limitations, including the challenges of human attribution, the dependency on reasoning quality, and computational requirements. Nevertheless, this work lays a strong foundation for building more trustworthy and interpretable AI systems for mathematical reasoning tasks, enabling users to verify and understand model decisions through transparent reasoning and attribution.
