Mapping AI’s Reasoning: A New Approach to Understanding Model Biases with Knowledge Graphs

TLDR: BAGEL is a novel framework and interactive tool that uses structured knowledge graphs to provide a global understanding of how deep neural networks process and represent high-level semantic concepts. Unlike methods that focus on individual neurons or predictions, BAGEL analyzes how concepts emerge, interact, and propagate across entire model layers, helping to identify dataset biases, model-specific biases, and spurious correlations that influence AI decision-making. It offers a visual, model-agnostic way to enhance trust and transparency in AI systems.

Deep neural networks (DNNs) have achieved remarkable success across various fields, from vision to language tasks. However, their complex, ‘black-box’ nature makes it difficult to understand how they arrive at their predictions. This lack of transparency is a significant concern, especially in critical applications like healthcare and autonomous driving, where understanding the model’s rationale is crucial for trust and usability.

Traditional Explainable Artificial Intelligence (XAI) methods often focus on providing local explanations for individual predictions. While useful, these methods can fall short in identifying broader patterns of bias and how different concepts are intertwined within the model. This is where a new framework called BAGEL comes into play, offering a global, concept-based approach to interpretability.

Introducing BAGEL: A Global View of AI Behavior

BAGEL, which stands for Bias Analysis with a Graph for global Explanation Layers, is a novel framework and interactive tool designed to extend concept-based interpretability into the realm of mechanistic interpretability. Instead of just looking at individual neurons or predictions, BAGEL provides a comprehensive dissection of a model’s behavior by analyzing how high-level semantic attributes, known as ‘concepts,’ emerge, interact, and propagate through the model’s internal components.

Concepts are defined as human-interpretable attributes such as color, texture, object parts, or scene context. For example, when classifying a ‘zebra,’ relevant concepts might include ‘striped texture’ or ‘savannah background.’ BAGEL systematically quantifies how these concepts are represented across different layers of a neural network, revealing the hidden circuits and information flow that underpin the model’s decision-making process.
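
To make this concrete, here is a minimal Python sketch of one common way to quantify a concept in a given layer: training a linear probe on pooled activations against binary concept labels. This illustrates the general technique under assumed data shapes; it is not BAGEL’s actual code, and the function name is ours.

```python
import torch
import torch.nn as nn

# Illustrative sketch (not BAGEL's implementation): estimate how strongly a
# concept such as "striped texture" is represented in one layer by fitting a
# linear probe on that layer's pooled activations.

def concept_probe_score(activations: torch.Tensor,
                        concept_labels: torch.Tensor,
                        epochs: int = 50) -> float:
    """activations: (N, D) pooled features from one layer;
    concept_labels: (N,) binary flags marking concept presence."""
    probe = nn.Linear(activations.shape[1], 1)
    opt = torch.optim.Adam(probe.parameters(), lr=1e-2)
    loss_fn = nn.BCEWithLogitsLoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(probe(activations).squeeze(1), concept_labels.float())
        loss.backward()
        opt.step()
    with torch.no_grad():
        preds = (probe(activations).squeeze(1) > 0).float()
        # Probe accuracy: a rough proxy for how linearly decodable the
        # concept is from this layer's representation.
        return (preds == concept_labels).float().mean().item()
```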

How BAGEL Uncovers Biases

A key innovation of BAGEL is its visualization platform, which presents these insights in a structured knowledge graph. This graph allows users to explore relationships between concepts and classes, identify spurious correlations, and ultimately enhance the trustworthiness of AI models. The framework is designed to be model-agnostic and scalable, contributing to a deeper understanding of how deep learning models generalize—or fail to—in the presence of dataset biases.

BAGEL’s methodology involves several key steps:

  • It analyzes the presence of concept-based biases within the dataset itself. For instance, if images of ‘huskies’ in a dataset frequently appear with ‘snow,’ the model might learn a spurious association between snow and huskies.
  • It models the alignment between these dataset biases and the biases learned by the network.
  • It measures the similarity between concept associations driven by the dataset and those derived from the model’s internal representations.
  • Finally, it provides an interactive visualization tool—the knowledge graph—to organize and compare these dataset and model biases at scale.

To quantify these relationships, BAGEL uses metrics like the weighted F1-score and Jensen-Shannon (JS) Divergence. These metrics help assess how closely the model’s understanding of concepts aligns with the actual concept distribution in the dataset.
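
As a rough illustration of that comparison step, the sketch below computes the JS divergence between a dataset-derived and a model-derived concept distribution for a single class. The concept names and probabilities are invented for the example; note that SciPy’s `jensenshannon` returns the JS *distance* (the square root of the divergence), so it must be squared.

```python
import numpy as np
from scipy.spatial.distance import jensenshannon

# Illustrative only: compare a class's concept distribution as measured in the
# dataset with the one recovered from the model's internal representations.
# Concepts and probabilities below are made up for the example.
concepts = ["snow", "fur", "forest", "eyes"]
dataset_dist = np.array([0.40, 0.30, 0.20, 0.10])  # P(concept | class) in data
model_dist   = np.array([0.55, 0.25, 0.15, 0.05])  # P(concept | class) in model

# SciPy returns the JS distance (sqrt of the divergence, base-2 here),
# so square it to recover the divergence.
js_div = jensenshannon(dataset_dist, model_dist, base=2) ** 2
print(f"JS divergence: {js_div:.4f}")  # 0 = identical, 1 = maximally different
```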

The Interactive Knowledge Graph

The knowledge graph is a central component of BAGEL. It’s a visual representation where nodes represent classes (e.g., ‘husky,’ ‘wolf’) and concepts (e.g., ‘fur,’ ‘eyes,’ ‘forest’). Directed edges connect classes to concepts, with weights indicating the probability of a concept being detected in a model’s layer given a specific class.

The edges in the graph are color-coded to highlight different types of bias:

  • Green edges indicate an overlap between dataset and model biases, meaning the model has learned a bias present in the data.
  • Blue edges show a dataset bias that the model has not fully learned.
  • Red edges signify a model-specific bias that was not present in the original dataset, potentially indicating overfitting or spurious correlations.
  • Light gray edges represent weak or irrelevant associations.

The width of an edge is proportional to the model-derived concept probability, providing an immediate visual cue about a concept’s importance. The interactive graph allows users to navigate through different layers of the model, adjust thresholds, and focus on specific classes or concept categories, gaining deeper insight into the model’s learning process.
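
To make the color-coding rules concrete, here is a hedged sketch of how such a graph could be assembled with networkx. The 0.3 threshold and all probability values are illustrative placeholders, not values from the paper.

```python
import networkx as nx

# Illustrative sketch of the bias-coloring logic described above.
# dataset_p / model_p would come from the dataset analysis and the layer-wise
# concept probabilities; the 0.3 threshold is an arbitrary placeholder.
THRESHOLD = 0.3

def edge_color(dataset_p: float, model_p: float) -> str:
    if dataset_p >= THRESHOLD and model_p >= THRESHOLD:
        return "green"      # bias present in the data and learned by the model
    if dataset_p >= THRESHOLD:
        return "blue"       # dataset bias the model has not fully learned
    if model_p >= THRESHOLD:
        return "red"        # model-specific bias absent from the data
    return "lightgray"      # weak or irrelevant association

G = nx.DiGraph()
edges = [  # (class, concept, P_dataset, P_model) -- made-up numbers
    ("husky", "snow",   0.6, 0.7),
    ("husky", "fur",    0.5, 0.1),
    ("wolf",  "forest", 0.1, 0.5),
    ("wolf",  "eyes",   0.1, 0.05),
]
for cls, concept, dp, mp in edges:
    G.add_edge(cls, concept,
               color=edge_color(dp, mp),
               width=5 * mp)  # edge width scales with the model probability
```

With per-edge color and width attributes in place, any standard graph-drawing routine can render a bias map of the kind described above.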

Experimental Validation and Impact

The researchers tested BAGEL on various deep neural networks, including AlexNet, ResNet, and DenseNet, and on diverse datasets like ‘Husky vs. Wolf,’ ‘Cats/Dogs,’ and medical datasets. Qualitative results demonstrated BAGEL’s ability to detect and visualize biases, such as a ResNet model overemphasizing ‘brown’ and ‘forest’ concepts when classifying red fox versus kit fox, or a DenseNet associating ‘forest,’ ‘mountain,’ and ‘wood’ with the ‘Wolf’ class due to environmental contexts in the training data.

The framework also showed how concept probabilities evolve across network layers, with early layers detecting rudimentary concepts like materials and textures, and deeper layers focusing on more complex concepts like object parts. Quantitatively, BAGEL demonstrated strong and consistent performance in detecting dataset biases compared to other methods like TCAV and Sparse Autoencoders.
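
A common way to obtain the layer-wise signals this kind of analysis requires is to register forward hooks on a pretrained network. The sketch below, again not BAGEL’s implementation, captures pooled activations from the four stages of a ResNet-18 so that per-layer concept probabilities could be estimated downstream.

```python
import torch
import torchvision.models as models

# Sketch (not BAGEL's code): capture intermediate activations from a
# pretrained ResNet so per-layer concept probabilities can be estimated later.
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT).eval()
activations = {}

def save_activation(name):
    def hook(module, inputs, output):
        # Global-average-pool the spatial maps to a (batch, channels) summary.
        activations[name] = output.mean(dim=(2, 3)).detach()
    return hook

for name in ["layer1", "layer2", "layer3", "layer4"]:
    getattr(model, name).register_forward_hook(save_activation(name))

with torch.no_grad():
    model(torch.randn(1, 3, 224, 224))  # dummy batch; real images in practice

for name, act in activations.items():
    print(name, act.shape)  # e.g., layer1 -> torch.Size([1, 64])
```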

In conclusion, BAGEL offers a powerful and scalable framework for understanding how deep neural networks encode, preserve, or distort concept-level biases from their training data. By providing a global perspective on model behavior and visualizing these relationships through a structured knowledge graph, BAGEL helps uncover spurious correlations and enhances the transparency and trustworthiness of AI systems. This work represents a significant step towards more accountable AI by focusing not just on what a model predicts, but how and why it does so. For more technical details, you can refer to the full research paper: Concept-Based Mechanistic Interpretability Using Structured Knowledge Graphs.

