Mapping AI’s Reasoning: A New Approach to Understanding Model Biases with Knowledge Graphs

TLDR: BAGEL is a novel framework and interactive tool that uses structured knowledge graphs to provide a global understanding of how deep neural networks process and represent high-level semantic concepts. Unlike methods that focus on individual neurons or predictions, BAGEL analyzes how concepts emerge, interact, and propagate across entire model layers, helping to identify dataset biases, model-specific biases, and spurious correlations that influence AI decision-making. It offers a visual, model-agnostic way to enhance trust and transparency in AI systems.

Deep neural networks (DNNs) have achieved remarkable success across various fields, from vision to language tasks. However, their complex, ‘black-box’ nature makes it difficult to understand how they arrive at their predictions. This lack of transparency is a significant concern, especially in critical applications like healthcare and autonomous driving, where understanding the model’s rationale is crucial for trust and usability.

Traditional Explainable Artificial Intelligence (XAI) methods often focus on providing local explanations for individual predictions. While useful, these methods can fall short in identifying broader patterns of bias and how different concepts are intertwined within the model. This is where a new framework called BAGEL comes into play, offering a global, concept-based approach to interpretability.

Introducing BAGEL: A Global View of AI Behavior

BAGEL, which stands for Bias Analysis with a Graph for global Explanation Layers, is a novel framework and interactive tool designed to extend concept-based interpretability into the realm of mechanistic interpretability. Instead of just looking at individual neurons or predictions, BAGEL provides a comprehensive dissection of a model’s behavior by analyzing how high-level semantic attributes, known as ‘concepts,’ emerge, interact, and propagate through the model’s internal components.

Concepts are defined as human-interpretable attributes such as color, texture, object parts, or scene context. For example, when classifying a ‘zebra,’ relevant concepts might include ‘striped texture’ or ‘savannah background.’ BAGEL systematically quantifies how these concepts are represented across different layers of a neural network, revealing the hidden circuits and information flow that underpin the model’s decision-making process.
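
To make this concrete, here is a minimal Python sketch of one common way to quantify a concept in a given layer: training a linear probe on pooled activations against binary concept labels. This illustrates the general technique under assumed data shapes; it is not BAGEL’s actual code, and the function name is ours.

```python
import torch
import torch.nn as nn

# Illustrative sketch (not BAGEL's implementation): estimate how strongly a
# concept such as "striped texture" is represented in one layer by fitting a
# linear probe on that layer's pooled activations.

def concept_probe_score(activations: torch.Tensor,
                        concept_labels: torch.Tensor,
                        epochs: int = 50) -> float:
    """activations: (N, D) pooled features from one layer;
    concept_labels: (N,) binary flags marking concept presence."""
    probe = nn.Linear(activations.shape[1], 1)
    opt = torch.optim.Adam(probe.parameters(), lr=1e-2)
    loss_fn = nn.BCEWithLogitsLoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(probe(activations).squeeze(1), concept_labels.float())
        loss.backward()
        opt.step()
    with torch.no_grad():
        preds = (probe(activations).squeeze(1) > 0).float()
        # Probe accuracy: a rough proxy for how linearly decodable the
        # concept is from this layer's representation.
        return (preds == concept_labels).float().mean().item()
```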

How BAGEL Uncovers Biases

A key innovation of BAGEL is its visualization platform, which presents these insights in a structured knowledge graph. This graph allows users to explore relationships between concepts and classes, identify spurious correlations, and ultimately enhance the trustworthiness of AI models. The framework is designed to be model-agnostic and scalable, contributing to a deeper understanding of how deep learning models generalize—or fail to—in the presence of dataset biases.

BAGEL’s methodology involves several key steps:

  • It analyzes the presence of concept-based biases within the dataset itself. For instance, if images of ‘huskies’ in a dataset frequently appear with ‘snow,’ the model might learn a spurious association between snow and huskies.
  • It models the alignment between these dataset biases and the biases learned by the network.
  • It measures the similarity between concept associations driven by the dataset and those derived from the model’s internal representations.
  • Finally, it provides an interactive visualization tool—the knowledge graph—to organize and compare these dataset and model biases at scale.

To quantify these relationships, BAGEL uses metrics like the weighted F1-score and Jensen-Shannon (JS) Divergence. These metrics help assess how closely the model’s understanding of concepts aligns with the actual concept distribution in the dataset.
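
As a rough illustration of that comparison step, the sketch below computes the JS divergence between a dataset-derived and a model-derived concept distribution for a single class. The concept names and probabilities are invented for the example; note that SciPy’s `jensenshannon` returns the JS *distance* (the square root of the divergence), so it must be squared.

```python
import numpy as np
from scipy.spatial.distance import jensenshannon

# Illustrative only: compare a class's concept distribution as measured in the
# dataset with the one recovered from the model's internal representations.
# Concepts and probabilities below are made up for the example.
concepts = ["snow", "fur", "forest", "eyes"]
dataset_dist = np.array([0.40, 0.30, 0.20, 0.10])  # P(concept | class) in data
model_dist   = np.array([0.55, 0.25, 0.15, 0.05])  # P(concept | class) in model

# SciPy returns the JS distance (sqrt of the divergence, base-2 here),
# so square it to recover the divergence.
js_div = jensenshannon(dataset_dist, model_dist, base=2) ** 2
print(f"JS divergence: {js_div:.4f}")  # 0 = identical, 1 = maximally different
```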

The Interactive Knowledge Graph

The knowledge graph is a central component of BAGEL. It’s a visual representation where nodes represent classes (e.g., ‘husky,’ ‘wolf’) and concepts (e.g., ‘fur,’ ‘eyes,’ ‘forest’). Directed edges connect classes to concepts, with weights indicating the probability of a concept being detected in a model’s layer given a specific class.

The edges in the graph are color-coded to highlight different types of bias:

  • Green edges indicate an overlap between dataset and model biases, meaning the model has learned a bias present in the data.
  • Blue edges show a dataset bias that the model has not fully learned.
  • Red edges signify a model-specific bias that was not present in the original dataset, potentially indicating overfitting or spurious correlations.
  • Light gray edges represent weak or irrelevant associations.

The width of an edge is proportional to the model-derived concept probability, providing an immediate visual cue about a concept’s importance. The interactive graph allows users to navigate through different layers of the model, adjust thresholds, and focus on specific classes or concept categories, gaining deeper insight into the model’s learning process.
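
To make the color-coding rules concrete, here is a hedged sketch of how such a graph could be assembled with networkx. The 0.3 threshold and all probability values are illustrative placeholders, not values from the paper.

```python
import networkx as nx

# Illustrative sketch of the bias-coloring logic described above.
# dataset_p / model_p would come from the dataset analysis and the layer-wise
# concept probabilities; the 0.3 threshold is an arbitrary placeholder.
THRESHOLD = 0.3

def edge_color(dataset_p: float, model_p: float) -> str:
    if dataset_p >= THRESHOLD and model_p >= THRESHOLD:
        return "green"      # bias present in the data and learned by the model
    if dataset_p >= THRESHOLD:
        return "blue"       # dataset bias the model has not fully learned
    if model_p >= THRESHOLD:
        return "red"        # model-specific bias absent from the data
    return "lightgray"      # weak or irrelevant association

G = nx.DiGraph()
edges = [  # (class, concept, P_dataset, P_model) -- made-up numbers
    ("husky", "snow",   0.6, 0.7),
    ("husky", "fur",    0.5, 0.1),
    ("wolf",  "forest", 0.1, 0.5),
    ("wolf",  "eyes",   0.1, 0.05),
]
for cls, concept, dp, mp in edges:
    G.add_edge(cls, concept,
               color=edge_color(dp, mp),
               width=5 * mp)  # edge width scales with the model probability
```

With per-edge color and width attributes in place, any standard graph-drawing routine can render a bias map of the kind described above.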

Experimental Validation and Impact

The researchers tested BAGEL on various deep neural networks, including AlexNet, ResNet, and DenseNet, and on diverse datasets like ‘Husky vs. Wolf,’ ‘Cats/Dogs,’ and medical datasets. Qualitative results demonstrated BAGEL’s ability to detect and visualize biases, such as a ResNet model overemphasizing ‘brown’ and ‘forest’ concepts when classifying red fox versus kit fox, or a DenseNet associating ‘forest,’ ‘mountain,’ and ‘wood’ with the ‘Wolf’ class due to environmental contexts in the training data.

The framework also showed how concept probabilities evolve across network layers, with early layers detecting rudimentary concepts like materials and textures, and deeper layers focusing on more complex concepts like object parts. Quantitatively, BAGEL demonstrated strong and consistent performance in detecting dataset biases compared to other methods like TCAV and Sparse Autoencoders.
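
A common way to obtain the layer-wise signals this kind of analysis requires is to register forward hooks on a pretrained network. The sketch below, again not BAGEL’s implementation, captures pooled activations from the four stages of a ResNet-18 so that per-layer concept probabilities could be estimated downstream.

```python
import torch
import torchvision.models as models

# Sketch (not BAGEL's code): capture intermediate activations from a
# pretrained ResNet so per-layer concept probabilities can be estimated later.
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT).eval()
activations = {}

def save_activation(name):
    def hook(module, inputs, output):
        # Global-average-pool the spatial maps to a (batch, channels) summary.
        activations[name] = output.mean(dim=(2, 3)).detach()
    return hook

for name in ["layer1", "layer2", "layer3", "layer4"]:
    getattr(model, name).register_forward_hook(save_activation(name))

with torch.no_grad():
    model(torch.randn(1, 3, 224, 224))  # dummy batch; real images in practice

for name, act in activations.items():
    print(name, act.shape)  # e.g., layer1 -> torch.Size([1, 64])
```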

In conclusion, BAGEL offers a powerful and scalable framework for understanding how deep neural networks encode, preserve, or distort concept-level biases from their training data. By providing a global perspective on model behavior and visualizing these relationships through a structured knowledge graph, BAGEL helps uncover spurious correlations and enhances the transparency and trustworthiness of AI systems. This work represents a significant step towards more accountable AI by focusing not just on what a model predicts, but how and why it does so. For more technical details, you can refer to the full research paper: Concept-Based Mechanistic Interpretability Using Structured Knowledge Graphs.

