TLDR: FACE (Faithful Automatic Concept Extraction) is a new framework that improves concept-based AI explanations by ensuring the extracted concepts truly align with a deep neural network’s decision-making process. It augments Non-negative Matrix Factorization (NMF) with a Kullback-Leibler (KL) divergence regularization term, forcing predictive consistency between original and concept-based model outputs. Evaluations on ImageNet, COCO, and CelebA show FACE outperforms existing methods in faithfulness and sparsity, providing more reliable insights into why AI models make their predictions.
The world of artificial intelligence, especially deep learning, often feels like a “black box.” We see the amazing results, but understanding why a model makes a particular decision can be incredibly difficult. This lack of transparency is a major hurdle, especially in critical fields like medicine or autonomous driving, where trust and accountability are paramount.
To address this, researchers have developed “explainable AI” (XAI) methods. One promising approach is concept-based explanations. Instead of just highlighting individual pixels in an image, these methods try to explain a model’s decision using high-level, human-understandable “concepts” – like “fur,” “ears,” or “texture.” For example, an animal classifier might explain its prediction of a “rabbit” by saying it detected “fur” and “long ears.”
However, a significant challenge with existing automatic concept discovery methods is ensuring “faithfulness.” This means that the concepts extracted actually reflect the model’s true decision-making process, rather than just appearing intuitive to humans. Sometimes, what looks like a logical concept to us might not be what the neural network is actually using. If explanations aren’t faithful, they can be misleading and undermine trust.
A new framework called FACE, which stands for Faithful Automatic Concept Extraction, aims to solve this problem. Developed by Dipkamal Bhusal, Michael Clifford, Sara Rampazzi, and Nidhi Rastogi, FACE introduces a novel way to discover concepts that are both interpretable and truly aligned with how the deep learning model makes its predictions. You can find the full research paper here: FACE: Faithful Automatic Concept Extraction.
At its core, FACE builds upon a technique called Non-negative Matrix Factorization (NMF), which is good at breaking down complex data into simpler, additive components. Previous NMF-based methods focused mainly on reconstructing the internal “activations” of a neural network’s encoder (the part that processes the input). While this produced interpretable concepts, it didn’t guarantee that these concepts were actually used by the model’s final decision-making part, the “classifier head.”
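To make the NMF step concrete, here is a minimal sketch using scikit-learn. The activation matrix, the number of concepts, and the shapes are illustrative stand-ins, not the paper's actual setup:

```python
# Minimal sketch of NMF-based concept extraction (illustrative, not the paper's code).
# Non-negative activations A (n_patches x d_channels) are factorized as A ~= U @ W,
# where each row of W is a "concept" direction and U holds per-patch concept scores.
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(0)
A = np.abs(rng.normal(size=(512, 256)))   # stand-in for non-negative encoder activations

nmf = NMF(n_components=10, init="nndsvda", max_iter=500, random_state=0)
U = nmf.fit_transform(A)                  # (512, 10) concept scores per patch
W = nmf.components_                       # (10, 256) concept basis vectors

reconstruction = U @ W                    # concept-based approximation of the activations
print("reconstruction error:", np.linalg.norm(A - reconstruction))
```

On its own, this objective only cares about reconstructing the activations well; nothing ties the discovered concepts to what the classifier head actually does with them.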
FACE changes this by adding a crucial element: a Kullback-Leibler (KL) divergence regularization term. In simpler terms, this is a mathematical penalty that keeps the predictions the model makes from the original internal activations very close to the predictions it makes from the concept-based reconstruction of those activations. This "classifier supervision" during concept learning forces the discovered concepts to be predictively consistent, meaning they truly reflect the model's reasoning.
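A rough PyTorch-style sketch of this combined objective is shown below. The function name `face_style_loss`, the `classifier_head` callable, and the weight `lambda_kl` are assumptions for illustration; the paper's exact formulation and optimization procedure may differ:

```python
# Illustrative sketch of the idea behind FACE's objective: reconstruct activations
# with non-negative factors while penalizing the KL divergence between the
# classifier's predictions on the original and the reconstructed activations.
import torch
import torch.nn.functional as F

def face_style_loss(A, U, W, classifier_head, lambda_kl=1.0):
    # A: (n, d) original encoder activations
    # U: (n, k) non-negative concept scores; W: (k, d) non-negative concept basis
    # classifier_head: callable mapping (n, d) activations to (n, num_classes) logits
    A_hat = U @ W                                  # concept-based reconstruction
    recon = F.mse_loss(A_hat, A)                   # standard NMF-style reconstruction term

    p_orig = F.log_softmax(classifier_head(A), dim=-1)      # original predictions
    p_hat = F.log_softmax(classifier_head(A_hat), dim=-1)   # concept-based predictions
    # KL(p_orig || p_hat): penalize predictive drift caused by the reconstruction
    kl = F.kl_div(p_hat, p_orig, log_target=True, reduction="batchmean")

    return recon + lambda_kl * kl
```

The reconstruction term keeps the concepts interpretable and additive, while the KL term supplies the classifier supervision described above.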
The researchers provide theoretical backing for FACE, showing that minimizing this KL divergence helps to limit how much the model’s predictions change when using the concept-based representation. This also promotes “faithful local linearity” in the concept space, meaning that small changes in the concept representation lead to predictable changes in the model’s output, making the explanations more reliable.
To demonstrate its effectiveness, FACE was systematically evaluated on large and diverse datasets like ImageNet, COCO, and CelebA, using popular deep learning models such as ResNet-34 and MobileNetV2. The results showed that FACE consistently outperformed existing methods in terms of faithfulness and sparsity. Faithfulness was measured by how much the model’s accuracy dropped when important concepts were removed (Concept Deletion) or how quickly it recovered when they were re-inserted (Concept Insertion). Sparsity refers to how few concepts are needed to explain a decision, making the explanation simpler to understand.
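As a hedged illustration of how a Concept Deletion check might work (the function names and the exact protocol here are assumptions, not the paper's evaluation code):

```python
# Sketch of a concept-deletion curve: zero out concepts one at a time, in order of
# importance, and record how much the model's accuracy drops after each deletion.
import numpy as np

def deletion_curve(U, W, classifier_fn, labels, importance_order):
    # U: (n, k) concept scores; W: (k, d) concept basis; labels: (n,) ground truth
    # classifier_fn: callable mapping (n, d) activations to (n, num_classes) scores
    accs = []
    U_masked = U.copy()
    for concept_idx in importance_order:          # most important concept first
        U_masked[:, concept_idx] = 0.0            # delete this concept everywhere
        preds = classifier_fn(U_masked @ W).argmax(axis=1)
        accs.append((preds == labels).mean())     # accuracy after this deletion
    # A faithful concept set should show a steep early drop in this curve.
    return np.array(accs)
```

Concept Insertion works in the opposite direction: starting from an empty concept set, important concepts are added back and the accuracy should recover quickly if the concepts are faithful.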
For instance, in an example of classifying a “rabbit,” previous methods might highlight the “head” as the most important concept. However, FACE might identify “fur (body)” as more crucial. While “head” might seem more intuitive to a human, FACE’s finding suggests that the model actually relies more on the “fur (body)” concept for its decision, providing a more faithful insight into the model’s internal logic.
The paper also discusses the importance of tuning the regularization strength (λ) for different datasets. A moderate KL penalty significantly improves faithfulness, but an overly strong penalty can degrade performance, especially on datasets with many classes. Datasets with fewer classes, like CelebA, were more robust to stronger regularization.
In summary, FACE represents a significant step forward in concept-based explainable AI. By explicitly aligning concept discovery with the model’s predictive behavior, it offers explanations that are not only human-interpretable but also truly faithful to the underlying deep neural network’s decision-making process, fostering greater trust and understanding in AI systems.


