TLDR: This research uses the ReX explanation tool to compute minimal sufficient pixel sets (MPSs), revealing how different image classification models ‘concentrate’ on specific pixels to make decisions. A study of 15 models across five architectures shows that models vary significantly in the size and location of these essential pixel sets, with larger models often focusing on smaller, more distinct regions. Misclassified images also tend to have slightly larger MPSs, suggesting that MPS characteristics can offer insight into model behavior beyond accuracy alone, which matters for safety-critical applications.
In the rapidly evolving field of machine learning for image classification, choosing the right model is becoming increasingly complex. While we can statistically measure a model’s accuracy, our understanding of how these sophisticated systems actually arrive at their decisions remains limited. A recent research paper, “I Am Big, You Are Little; I Am Right, You Are Wrong”, delves into this challenge by proposing a novel approach to gain insight into the decision-making processes of various vision models.
The authors, David A Kelly, Akchunya Chanchal, and Nathan Blake from King’s College London, introduce the concept of ‘concentration’: how much of an image a model actually needs in order to reach its decision. They measure this via minimal sufficient pixel sets (MPSs), which capture the essence of an image through the lens of a specific model. Unlike previous methods that rely on fixed-size patches or arbitrary confidence thresholds, this research uses a tool called ReX, which identifies approximately minimal, non-patch-based sets of pixels that are sufficient to reproduce the original classification of an image.
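To make the notion of sufficiency concrete, here is a minimal sketch of the check it implies: mask out every pixel outside the candidate set and see whether the model still produces its original label. The `model` callable, the `baseline` fill value, and the function name are illustrative assumptions; ReX’s actual search procedure is more sophisticated than this.

```python
import numpy as np

def is_sufficient(model, image, pixel_mask, baseline=0.0):
    """Illustrative sufficiency check (not ReX's actual algorithm).

    model:      assumed callable mapping an HxWxC array to class scores
    image:      HxWxC numpy array
    pixel_mask: HxW boolean array marking the candidate pixel set
    baseline:   fill value for occluded pixels (a modelling assumption)
    """
    original_label = np.argmax(model(image))
    # Keep only the candidate pixels; occlude everything else.
    occluded = np.where(pixel_mask[..., None], image, baseline)
    # The set is sufficient if the classification is unchanged.
    return np.argmax(model(occluded)) == original_label
```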
Understanding Model Concentration
The study investigates concentration by examining the size, position, and overlap of these MPSs across 15 different image classification models. These models span five distinct architectures, ranging from smaller Inception models to state-of-the-art transformer models with over a billion parameters, all fine-tuned on ImageNet. The study set out to answer three core questions:
- Do different models identify MPSs of varying sizes for the same image?
- Do different models focus on different locations within an image to make a classification?
- Are misclassified images associated with larger or smaller MPSs compared to correctly classified ones?
Key Findings on How Models ‘See’
The findings reveal statistically significant differences in MPS size and location, both across and within architectures. For instance, Inception models tend to produce notably larger pixel sets, suggesting they require more visual information to make a classification. Conversely, larger models such as EVA and ConvNeXt often generate smaller, more spatially distinct MPSs. This ‘myopic’ tendency in large models, which may rely on very little of the input (as little as 5.4% of pixels for the EVA-Giant model), could indicate overfitting or a high sensitivity to out-of-distribution images.
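Reported set sizes like the 5.4% figure are naturally expressed as a fraction of the image’s pixels. A trivial sketch, assuming the MPS is represented as a boolean mask:

```python
def mps_fraction(pixel_mask):
    """MPS size as a fraction of all pixels in the image.

    pixel_mask is assumed to be an HxW boolean numpy array; a result of
    about 0.054 would correspond to the ~5.4% reported for EVA-Giant.
    """
    return pixel_mask.sum() / pixel_mask.size
```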
Regarding location, the study found considerable variety. The MPSs identified by different models for the same image often had low overlap, meaning they were focusing on different sets of pixels. For example, an image classified as a seashell showed that a ResNet model focused on a completely different, very small region compared to other models, which clustered around the upper part of the shell.
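The paper’s exact overlap metric is not spelled out here, but a standard way to quantify how much two models’ MPSs coincide on the same image is intersection-over-union between their masks. A sketch under that assumption:

```python
import numpy as np

def mps_overlap(mask_a, mask_b):
    """Intersection-over-union between two models' MPS masks for one image.

    Masks are assumed to be HxW boolean arrays. A value near 0 means the
    models relied on largely disjoint pixels, as in the seashell example.
    """
    intersection = np.logical_and(mask_a, mask_b).sum()
    union = np.logical_or(mask_a, mask_b).sum()
    return intersection / union if union else 0.0
```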
Perhaps the most intriguing finding relates to misclassifications. The research indicates a small but statistically significant increase (an average of 2.6%) in MPS size for incorrect classifications compared to correct ones. While the effect size is modest, it suggests that a model’s ‘concentration’ pattern may change when it makes an error, which could serve as an additional post-classification check on the reliability of a model’s decision.
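The paper only suggests such a check in principle; one hypothetical way to operationalize it would be to flag predictions whose MPS size falls outside the range observed on correctly classified examples. The z-score threshold below is an illustrative assumption, not the authors’ method:

```python
import numpy as np

def flag_unusual_mps(mps_size, correct_sizes, z_threshold=2.0):
    """Hypothetical post-classification check based on MPS size.

    mps_size:      MPS size (e.g., pixel fraction) for a new prediction
    correct_sizes: MPS sizes observed on correctly classified examples
    Returns True if the new size deviates more than z_threshold standard
    deviations from what correct classifications typically produce.
    """
    mean, std = np.mean(correct_sizes), np.std(correct_sizes)
    return abs(mps_size - mean) > z_threshold * std
```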
Implications for Model Selection and Safety
The research highlights that models can sometimes focus on features that are not salient to human observers. An example provided in the paper shows models classifying a ‘grey fox’ as a ‘hyena,’ with their MPSs completely missing the tail, a key distinguishing feature for humans. This underscores that even when models are highly confident in their predictions, their internal reasoning might differ significantly from human intuition.
These results emphasize the importance of considering MPS characteristics when selecting models, especially for high-stakes applications such as healthcare or autonomous navigation. Relying solely on accuracy metrics might not be sufficient. Understanding how models concentrate their attention can provide crucial insights into their robustness, potential for overfitting, and overall safety. The study suggests that analyzing MPSs could offer a valuable additional check on whether a model’s decision falls within the range expected of correctly classified examples.
Future work will explore the impact of model confidence on MPS size and investigate the phenomenon of multiple distinct explanations within an image, further deepening our understanding of complex AI vision systems.


