Unpacking How AI Sees: New Study Challenges Texture Bias in CNNs

TLDR: A new study challenges the long-held belief that ImageNet-trained CNNs are inherently texture-biased. Using a novel feature suppression framework, researchers found that CNNs primarily rely on local shape features, a reliance that modern training strategies and architectures can mitigate. The study also shows that feature reliance varies across domains: computer vision models prioritize shape, medical imaging models emphasize color, and remote sensing models depend heavily on texture and color.

For a long time, the prevailing idea in the field of artificial intelligence has been that Convolutional Neural Networks, or CNNs, tend to focus on textures rather than shapes when they interpret images. This contrasts with how humans typically perceive objects, where shape is often the dominant cue. This hypothesis gained significant traction from the “cue-conflict experiment” conducted by Geirhos et al., which suggested that CNNs trained on large datasets like ImageNet had an inherent bias towards texture.

However, a groundbreaking new study by Tom Burgert, Oliver Stoll, Paolo Rota, and Begüm Demir is now questioning this long-held belief. Their research introduces a fresh perspective on how these powerful AI models actually process visual information, proposing a new framework to re-evaluate feature reliance.

The original cue-conflict experiment involved creating unique images that combined the shape of one object with the texture of another. When presented with these hybrid images, CNNs frequently classified them based on the texture, while human observers consistently relied on the shape. This divergence led to the widely accepted narrative that there was a fundamental difference in visual processing between AI and human perception.
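Results from cue-conflict experiments are usually summarized as a single shape-bias score: among the trials where a model answers with either the shape class or the texture class, the fraction that went to shape (1.0 means purely shape-driven, 0.0 purely texture-driven). The snippet below is a minimal sketch of that score, not code from either study. The trial data are invented for illustration, and skipping trials whose shape and texture classes coincide is an assumed scoring convention.

```python
def shape_bias(predictions, shape_labels, texture_labels):
    """Share of shape decisions among cue-conflict trials decided by shape or texture.

    Assumption: trials whose shape and texture classes coincide carry no conflict
    and are skipped; predictions that match neither cue are ignored.
    """
    shape_hits = texture_hits = 0
    for pred, shape, texture in zip(predictions, shape_labels, texture_labels):
        if shape == texture:
            continue  # no conflict to resolve
        if pred == shape:
            shape_hits += 1
        elif pred == texture:
            texture_hits += 1
    decided = shape_hits + texture_hits
    return shape_hits / decided if decided else float("nan")


# Hypothetical trials: model prediction, shape class, texture class
preds = ["cat", "elephant", "cat", "bottle", "dog"]
shapes = ["cat", "cat", "cat", "bottle", "bird"]
textures = ["elephant", "elephant", "dog", "car", "knife"]
print(shape_bias(preds, shapes, textures))  # → 0.75 (3 shape vs. 1 texture decision)
```

Because the score only counts shape-versus-texture decisions, any other cue the model might use, such as color, never shows up in it. This is exactly the binary framing the new study criticizes.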

Burgert and his team, however, pointed out several limitations in this traditional cue-conflict setup. They argued that the experiment oversimplified feature reliance into a binary choice between shape and texture, potentially overlooking other crucial visual cues like color. From a methodological standpoint, the stimuli used in the experiment might have unintentionally mixed multiple features, distributed texture cues unevenly across images, and even influenced human judgments through response interfaces that favored shape. These factors, they suggested, could have distorted the conclusions about how both models and humans truly utilize visual features.

To address these issues, the researchers developed an innovative, domain-agnostic evaluation framework. Instead of forcing models to choose between shape and texture, their method quantifies feature reliance by systematically suppressing individual visual cues (shape, texture, and color) and then measuring the resulting impact on classification performance. This approach employs direct feature-suppressing transformations, avoiding the complexities of adversarial inputs or neural style transfer, which allows for a clearer assessment of a model’s dependence on specific visual information.
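The suppress-and-measure idea can be illustrated in a few lines of NumPy. This is a minimal sketch under stated assumptions, not the authors' released code. The three transforms are hypothetical stand-ins: grayscale conversion for color, a box blur for texture, and patch shuffling for shape. Reliance is taken here as the plain accuracy drop after a cue is suppressed. All function names are illustrative.

```python
import numpy as np

# Hypothetical suppression transforms; the study's actual transforms may differ.
def suppress_color(img):
    """Remove color by replacing every channel with the per-pixel channel mean."""
    gray = img.mean(axis=-1, keepdims=True)
    return np.repeat(gray, img.shape[-1], axis=-1)


def suppress_texture(img, k=5):
    """Remove fine, high-frequency texture with an edge-padded k x k box blur (k odd)."""
    pad = k // 2
    padded = np.pad(img, ((pad, pad), (pad, pad), (0, 0)), mode="edge")
    h, w = img.shape[:2]
    out = np.zeros(img.shape, dtype=float)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + h, dx:dx + w]
    return out / (k * k)


def suppress_shape(img, patch=8, seed=0):
    """Destroy global shape by shuffling non-overlapping patches.

    Assumes height and width are multiples of `patch`.
    """
    h, w, c = img.shape
    gh, gw = h // patch, w // patch
    patches = (img.reshape(gh, patch, gw, patch, c)
                  .transpose(0, 2, 1, 3, 4)
                  .reshape(gh * gw, patch, patch, c))
    patches = patches[np.random.default_rng(seed).permutation(gh * gw)]
    return (patches.reshape(gh, gw, patch, patch, c)
                   .transpose(0, 2, 1, 3, 4)
                   .reshape(h, w, c))


def accuracy(model, images, labels):
    return float(np.mean([model(x) == y for x, y in zip(images, labels)]))


def feature_reliance(model, images, labels, transforms):
    """Reliance on a cue = accuracy drop after that cue is suppressed (assumed metric)."""
    base = accuracy(model, images, labels)
    return {name: base - accuracy(model, [t(x) for x in images], labels)
            for name, t in transforms.items()}


# Toy usage: a "classifier" that only compares mean red against mean blue.
def toy_model(x):
    return int(x[..., 2].mean() > x[..., 0].mean())


red = np.zeros((16, 16, 3))
red[..., 0] = 1.0
blue = np.zeros((16, 16, 3))
blue[..., 2] = 1.0
scores = feature_reliance(
    toy_model, [red, blue], [0, 1],
    {"color": suppress_color, "texture": suppress_texture, "shape": suppress_shape},
)
print(scores)  # the color-only toy model loses accuracy only when color is removed
```

One caveat about these stand-ins: patch shuffling scrambles the global layout but leaves local shape inside each patch intact. It therefore probes global rather than local shape, and a test targeting local shape would need a different transform.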

Through their extensive experiments, the team discovered that CNNs are not inherently biased towards texture. Instead, they primarily rely on local shape features. For example, a standard ResNet50 model experienced significant performance drops when local shape information was removed, yet it maintained much of its accuracy when texture was suppressed. This reliance on local shape is not fixed, however: modern training strategies and advanced architectures such as ConvNeXt and Vision Transformers (ViTs) substantially reduce it and make feature use more robust. Interestingly, models trained with vision-language supervision, like CLIP-ViT, demonstrated feature reliance patterns that most closely mirrored human behavior. This indicates that sophisticated training methods can encourage models to develop representations that are more aligned with human perception.

The study further expanded its analysis to various visual domains, including computer vision (CV), medical imaging (MI), and remote sensing (RS). The findings revealed that feature reliance patterns systematically differ across these domains. Computer vision models, especially when trained on natural images, predominantly prioritize shape. Medical imaging models, conversely, showed a stronger emphasis on color, which is often a critical diagnostic indicator in medical tasks. Remote sensing models exhibited a pronounced dependence on both texture and color, reflecting the nature of aerial imagery where land cover categories are frequently defined by fine-grained surface patterns and chromatic cues.

These findings challenge the long-standing texture bias hypothesis, suggesting that feature reliance in deep learning models is not a fixed architectural bias but rather a flexible characteristic shaped by training objectives and the specific properties of the data domain. This new understanding opens up exciting possibilities for designing AI models that can better align with human perceptual strategies, potentially leading to more robust, interpretable, and effective systems. The code for their framework is publicly available for further research and exploration, and the full paper provides more details.
