TLDR: A new study reveals that Stable Diffusion text-to-image models, particularly the latest SD XL, systematically associate stigmatized identities with darker, less red, and more homogeneous skin tones. By analyzing 93 stigmatized identities and using human perception metrics, researchers found that newer models amplify existing biases, significantly reduce skin tone diversity, and reinforce harmful racial stereotypes, despite being preferred by users for their general image quality. This highlights a critical need for AI development to prioritize fairness and ethical representation alongside technical performance.
Text-to-image generators (T2Is) have revolutionized how we create visual content, but a new study from the University of Washington reveals a concerning trend: these powerful AI models, particularly Stable Diffusion, are systematically amplifying social stereotypes, especially concerning skin tones and stigmatized identities.
The research, titled “Bias Amplification in Stable Diffusion’s Representation of Stigma Through Skin Tones and Their Homogeneity” by Kyra Wilson, Sourojit Ghosh, and Aylin Caliskan, delves into how three versions of Stable Diffusion (v1.5, v2.1, and XL) depict a comprehensive set of 93 stigmatized identities. The findings suggest that as these models advance, they increasingly associate stigmatized identities with specific, often darker and less diverse, skin tones.
The Problem with AI-Generated Skin Tones
The study highlights that T2Is are prone to producing images that perpetuate harmful social stereotypes. Recent public incidents, such as a chatbot generating historically inaccurate images of soldiers with dark skin tones or T2Is depicting a “beautiful woman” predominantly with light skin, underscore these issues. These are not just minor glitches; they represent “representational harms” where certain groups are either erased or portrayed in demeaning ways, reinforcing existing societal biases.
For individuals with stigmatized identities, these harms are particularly damaging. Skin color, being a visible and often stigmatized feature linked to health and economic outcomes, makes the analysis of its representation in T2Is crucial. Previous research has shown a default to light skin tones in earlier Stable Diffusion versions, but this new study expands the scope to include a wider range of stigmatized identities and more nuanced dimensions of skin tone beyond just lightness or darkness, such as redness and yellowness.
How the Study Was Conducted
The researchers conducted a large-scale investigation, generating 50 images for each of the 93 stigmatized identities (e.g., “a person who has psoriasis,” “a person who is transgender”) using Stable Diffusion v1.5, v2.1, and XL. They also generated images for a “No Stigma” prompt for comparison. To measure skin tones accurately, they used a computational method that identifies skin regions in images and converts their colors into the perceptually uniform CIE L*a*b* (CIELAB) color space. This allowed them to measure perceptual lightness (L*) and hue angle (h*, indicating how red or yellow a skin tone is).
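To make that measurement concrete, here is a minimal sketch of this kind of skin tone summary, assuming a skin region mask is already available (for example, from a separate skin segmentation step). The function name and inputs are illustrative, not the authors' code:

```python
# Minimal sketch (not the authors' pipeline): given an image and a boolean
# skin-region mask, convert the skin pixels to CIELAB and summarize their
# perceptual lightness (L*) and hue angle (h*, in degrees; lower values are
# redder, higher values shift toward yellow).
import numpy as np
from skimage.color import rgb2lab

def skin_tone_summary(image_rgb: np.ndarray, skin_mask: np.ndarray) -> dict:
    """image_rgb: HxWx3 float array in [0, 1]; skin_mask: HxW boolean array."""
    lab = rgb2lab(image_rgb)                    # convert to L*, a*, b*
    skin = lab[skin_mask]                       # Nx3 array of skin pixels
    L, a, b = skin[:, 0], skin[:, 1], skin[:, 2]
    hue_deg = np.degrees(np.arctan2(b, a)) % 360.0   # hue angle h*
    return {"L_star": float(L.mean()), "hue_angle": float(hue_deg.mean())}
```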
Crucially, the study also employed the CIEDE2000 (∆E) algorithm, a metric grounded in human perception, to quantify how differently two colors are perceived. A ∆E value of 5 or less indicates that colors are very similar or indistinguishable to the human eye. This metric allowed the researchers to directly assess the perceived diversity of skin tones in the generated images.
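As a rough illustration of that threshold, the snippet below uses scikit-image's CIEDE2000 implementation to compare two CIELAB skin tones; the specific values are made-up examples, not data from the study:

```python
# Compare two example CIELAB skin tones with CIEDE2000; a difference of 5 or
# less is treated as perceptually very similar, following the paper's cutoff.
import numpy as np
from skimage.color import deltaE_ciede2000

lab_1 = np.array([65.0, 18.0, 20.0])   # example skin tone in CIELAB
lab_2 = np.array([62.0, 17.0, 22.0])   # another example skin tone

delta_e = deltaE_ciede2000(lab_1, lab_2)
print(delta_e, "perceptually similar" if delta_e <= 5 else "visibly different")
```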
Key Findings: A Deep Dive into Bias
The study uncovered several significant trends:
- Darker and Less Red Skin Tones: With each new version of Stable Diffusion, the skin tones of people with stigmatized identities became progressively darker and less red. SD XL, the latest model, produced skin tones that were 13.53% darker and 23.76% less red than SD v1.5. This shift indicates a stronger, and potentially more harmful, association between stigmatized identities and skin tones that are more likely to face discrimination in society.
- Decreased Variability: SD XL showed a substantial reduction in the range of depicted skin tones, in terms of both lightness and yellowness, compared to earlier models and most human face datasets. This “social flattening” means that people with stigmatized identities look more alike in AI-generated images, contradicting the assumption that newer models are inherently “better” or more diverse.
- Lack of Perceived Diversity: Using the human perception metric (∆E), the study found that SD XL generated images in which the skin tones of stigmatized identities were often indistinguishable to human viewers. A significant majority (66.89%) of SD XL’s stigmatized identity images had skin tone differences below the ∆E threshold of 5, meaning viewers would perceive very little diversity (see the sketch after this list). This was even less diverse than images of people without stigmatized identities.
- Stereotypical Alignment with Race/Ethnicity: The models, especially SD XL, reinforced common stereotypes tied to racial and ethnic identities. Images of people with racial/ethnic stigmatized identities showed very little variation in skin tone. For example, multiracial individuals were often depicted with darker skin tones, aligning them with their more marginalized identity, a phenomenon known as “hypodescent.” This is the first empirical evidence of hypodescent in text-to-image outputs.
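The “perceived diversity” figure referenced above boils down to asking how many pairs of generated skin tones differ by less than a ∆E of 5. Below is a hypothetical sketch of that statistic, assuming each image has already been reduced to a single mean CIELAB skin tone (the helper name is illustrative, not from the paper):

```python
# Hypothetical sketch: the share of pairwise CIEDE2000 differences below 5
# among the mean skin tones of a set of generated images.
import numpy as np
from itertools import combinations
from skimage.color import deltaE_ciede2000

def share_indistinguishable(mean_lab_tones: np.ndarray, threshold: float = 5.0) -> float:
    """mean_lab_tones: Nx3 array, one mean CIELAB skin tone per image."""
    diffs = [deltaE_ciede2000(a, b) for a, b in combinations(mean_lab_tones, 2)]
    return float(np.mean(np.array(diffs) < threshold))
```

A value near 1.0 would mean nearly all generated faces share a perceptually identical skin tone, which is the kind of homogeneity the study reports for SD XL.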
Implications for the Future of AI
These findings raise critical concerns about the widespread use of T2Is. While models like SD XL are preferred by users for their high-quality generations, this research demonstrates that improved technical performance does not equate to reduced bias. In fact, it suggests that model progression, especially through increased parameters and training data size, might exacerbate visual biases against marginalized groups by losing the ability to generalize from minority data points.
The representational harms identified can have far-reaching impacts, from contributing to a lack of knowledge and potential discrimination in educational or professional settings to causing low self-esteem in individuals who feel misrepresented. The study calls for a shift in how AI models are evaluated, moving beyond simple performance metrics to include sociotechnical frameworks that consider visual harm, stereotype propagation, and representational equity.
Future work will need to investigate whether these biases are a result of “model collapse”—where models trained on recursively generated data lose information about rare samples. Strategies for mitigation could include limiting the use of synthetic training data and incorporating human perception metrics like ∆E into the fine-tuning process of T2Is. This research underscores the urgent need to develop models that accurately and fairly portray all people, balancing human preferences with crucial fairness concerns. You can read the full research paper here.