TLDR: A research paper by Zoya Hammad and Nii Longdon Sowah evaluates gender bias across four text-to-image AI models: DALL-E 3, Emu, Stable Diffusion XL, and Stable Cascade. The study found that Stable Diffusion models exhibited significant male bias in high-status professions and female bias in traditionally female roles. Emu showed more balanced results. DALL-E 3, surprisingly, displayed a female-favoring bias, likely due to backend prompt modifications aimed at increasing diversity, potentially leading to ‘over-correction.’ The research emphasizes that biases stem from training data and lack of diversity in AI development, posing a critical question about whether AI should reflect real-world demographics or aim for a 50:50 gender ratio.
Artificial Intelligence (AI) is increasingly integrated into various aspects of our daily lives, from healthcare to entertainment. As this technology advances, it becomes crucial to examine its ethical implications, particularly concerning inclusivity and fairness. A recent research paper, titled “Evaluating and comparing gender bias across four text-to-image models,” delves into this very issue, analyzing how different AI models represent gender in generated images.
Authored by Zoya Hammad and Nii Longdon Sowah, the study evaluates and compares the degree of gender bias in four prominent text-to-image AI models: Stable Diffusion XL (SDXL), Stable Cascade (SC), DALL-E 3, and Emu. Previous research had typically focused on one or two models and lacked quantifiable comparisons. This paper addresses that gap by examining 30 different professions and generating 50 images per profession with each of the four models, yielding a comprehensive, comparative analysis of gender representation.
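To make the scale of the evaluation concrete, a minimal sketch of the tallying procedure might look like the Python below. Note that `generate_image` and `classify_gender` are hypothetical placeholders; the paper's actual generation and labeling pipeline is not detailed here.

```python
# Minimal sketch of the study's counting methodology, under assumptions:
# generate_image() and classify_gender() are hypothetical stand-ins for
# each model's API and for whatever labeling method the authors used.
from collections import Counter

MODELS = ["SDXL", "Stable Cascade", "DALL-E 3", "Emu"]
PROFESSIONS = ["CEO", "doctor", "nurse", "engineer"]  # the study used 30
IMAGES_PER_PROMPT = 50

def generate_image(model: str, prompt: str) -> bytes:
    """Hypothetical wrapper around a given model's image-generation API."""
    raise NotImplementedError

def classify_gender(image: bytes) -> str:
    """Hypothetical labeler returning 'male' or 'female'."""
    raise NotImplementedError

def tally(model: str) -> dict[str, Counter]:
    """Count perceived gender per profession for one model."""
    counts: dict[str, Counter] = {}
    for profession in PROFESSIONS:
        gender_counts = Counter()
        for _ in range(IMAGES_PER_PROMPT):
            image = generate_image(model, f"a {profession}")
            gender_counts[classify_gender(image)] += 1
        counts[profession] = gender_counts
    return counts

# e.g. percent female CEOs for one model:
# tally("SDXL")["CEO"]["female"] / IMAGES_PER_PROMPT * 100
```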
The researchers hypothesized that older models like DALL-E and Stable Diffusion would show a noticeable bias towards men, while Emu, a newer model from Meta AI, would offer more balanced results. Their findings largely supported this, with some intriguing exceptions.
Stable Diffusion and Emu: Reflecting and Moderating Stereotypes
The study found that Stable Diffusion XL and Stable Cascade consistently exhibited a significant degree of gender bias. For high-paying or high-education professions such as CEO, pilot, scientist, doctor, and engineer, these models predominantly generated images of men, often reaching 100% male representation for roles like CEO and doctor in SDXL and SC. Conversely, for professions traditionally associated with women, such as nurse, housekeeper, and administrative assistant, the models were much more likely to generate female images, sometimes also reaching 100% female representation.
An interesting pattern emerged when comparing related professions. For instance, while “a doctor” yielded almost exclusively male images from Stable Diffusion models, “a nurse” resulted in overwhelmingly female images. Similarly, “a person cooking in the kitchen” often produced female images, but “a chef” predominantly showed men. The same trend appeared between “a teacher” (mostly women) and “a professor” (mostly men), highlighting how these models reinforce societal stereotypes linked to the perceived status or formality of a role.
Emu, Meta AI’s recently released model, demonstrated comparatively more balanced results, showing at least some diversity even in professions where Stable Diffusion models showed none. This suggests that developers might be actively incorporating ethical guidelines and diverse training data in newer models, possibly in response to past criticisms of AI bias.
DALL-E 3: The Case of “Over-Correction”
Perhaps the most striking finding concerned OpenAI’s DALL-E 3. Contrary to the hypothesis and previous studies, DALL-E 3 exhibited a significant bias favoring women. For 28 out of 30 professions, it generated more female images. For example, where other models produced mostly male surgeons or CEOs, DALL-E 3 generated 82% female surgeons and 78% female CEOs. This is a stark contrast to earlier reports where DALL-E showed male bias in medical professions.
The researchers observed that DALL-E 3 achieves these results by automatically rewriting user prompts on the backend, appending keywords intended to increase diversity. For instance, a simple prompt like “A Doctor” might be revised to include descriptors such as “South Asian in descent and a woman by gender.” While this is an attempt to correct for bias, the study asks whether DALL-E 3 is “over-correcting,” producing a reverse gender imbalance.
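As an illustration only (not OpenAI’s actual implementation), a backend rewriter of this kind could be as simple as appending sampled descriptors to the user’s prompt. The descriptor lists and the uniform sampling policy below are assumptions for demonstration:

```python
import random

# Illustrative sketch only; not OpenAI's actual code. The descriptor
# lists and uniform sampling policy here are assumptions.
GENDER_DESCRIPTORS = ["a woman by gender", "a man by gender"]
DESCENT_DESCRIPTORS = [
    "South Asian in descent",
    "East Asian in descent",
    "Black in descent",
    "White in descent",
]

def rewrite_prompt(prompt: str) -> str:
    """Append sampled diversity descriptors to the user's prompt."""
    descent = random.choice(DESCENT_DESCRIPTORS)
    gender = random.choice(GENDER_DESCRIPTORS)
    return f"{prompt}, {descent} and {gender}"

print(rewrite_prompt("A Doctor"))
# e.g. "A Doctor, South Asian in descent and a woman by gender"
```

A uniform sampler like this would push results toward 50:50 only if the model reliably follows the descriptor; the roughly 80% female skew the study measured suggests that whatever policy DALL-E 3 actually uses, which is not public, behaves differently.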
The Root of the Bias and the Path Forward
The paper argues that the bias in these text-to-image models likely stems not from the model architectures themselves but from the vast datasets they are trained on. Publicly available images, often scraped from the internet, frequently underrepresent women in certain professional roles, thereby perpetuating existing societal stereotypes. For example, studies show that only 38% of images for social categories found via Google Image search depict women.
Another contributing factor is the lack of gender diversity within the AI research community itself. Studies indicate that women comprise only about 26% of data and AI roles and around 13.8% of AI research paper authors. Biases from developers’ worldviews can inadvertently be instilled into algorithms, leading to unfair outcomes. The researchers suggest that ensuring diversity in AI research teams and curating comprehensive, diverse datasets are crucial steps to mitigate these biases.
This research project highlights a fundamental question for the future of AI: should AI image generation tools aim to reflect real-world demographic statistics, or should they strive for an idealized 50:50 gender ratio? The study concludes that current generative AI models are not adequately prepared to estimate appropriate gender representation. By uncovering these biases, the paper initiates an important discussion on who decides what constitutes appropriate representation and how to build fairer, more inclusive AI tools that truly reflect the diversity of the real world. You can read the full research paper here.