Examining Racial Bias in AI-Generated Stories About Women

TLDR: This study investigates racial biases in short stories about Black and white women generated by LLaMA 3.2-3B in Portuguese. Using computational clustering and qualitative discourse analysis on 2100 texts, researchers identified three main narrative types: social overcoming, ancestral mythification (predominantly for Black women), and subjective self-realization (predominantly for white women). The findings reveal that LLMs reproduce and amplify existing societal stereotypes, limiting the narrative possibilities for Black women to themes of resilience and collective action, while white women are afforded broader, more introspective roles.

Large Language Models (LLMs) are increasingly used across many domains, from content creation to education. Their widespread adoption has raised concerns about the biases embedded in these models, particularly in how they represent social minorities. A recent study addresses this issue by examining racial biases in short stories about women generated in Portuguese by the LLaMA 3.2-3B model.

The research, titled “Clustering Discourses: Racial Biases in Short Stories about Women Generated by Large Language Models,” builds on previous qualitative work by combining computational methods with detailed discourse analysis. Its primary goal was to understand how LLMs construct, differentiate, and hierarchize Black and white female characters in the narratives they produce, and which underlying discourses those narratives activate.

Methodology: Unpacking Narratives at Scale

To achieve this, the researchers generated a substantial dataset of 2100 short stories. They prompted the LLaMA 3.2-3B model with two neutral templates: one asking for a story about a Black/white woman named [name], and another simply about a Black/white woman. Names were drawn from a large dataset to ensure variety. The prompts were carefully standardized to avoid introducing any initial bias, ensuring that the only significant variable was the character’s skin color.
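The two-template setup described above can be sketched as follows. This is a minimal illustration, not the study's code: the template wording is a hypothetical English paraphrase (the actual prompts were in Portuguese and are not reproduced here), and the function name `build_prompts` and the sample names are assumptions made for the example.

```python
# Hypothetical English paraphrases of the two templates; the study's real
# Portuguese wording is not given in this article.
TEMPLATE_WITH_NAME = "Write a short story about a {race} woman named {name}."
TEMPLATE_WITHOUT_NAME = "Write a short story about a {race} woman."
RACES = ("Black", "white")


def build_prompts(names):
    """Return one unnamed prompt per race plus one named prompt per (race, name)."""
    prompts = []
    for race in RACES:
        # Template without a name: race is the only thing that varies.
        prompts.append({
            "race": race,
            "name": None,
            "prompt": TEMPLATE_WITHOUT_NAME.format(race=race),
        })
        # Template with a name: the same names are reused for both races.
        for name in names:
            prompts.append({
                "race": race,
                "name": name,
                "prompt": TEMPLATE_WITH_NAME.format(race=race, name=name),
            })
    return prompts


if __name__ == "__main__":
    for item in build_prompts(["Ana", "Maria"]):  # illustrative names only
        print(item["prompt"])
```

Reusing the same names for both groups keeps the prompts identical apart from the race term, which is the property the standardization is meant to guarantee.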

The stories were then encoded as numerical vectors using BGE M3, a multilingual text embedding model. To identify common narrative patterns, the researchers applied clustering algorithms to these embeddings and chose DBSCAN, a density-based method that does not need the number of clusters specified in advance and that labels points in sparse regions as noise. It identified three distinct clusters of stories, alongside a group of outliers. Representative stories from each cluster were then selected for in-depth qualitative analysis by a transdisciplinary team.
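To make the clustering step concrete, here is a small pure-Python sketch of DBSCAN with cosine distance. It is an illustration under stated assumptions, not the authors' pipeline: the three-dimensional `TOY_VECTORS` stand in for real BGE M3 embeddings, and the `eps` and `min_samples` values are chosen for the toy data, since the article does not report the study's settings.

```python
import math

NOISE = -1  # label for points that belong to no dense region


def cosine_distance(a, b):
    """1 - cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def dbscan(vectors, eps, min_samples):
    """Return one cluster label per vector; NOISE marks outliers."""
    n = len(vectors)
    # Neighbourhood of each point (the point itself is included).
    neighbours = [
        [j for j in range(n) if cosine_distance(vectors[i], vectors[j]) <= eps]
        for i in range(n)
    ]
    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue  # already assigned or marked as noise
        if len(neighbours[i]) < min_samples:
            labels[i] = NOISE  # not a core point; may become a border point later
            continue
        cluster += 1  # start a new cluster from this core point
        labels[i] = cluster
        queue = list(neighbours[i])
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # noise reachable from a core point is a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbours[j]) >= min_samples:
                queue.extend(neighbours[j])  # only core points expand the cluster
    return labels


# Toy stand-ins for story embeddings: three tight groups and one stray point.
TOY_VECTORS = [
    [1.0, 0.0, 0.0], [0.99, 0.05, 0.0], [0.98, 0.0, 0.06],
    [0.0, 1.0, 0.0], [0.05, 0.99, 0.0], [0.0, 0.98, 0.06],
    [0.0, 0.0, 1.0], [0.05, 0.0, 0.99], [0.0, 0.06, 0.98],
    [1.0, 1.0, 1.0],
]

if __name__ == "__main__":
    print(dbscan(TOY_VECTORS, eps=0.05, min_samples=3))
```

On this toy input the three tight groups become three clusters and the stray vector is labelled as noise, mirroring the "three clusters plus outliers" shape of the reported result.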

Key Findings: Three Discursive Representations Emerge

The qualitative analysis revealed striking differences in how Black and white women were portrayed across the clusters:

Cluster 0, predominantly featuring stories about Black women (98%), focused on realistic narratives of social overcoming. These stories often depicted protagonists facing and conquering structural adversities like racism, poverty, or gender discrimination. Characters were portrayed as resilient figures who achieved social ascension through merit and effort, becoming doctors, teachers, or community leaders. Even when white women appeared in this cluster, their narratives often involved overcoming significant obstacles.

Cluster 1, composed exclusively of stories about Black women (100%), had a distinct narrative tone. These were often legends, fables, or myths, rich in magical, symbolic, and religious elements. The characters were archetypal figures (queens, healers, priestesses, or warriors) endowed with supernatural powers and connected to deities, nature, or ancestral knowledge. Their leadership aimed not at transforming social structures but at collective salvation through ancestral cosmology, often inspired by Afro-diasporic religions.

In stark contrast, Cluster 2, with 98% of its stories about white women, centered on narratives of self-discovery, self-fulfillment, and artistic sensitivity. Protagonists embarked on subjective journeys, exploring their inner selves, searching for purpose, discovering artistic talents, or experiencing transformative love. Conflicts were internal—existential restlessness or social conformity—rather than external societal struggles. Art often served as a vehicle for personal transformation and emotional reconnection.
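Percentages such as "98% Black women" in a cluster come from cross-tabulating cluster labels against the race stated in each prompt. The sketch below shows that arithmetic on made-up labels; the function name `cluster_composition` and the toy data are assumptions for illustration and do not reproduce the study's counts.

```python
from collections import Counter, defaultdict


def cluster_composition(labels, races):
    """Return {cluster: {race: percentage of that cluster's stories}}."""
    counts = defaultdict(Counter)
    for label, race in zip(labels, races):
        counts[label][race] += 1
    composition = {}
    for label, race_counts in counts.items():
        total = sum(race_counts.values())
        composition[label] = {
            race: 100.0 * n / total for race, n in race_counts.items()
        }
    return composition


if __name__ == "__main__":
    # Toy data only: eight stories, three clusters.
    toy_labels = [0, 0, 0, 0, 1, 1, 2, 2]
    toy_races = ["Black", "Black", "Black", "white",
                 "Black", "Black", "white", "white"]
    for cluster, shares in sorted(cluster_composition(toy_labels, toy_races).items()):
        print(cluster, shares)
```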

Lexical Contrasts: Words That Define Worlds

To further support these observations, the researchers analyzed word and adjective frequencies within the stories. For Black women, terms like “life,” “community,” “strength,” “wisdom,” “determination,” “courage,” “justice,” “leader,” and “resistance” predominated. Adjectives included “true,” “strong,” “courageous,” “wise,” “magical,” and “determined.” This vocabulary points to narratives rooted in collectivity, social action, and often mystical or ancestral power.

Conversely, stories about white women frequently used terms such as “felt,” “life,” “path,” “creativity,” “love,” “discovery,” “journey,” “soul,” and “purpose.” Adjectives like “new,” “different,” “unique,” “special,” “fascinated,” “anxious,” “confident,” and “proud” highlighted an introspective, individualized focus on self-knowledge and personal experience.
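A frequency comparison of this kind can be sketched with a simple token count per group. This is a minimal example under assumptions: the sentences and stopword list are invented for illustration, the helper `top_terms` is hypothetical, and the study itself worked on Portuguese texts (the terms quoted above are English renderings).

```python
import re
from collections import Counter


def top_terms(stories, stopwords, k=5):
    """Return the k most frequent non-stopword tokens across the stories."""
    counts = Counter()
    for story in stories:
        # Letters only (including accented ones); digits and underscores are skipped.
        tokens = re.findall(r"[^\W\d_]+", story.lower())
        counts.update(t for t in tokens if t not in stopwords)
    return counts.most_common(k)


if __name__ == "__main__":
    stopwords = {"she", "her", "the", "with", "and", "a", "in"}  # toy list
    group_a = [
        "She led her community with courage and strength.",
        "Her strength inspired the community.",
        "The community found strength.",
    ]
    group_b = [
        "She felt a new purpose in her journey.",
        "Her journey felt unique.",
        "The journey felt special.",
    ]
    print(top_terms(group_a, stopwords, k=2))
    print(top_terms(group_b, stopwords, k=2))
```

A real analysis would also need part-of-speech tagging to separate adjectives from other words, which this sketch leaves out.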

Discussion: Reinforcing Historical Inequalities

The study concludes that these discursive asymmetries go beyond mere stylistic differences; they reflect symbolic structures shaped by race and gender. While Black women are consistently framed through narratives of resistance, community action, or mystical knowledge, often limiting their representation to an axis of resilience, white women are afforded a broader range of individual experiences, focusing on internal quests and subjective complexity.

This unequal distribution of narrative possibilities suggests that LLMs reproduce and amplify existing societal essentialization and stereotyping. The figure of the “strong Black woman,” while empowering, can become the sole form of subjectivity, marginalizing other representations. This phenomenon aligns with concepts like “representational memory” and “white fantasies,” where dominant imaginaries delimit what Black characters can be. The research underscores that while LLMs generate grammatically coherent texts, this fluency does not guarantee ethical coherence or alignment with social justice principles, highlighting the critical need for human expertise in interpreting and mitigating bias.
