TLDR: A study by researchers from Togo AI Labs and Vizuara AI Labs reveals significant gender biases in Vision-Language Models (VLMs). By analyzing how VLMs associate face images with occupation and activity descriptions, the study found consistent male- or female-leaning associations across different labor categories and specific statements, with transformer-based models exhibiting slightly stronger biases. The research introduces a robust framework for measuring and understanding these embedded social stereotypes.
Vision-Language Models (VLMs) are powerful AI systems that can understand and connect both images and text, enabling capabilities like open-vocabulary recognition and zero-shot learning. However, new research highlights a critical concern: these models can inadvertently learn and amplify social stereotypes, particularly gender biases, from the vast amounts of web data they are trained on.
A recent study, titled “Vision-Language Models display a strong gender bias,” conducted by researchers from Togo AI Labs and Vizuara AI Labs, delves into this issue. The authors, Aiswarya Konavoor, Raj Abhijit Dandekar, Rajat Dandekar, and Sreedath Panat, investigated whether contrastive vision-language encoders exhibit gender-linked associations when pairing face images with phrases describing occupations and activities.
How the Study Was Conducted
The researchers assembled a dataset of 220 face photographs, evenly split by perceived binary gender, and 150 unique statements. The statements were grouped into six categories of labor: emotional, cognitive, domestic, technical, professional roles, and physical labor. For each statement, they calculated an “association score”: the difference between how strongly the VLM associated the statement with the male faces, on average, and how strongly it associated the statement with the female faces. A positive score indicated a stronger association with male faces, while a negative score indicated a stronger association with female faces.
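To make the score concrete, here is a minimal sketch of how such an association score could be computed with a CLIP-style dual encoder via Hugging Face `transformers`. The file paths are placeholders, and the scoring convention (cosine similarity of unit-normalized embeddings, averaged within each group) is an assumption about the general approach, not the paper’s exact pipeline.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Placeholder paths; the study's dataset has 110 male and 110 female faces.
male_paths = ["faces/male_000.jpg", "faces/male_001.jpg"]
female_paths = ["faces/female_000.jpg", "faces/female_001.jpg"]
statement = "a photo of a nurse"  # one of the 150 occupation/activity statements

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
model.eval()

def embed_images(paths):
    images = [Image.open(p).convert("RGB") for p in paths]
    inputs = processor(images=images, return_tensors="pt")
    with torch.no_grad():
        feats = model.get_image_features(**inputs)
    return feats / feats.norm(dim=-1, keepdim=True)  # unit-normalize

def embed_text(text):
    inputs = processor(text=[text], return_tensors="pt", padding=True)
    with torch.no_grad():
        feats = model.get_text_features(**inputs)
    return feats / feats.norm(dim=-1, keepdim=True)

t = embed_text(statement)
male_sim = (embed_images(male_paths) @ t.T).mean()
female_sim = (embed_images(female_paths) @ t.T).mean()

# Positive = male-leaning, negative = female-leaning, per the study's convention.
association = (male_sim - female_sim).item()
print(f"association score for {statement!r}: {association:+.3f}")
```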
To ensure the robustness of their findings, the team used bootstrap confidence intervals and a “label-swap null model.” The latter helped estimate the level of association expected if no real gender structure were present, providing a baseline to distinguish genuine biases from random noise. The study evaluated several pre-trained CLIP-style dual encoders, including transformer-based models like ViT-B/32 and ViT-L/14, and ResNet-based models like RN50 and RN101.
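Both robustness checks can be run on precomputed similarity scores alone. The sketch below uses toy NumPy data; the per-group bootstrap and 10,000 resampling draws are illustrative assumptions rather than the paper’s exact procedure. The final line mirrors the kind of observed-to-null ratio quoted for ViT-B/32 below.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: similarities of one statement against all 220 faces;
# labels: 1 for male faces, 0 for female faces.
sims = rng.normal(0.25, 0.02, size=220)
labels = np.array([1] * 110 + [0] * 110)

def association(sims, labels):
    return sims[labels == 1].mean() - sims[labels == 0].mean()

observed = association(sims, labels)

# Bootstrap CI: resample faces with replacement within each gender group.
boot = []
for _ in range(10_000):
    idx = np.concatenate([
        rng.choice(np.flatnonzero(labels == 1), size=110, replace=True),
        rng.choice(np.flatnonzero(labels == 0), size=110, replace=True),
    ])
    boot.append(association(sims[idx], labels[idx]))
lo, hi = np.percentile(boot, [2.5, 97.5])

# Label-swap null model: shuffle the gender labels to estimate the level of
# association expected by chance, with no real gender structure present.
null = np.array([association(sims, rng.permutation(labels))
                 for _ in range(10_000)])
null_scale = np.abs(null).mean()

print(f"observed {observed:+.4f}, 95% CI [{lo:+.4f}, {hi:+.4f}]")
print(f"observed-to-null ratio: {abs(observed) / null_scale:.2f}")
```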
Key Findings: Consistent Gender Biases
The study revealed consistent gender biases across all evaluated models, with transformer-based architectures generally showing slightly stronger magnitudes of bias. All four models exhibited biases significantly greater than what would be expected by chance, with the ViT-B/32 model showing the highest ratio of observed bias to null model bias (2.00).
When examining specific labor categories, clear patterns emerged:
- Female-leaning associations: Emotional labor, cognitive labor, and technical labor.
- Male-leaning associations: Domestic labor, professional roles, and physical labor.
For instance, emotional labor (-0.178), cognitive labor (-0.410), and technical labor (-0.898) all leaned female on average, while domestic labor (+1.180), professional roles (+0.835), and physical labor (+0.297) leaned male. These patterns held across the different models, pointing to deeply embedded stereotypes.
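Category-level scores like these are simply averages of the per-statement association scores within each labor category. A toy illustration with pandas (the statements and values below are made up for demonstration, not taken from the paper):

```python
import pandas as pd

# Hypothetical per-statement association scores keyed by labor category.
scores = pd.DataFrame({
    "category": ["emotional", "emotional", "domestic",
                 "domestic", "technical"],
    "statement": ["comforts a friend", "listens patiently",
                  "does the laundry", "cooks dinner", "repairs a laptop"],
    "association": [-0.21, -0.15, 1.10, 1.25, -0.90],
})

# Mean score per category: negative = female-leaning, positive = male-leaning.
category_means = (scores.groupby("category")["association"]
                        .mean()
                        .sort_values())
print(category_means)
```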
The research also identified specific statements with strong biases. For example, the ViT-B/32 model associated males more with terms like “firefighter,” “carpenter,” and “truck driver,” and females more with “nurse,” “teacher,” and “caregiver.” Similarly, other models showed associations like “pilot,” “CEO,” and “engineer” with males, and “therapist,” “counselor,” and “librarian” with females. These findings align with real-world occupational gender imbalances, indicating that VLMs are reflecting and potentially amplifying existing societal biases.
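Surfacing the most polarized statements is then just a ranking over the same per-statement scores. A self-contained toy example (the values are illustrative, not the paper’s measurements):

```python
import pandas as pd

# Toy per-statement association scores (positive = male-leaning).
scores = pd.DataFrame({
    "statement": ["a firefighter", "a carpenter", "a truck driver",
                  "a nurse", "a teacher", "a caregiver"],
    "association": [0.92, 0.74, 0.61, -0.88, -0.52, -0.67],
})

print("most male-leaning statements:")
print(scores.nlargest(3, "association").to_string(index=False))
print("most female-leaning statements:")
print(scores.nsmallest(3, "association").to_string(index=False))
```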
Implications for AI Development
The findings underscore that architectural choices and the data used for pretraining significantly influence the direction and magnitude of learned associations in VLMs. This research provides a transparent and reproducible method to measure gender-linked associations, offering a valuable framework for practitioners to understand how an encoder’s learned geometry relates to socially relevant categories. Addressing these biases is crucial for developing more equitable and fair AI systems. You can read the full research paper here: Vision-Language Models display a strong gender bias.