Unpacking Bias in AI Healthcare: Lessons from Data Collection Practices

TLDR: A research paper by Anna Arias-Duart, Maria Eugenia Cardello, and Atia Cortés explores how biased data collection practices hinder the integration of AI in healthcare. Drawing from the AI4HealthyAging project, the study identifies historical, representation, and measurement biases related to sex, gender, age, habitat, socioeconomic status, equipment, and labeling. It provides practical recommendations, such as involving diverse teams, defining clear inclusion criteria, and evaluating data labeling, to improve fairness and robustness in clinical AI system design and data collection.

Artificial intelligence (AI) holds immense potential to revolutionize healthcare, from aiding diagnoses to informing clinical decisions. Yet despite rapid advances, adoption of AI solutions in real-world clinical settings remains limited. A significant hurdle is the quality and fairness of the data used to train these systems, which biased data collection practices often compromise.

A recent research paper, “Bias by Design? How Data Practices Shape Fairness in AI Healthcare Systems,” delves into these critical issues. Authored by Anna Arias-Duart, Maria Eugenia Cardello, and Atia Cortés from the Barcelona Supercomputing Center (BSC), the paper draws insights from the AI4HealthyAging project, a national R&D initiative in Spain focused on developing AI solutions for age-related diseases. The project’s core task was to identify biases during clinical data collection across various use cases, including cardiovascular conditions, Parkinson’s disease, and hearing loss.

Understanding Bias in AI Healthcare

The term ‘bias’ in AI lacks a single, universally agreed-upon definition, but it generally refers to systematic and unfair favoritism or prejudice that can lead to discriminatory outcomes. In healthcare, this means AI systems might perpetuate or even amplify existing health inequities, harming certain individuals or groups. The authors emphasize that treating bias as a normative issue, one where outcomes are undesirable or unjust, is crucial for achieving health equity.

Biases can emerge at various stages of an AI system’s lifecycle, from the initial problem formulation and data collection to model development, system implementation, and post-deployment monitoring. This paper specifically focuses on biases arising during the crucial initial stages of data design and data collection.

Biases Identified in Practice

The researchers categorized the biases they found into three main types: historical, representation, and measurement biases, illustrating each with concrete examples from the AI4HealthyAging project:

Historical Biases: These biases stem from societal norms and systemic inequalities reflected in the data.

Sex Bias: In the Parkinson’s study, there was a lower representation of females in older age groups, reflecting biological differences in disease prevalence and mortality. Neglecting these sex-based differences can lead to models that don’t accurately capture disease nuances.
Gender Bias: Even without direct gender data, inferred gender scores can reveal biases. For instance, studies show women often receive less effective pain relief and more mental health referrals compared to men, highlighting how gender norms influence treatment and can be reinforced by biased data.

Representation Biases: These occur when certain groups are underrepresented or overrepresented in the dataset, leading to models that perform poorly for marginalized populations.

Age Bias: In studies for age-related conditions, control groups tended to be younger, while disease groups were older. This imbalance can cause models to mistakenly associate age-related features with disease presence rather than true disease markers.
Habitat Bias: Most participants came from urban areas because hospitals are typically located there. This excludes individuals from rural areas, creating a geographic bias that limits the generalizability of findings.
Socioeconomic Bias: Data collected from private hospitals, for example, primarily include individuals from wealthier backgrounds. Similarly, differences in education levels (e.g., higher education in control groups for hearing loss studies) can introduce bias and lead to misleading conclusions if not accounted for.
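
The age imbalance described above can be checked before any model is trained. The following is a minimal sketch, not the authors' method: it computes a standardized mean difference (SMD) in age between a disease group and a control group. The ages are invented for illustration, and the 0.1 cutoff is a commonly used rule of thumb rather than a value taken from the paper.

```python
from math import sqrt
from statistics import mean, stdev


def standardized_mean_difference(group_a, group_b):
    """Difference in means divided by the pooled standard deviation."""
    pooled_sd = sqrt((stdev(group_a) ** 2 + stdev(group_b) ** 2) / 2)
    return (mean(group_a) - mean(group_b)) / pooled_sd


# Hypothetical ages, invented for illustration only.
control_ages = [52, 55, 58, 60, 61, 63]
disease_ages = [68, 70, 72, 74, 75, 79]

smd = standardized_mean_difference(disease_ages, control_ages)

# Assumption: |SMD| > 0.1 is treated as a meaningful imbalance.
age_imbalanced = abs(smd) > 0.1
print(f"SMD = {smd:.2f}, imbalanced = {age_imbalanced}")
```

A large SMD suggests that a classifier could separate the groups using age alone, so recruitment may need rebalancing or age may need to be controlled for in the analysis. The same check could be applied to education level or other covariates.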

Measurement Biases: These biases arise from inconsistencies in how data is collected or labeled.

Equipment Bias: If data is collected using specific equipment (e.g., cochlear implants from one manufacturer), the model might be biased towards the characteristics of that device, limiting its applicability to users of other equipment.
Labeling Bias: Human judgment or institutional practices can influence data labels. An example from the hearing loss study was the initial omission of ‘homemakers’ as an occupational category, which significantly misrepresented women in the dataset until corrected.
Intersectional Bias: This occurs when multiple demographic variables interact. In an Alzheimer’s study, age and sex interactions (females being younger across diagnostic groups) could lead models to misattribute normative age-related sex differences to disease-specific changes if not properly controlled.
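
Intersectional gaps are easy to miss when each variable is inspected on its own. As a rough sketch under stated assumptions, the snippet below counts participants for every combination of two demographic variables, including combinations with no participants at all. The field names (`sex`, `age_band`, `diagnosis`), the records, and the minimum count are hypothetical and are not taken from the AI4HealthyAging data.

```python
from collections import Counter
from itertools import product


def subgroup_counts(records, keys):
    """Count records for every combination of observed values of `keys`.

    Combinations that never occur are reported with a count of 0.
    """
    counts = Counter(tuple(r[k] for k in keys) for r in records)
    levels = [sorted({r[k] for r in records}) for k in keys]
    return {combo: counts.get(combo, 0) for combo in product(*levels)}


# Hypothetical participant records, invented for illustration only.
records = [
    {"sex": "F", "age_band": "60-69", "diagnosis": "control"},
    {"sex": "F", "age_band": "60-69", "diagnosis": "case"},
    {"sex": "M", "age_band": "60-69", "diagnosis": "control"},
    {"sex": "M", "age_band": "70-79", "diagnosis": "case"},
    {"sex": "M", "age_band": "70-79", "diagnosis": "case"},
    {"sex": "M", "age_band": "70-79", "diagnosis": "control"},
]

counts = subgroup_counts(records, ["sex", "age_band"])

# Assumption: any subgroup with fewer than MIN_COUNT participants is flagged.
MIN_COUNT = 1
sparse = [group for group, n in counts.items() if n < MIN_COUNT]
print(counts)
print("Sparse subgroups:", sparse)
```

In this toy sample, females aged 70-79 are absent, which would stay hidden if sex and age band were tabulated separately. Adding `diagnosis` to the key list extends the same check to diagnostic groups.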

Recommendations for Fairer AI in Healthcare

To mitigate these biases, the paper offers practical recommendations:

Historical Bias: Involve diverse, interdisciplinary teams in planning data collection to minimize implicit biases. Collect data in aggregated or disaggregated ways as appropriate, carefully considering what metadata to include to avoid unintended harm.
Representation Bias: Define clear and balanced inclusion/exclusion criteria to ensure sample diversity. Analyze the need for intersectional benchmarks to better represent the target population. Ensure the sample size is feasible and sustainable, considering recruitment and retention challenges.
Measurement Bias: Thoroughly evaluate the data labeling process to ensure categories are clear and consistent, potentially involving interdisciplinary teams for socioeconomic variables or multiple professionals for clinical data. Crucially, consider the equipment used during data collection and the context of model deployment to prevent equipment-related biases.
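
One way to act on the labeling recommendation, when multiple professionals label the same clinical data, is to measure how well two annotators agree. The sketch below uses Cohen's kappa, a standard agreement statistic that corrects for chance agreement. It is offered as an illustration and is not a procedure described in the paper; the labels are invented.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("Both annotators must label the same non-empty set of items.")
    n = len(labels_a)
    # Observed agreement: fraction of items with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement by chance, from each annotator's label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1:
        return 1.0  # Both annotators used a single identical label throughout.
    return (observed - expected) / (1 - expected)


# Hypothetical severity labels from two clinicians, invented for illustration.
clinician_1 = ["mild", "mild", "severe", "severe"]
clinician_2 = ["mild", "severe", "severe", "severe"]

kappa = cohens_kappa(clinician_1, clinician_2)
print(f"kappa = {kappa:.2f}")
```

A kappa near 1 indicates strong agreement, while a low value suggests the label categories are ambiguous or applied inconsistently and should be revisited before training.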

Moving Towards Equitable AI

The authors conclude that successfully integrating AI systems into healthcare requires addressing bias not just as a technical challenge, but as a fundamental governance issue. By highlighting how various forms of bias can emerge during data collection and offering concrete recommendations, this research aims to guide future healthcare AI projects in building more equitable, effective, and socially responsible systems. For more details, see the full paper, “Bias by Design? How Data Practices Shape Fairness in AI Healthcare Systems.”
