TLDR: This research explores using Generative AI (GenAI) to create synthetic skin lesion images for assessing the fairness of AI-based melanoma classifiers. By generating balanced datasets across various demographic attributes like sex, age, and skin type, the study demonstrates that synthetic data can effectively evaluate and highlight biases in existing models. The findings suggest that while synthetic data is a promising tool for fairness assessment, its reliability is highest when the AI model being evaluated was trained on a dataset similar to that used for generating the synthetic images.
Melanoma, the most dangerous form of skin cancer, is projected to see a significant increase in cases and deaths by 2040. Early detection is crucial for improving survival rates, and recent advancements in Artificial Intelligence (AI) offer promising tools for automated medical diagnostics, including smartphone-based applications for pre-screenings. These innovations can reduce physician workload, diagnostic errors, and healthcare costs, while promoting early detection.
However, as AI systems become more integrated into critical areas like medicine, their trustworthiness becomes paramount. A key aspect of this trustworthiness is fairness, especially given the emergence of regulations like the European Union’s AI Act. Ensuring fairness in AI systems requires robust evaluation datasets that are representative of diverse populations across attributes such as sex, age, and skin type (measured on the Fitzpatrick scale).
A significant challenge in fairness assessment is the imbalance often found in real-world datasets. For instance, while the International Skin Imaging Collaboration (ISIC) dataset is a valuable resource for skin lesion images with patient metadata, it still presents imbalances across various demographic attributes. This imbalance makes it difficult to ensure that AI models perform equally well for all groups, potentially leading to biased outcomes.
This research addresses this challenge by leveraging state-of-the-art Generative AI (GenAI), specifically the LightningDiT model, to synthesize highly realistic skin lesion images. The goal is to create balanced datasets that can be used to thoroughly assess the fairness of publicly available melanoma classifiers. The study posed two key research questions:
Can we use state-of-the-art generative image synthesis methods to obtain a balanced fairness assessment dataset?
The researchers developed a protocol that uses diffusion-based image synthesis to generate balanced cohorts of dermoscopic images, encompassing various sexes, ages, and Fitzpatrick skin types. This involves training the LightningDiT model on a large corpus of real ISIC images, extracting latent representations, and then generating new synthetic images conditioned on specific demographic attributes.
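To make the protocol concrete, here is a minimal sketch of attribute-conditioned sampling in the spirit of the paper. The attribute bins, the class-index encoding of demographic combinations, and the `sample`/`decode` interfaces are all illustrative assumptions for this sketch, not LightningDiT's actual API:

```python
import itertools
import torch

# Attribute vocabularies. The bins below are assumptions for illustration;
# the paper reports 2 sexes, 8 age groups, and 7 skin-type categories.
SEXES = ["male", "female"]
AGE_BINS = [f"{10 * i}-{10 * i + 9}" for i in range(8)]   # assumed decade bins
SKIN_TYPES = [f"type_{k}" for k in range(1, 8)]           # 7 categories

# Enumerate every (sex, age, skin type) combination and give it a class
# index, mirroring class-conditional DiT-style training.
COMBOS = list(itertools.product(SEXES, AGE_BINS, SKIN_TYPES))   # 112 combos
COMBO_TO_CLASS = {combo: idx for idx, combo in enumerate(COMBOS)}

@torch.no_grad()
def sample_cohort(model, vae, n_per_combo=100, device="cuda"):
    """Generate n_per_combo synthetic lesion images per demographic combo.

    `model` is assumed to expose a `sample(class_ids)` method that denoises
    latents conditioned on class indices, and `vae` a `decode` method back
    to pixel space. Both are hypothetical interfaces for this sketch.
    """
    cohort = {}
    for combo, cls in COMBO_TO_CLASS.items():
        class_ids = torch.full((n_per_combo,), cls, dtype=torch.long, device=device)
        latents = model.sample(class_ids)       # conditional latent sampling
        cohort[combo] = vae.decode(latents)     # decode to dermoscopic images
    return cohort                               # 112 combos x 100 = 11,200 images
```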
Can this synthetic dataset be used to reliably assess the fairness of skin lesion classifiers?
To answer this, the synthetic images were applied to three peer-reviewed, pre-trained skin lesion classification models: DeepGuide, MelaNet, and SkinLesionDensenet. Fairness was quantified as Demographic Parity (DP), operationalized here as Accuracy Parity (AP): the difference in classification accuracy between demographic subgroups.
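A minimal sketch of the Accuracy Parity computation, assuming we already have ground-truth labels, model predictions, and a subgroup assignment for each synthetic image:

```python
import numpy as np

def accuracy_parity_gap(y_true, y_pred, groups):
    """Largest pairwise difference in per-subgroup accuracy (0 = perfectly fair)."""
    accs = {}
    for g in np.unique(groups):
        mask = groups == g
        accs[g] = float((y_pred[mask] == y_true[mask]).mean())
    return max(accs.values()) - min(accs.values()), accs

# Toy example: accuracy gap across three Fitzpatrick skin-type groups.
y_true = np.array([1, 1, 0, 1, 0, 1])
y_pred = np.array([1, 0, 0, 1, 1, 1])
groups = np.array(["I", "I", "II", "II", "III", "III"])
gap, per_group = accuracy_parity_gap(y_true, y_pred, groups)
print(gap, per_group)   # gap of 0.5: group I at 0.5, II at 1.0, III at 0.5
```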
The methodology involved generating a substantial number of synthetic images: 100 images for each combination of sex (2 values), age group (8 bins), skin type (7 categories), and disease case (1 type), i.e. 2 × 8 × 7 × 1 = 112 combinations and 11,200 images in total. These images were then fed into the pre-trained melanoma detection models to evaluate their fairness across these PII attributes, as sketched below.
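The evaluation step then reduces to scoring each classifier on each demographic cohort. A hedged sketch, reusing the `cohort` dictionary from the sampling sketch above and omitting the model-specific loading code for DeepGuide, MelaNet, and SkinLesionDensenet:

```python
import torch

@torch.no_grad()
def evaluate_models(classifiers, cohort, labels_by_combo, threshold=0.5):
    """Per-(model, demographic-combo) accuracy on the synthetic cohort.

    `classifiers` maps model names to callables returning melanoma
    probabilities; how each of the three pre-trained models is loaded and
    preprocessed is model-specific and not shown here.
    """
    results = {}
    for name, clf in classifiers.items():
        for combo, batch in cohort.items():
            probs = clf(batch)                           # (N,) melanoma scores
            preds = (probs >= threshold).long()
            labels = labels_by_combo[combo]
            results[(name, combo)] = (preds == labels).float().mean().item()
    return results   # per-group accuracies feed the parity metric above
```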
The results indicated that fairness assessment using highly realistic synthetic data is a promising direction: the synthetic images made it possible to evaluate fairness across different demographic groups. A notable finding was that performance depended on the training data of the model under test. For example, DeepGuide, which was trained on the HAM10000 dataset, showed slightly lower performance when evaluated with synthetic data generated from the ISIC dataset. This illustrates the impact of 'dataset shift', where the data used for evaluation differs from the data the model was originally trained on.
Despite this, the study proposes that this approach offers a valuable new avenue for employing synthetic data to gauge and enhance fairness in medical-imaging AI systems. It suggests that synthetic test data can be used first to verify the robustness of pre-trained models and then to evaluate their fairness across different PII groups. The research paper can be found here: Towards Facilitated Fairness Assessment of AI-based Skin Lesion Classifiers Through GenAI-based Image Synthesis.
In conclusion, although some generated images appeared unrealistic, LightningDiT performed well overall at generating synthetic test data. The study verified that synthetic images can serve as a powerful tool for evaluating the fairness and robustness of pre-trained AI models, particularly when the generator and the classifier under test originate from the same data distribution. This approach holds significant potential for privacy-preserving, PII-free fairness audits in medical imaging AI.