TLDR: This research proposes a method to evaluate AI fairness when complete demographic data is unavailable. It achieves this by combining separate, overlapping datasets (e.g., internal company data and external census data) to create synthetic, complete datasets that accurately reflect relationships between demographics and model features. Experiments show that fairness metrics derived from this synthetic data closely match those from real data, offering a practical solution for bias testing under data scarcity.
Artificial intelligence (AI) systems are increasingly used in critical areas like lending, hiring, and healthcare. With this widespread adoption comes a crucial responsibility: ensuring these systems are fair and do not perpetuate or amplify existing societal biases. Global AI regulations are emerging that emphasize the need for rigorous fairness testing and independent bias audits. However, a significant hurdle in achieving this is the difficulty in obtaining the necessary complete datasets, especially those containing sensitive demographic information, due to privacy concerns and practical challenges in industry settings.
Internal historical datasets often lack the demographic details required to identify real-world biases effectively. This research addresses the challenge of evaluating classifier fairness when complete datasets, including demographic information, are not directly accessible. The authors propose an innovative approach: leveraging separate, overlapping datasets to construct complete synthetic data. This synthetic data includes demographic information and accurately reflects the underlying relationships between protected attributes (like race or sex) and the features used by the AI model.
Imagine a bank using an AI system to assess loan applicants based on non-protected variables such as occupation and savings. To audit this system for potential discrimination against certain racial groups, the bank would need data combining loan outcomes, savings, occupation, and race. However, regulations like GDPR often restrict the collection of such protected attributes in internal datasets. This is where the new approach becomes invaluable.
The core idea is to combine information from different sources. For instance, an internal dataset might contain loan outcomes, savings, and occupation. Separately, an external dataset, like publicly available census data, might provide demographic information such as occupation and race. By identifying the overlapping variable (in this case, occupation), the research proposes methods to learn the joint distribution of all these variables. This learned distribution can then be used to generate a synthetic dataset that is complete, including the previously inaccessible demographic information.
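To make that concrete, here is a minimal sketch of the bridging step in Python. The two toy tables, the column names, and the sampling rule are illustrative assumptions rather than the paper's code: each internal record simply receives a race value drawn from the race distribution observed for its occupation in the external data, which corresponds to the simplest of the strategies described next.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Internal dataset: loan outcomes and model features, no protected attribute.
internal = pd.DataFrame({
    "occupation":    ["clerk", "engineer", "clerk", "nurse"],
    "savings":       [1200, 5300, 800, 2100],
    "loan_approved": [0, 1, 0, 1],
})

# External census-style dataset: demographics plus the overlapping variable.
external = pd.DataFrame({
    "occupation": ["clerk", "clerk", "engineer", "nurse", "nurse"],
    "race":       ["A", "B", "A", "B", "A"],
})

# Estimate P(race | occupation) from the external source.
p_race_given_occ = (
    external.groupby("occupation")["race"]
    .value_counts(normalize=True)
    .rename("p")
    .reset_index()
)

# Complete each internal record by sampling race from its occupation's distribution.
def sample_race(occupation):
    dist = p_race_given_occ[p_race_given_occ["occupation"] == occupation]
    return rng.choice(dist["race"].to_numpy(), p=dist["p"].to_numpy())

synthetic_complete = internal.assign(race=internal["occupation"].map(sample_race))
print(synthetic_complete)
```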
The paper explores three main methods for estimating this joint distribution from separate datasets: Independence Given Overlap, Marginal Preservation, and Latent Naïve Bayes. These methods allow for the creation of a comprehensive synthetic dataset that can then be used to test the fairness of a pre-trained AI classifier, even if the original training data lacked demographic details.
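With a complete synthetic dataset in hand, the audit itself is straightforward: the classifier under test is treated as a black box, scored on the synthetic records, and its predictions are compared across the reconstructed demographic groups. The sketch below continues the toy example above; the small model fitted here merely stands in for the bank's existing pre-trained classifier, which in practice would only be queried, never retrained.

```python
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Stand-in for the bank's pre-trained classifier. In a real audit the model
# already exists; a tiny one is fitted on the internal data here only so the
# sketch runs end to end.
model = Pipeline([
    ("encode", ColumnTransformer(
        [("occ", OneHotEncoder(handle_unknown="ignore"), ["occupation"])],
        remainder="passthrough",
    )),
    ("clf", LogisticRegression()),
])
model.fit(internal[["occupation", "savings"]], internal["loan_approved"])

# Black-box audit: score the synthetic complete records, then compare groups
# on the reconstructed protected attribute.
synthetic_complete["prediction"] = model.predict(
    synthetic_complete[["occupation", "savings"]]
)
print(synthetic_complete.groupby("race")["prediction"].mean())
```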
To validate their approach, the researchers conducted experiments using three widely recognized real-world datasets: Adult, COMPAS, and German Credit. They simulated scenarios where protected attributes were isolated from other variables, with only a single overlapping variable between the datasets. The results were highly promising. The synthetic data generated showed high fidelity, meaning its distribution closely matched that of real data. Crucially, the relationship between protected attributes and outcome labels was accurately reconstructed in the synthetic datasets, which is vital for assessing group disparities.
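As a rough illustration of that setup (the paper's exact column choices may differ), a complete benchmark such as Adult can be split into an "internal" view without protected attributes and an "external" view that shares a single overlapping column:

```python
from sklearn.datasets import fetch_openml

# Load a complete benchmark, then simulate the two-source scenario: the
# "internal" view keeps model features and the label but drops the protected
# attributes, while the "external" view keeps the protected attributes plus a
# single overlapping column (occupation here is an assumption; the paper's
# choice of overlap may differ).
adult = fetch_openml("adult", version=2, as_frame=True).frame

internal_view = adult.drop(columns=["sex", "race"])
external_view = adult[["occupation", "sex", "race"]]
```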
Furthermore, fairness metrics calculated on the synthetic data, such as Equal Opportunity Difference, Disparate Impact, and Average Odds Difference, were remarkably consistent with those obtained from real data. In many cases, the synthetic-data evaluations even outperformed baseline methods whose generation process had access to the complete real data. This demonstrates that the proposed approach offers a reliable way to evaluate AI fairness in situations where real-world data limitations would otherwise make it challenging.
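These metrics have standard definitions, so once a complete dataset (real or synthetic) with labels, predictions, and group membership is available they can be computed directly. Below is a minimal sketch assuming binary labels and a binary privileged/unprivileged split; the helper name and arguments are illustrative, not from the paper.

```python
import numpy as np

def fairness_metrics(y_true, y_pred, group, privileged):
    """Standard group-fairness metrics for binary labels and predictions.

    Disparate Impact            : P(yhat=1 | unprivileged) / P(yhat=1 | privileged)
    Equal Opportunity Difference: TPR_unprivileged - TPR_privileged
    Average Odds Difference     : mean of the FPR gap and the TPR gap
    """
    y_true, y_pred, group = map(np.asarray, (y_true, y_pred, group))
    priv, unpriv = (group == privileged), (group != privileged)

    def rates(mask):
        yt, yp = y_true[mask], y_pred[mask]
        tpr = yp[yt == 1].mean() if (yt == 1).any() else np.nan
        fpr = yp[yt == 0].mean() if (yt == 0).any() else np.nan
        return yp.mean(), tpr, fpr  # selection rate, TPR, FPR

    sel_p, tpr_p, fpr_p = rates(priv)
    sel_u, tpr_u, fpr_u = rates(unpriv)
    return {
        "disparate_impact": sel_u / sel_p,
        "equal_opportunity_difference": tpr_u - tpr_p,
        "average_odds_difference": 0.5 * ((fpr_u - fpr_p) + (tpr_u - tpr_p)),
    }

# Example with the toy synthetic audit above (race "A" taken as privileged):
# print(fairness_metrics(synthetic_complete["loan_approved"],
#                        synthetic_complete["prediction"],
#                        synthetic_complete["race"], privileged="A"))
```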
This work offers a significant step forward in overcoming data scarcity for fairness testing, enabling independent and model-agnostic evaluation of AI systems. The synthetic data it produces serves as a viable substitute when real data is limited, paving the way for more robust and ethical AI development. For more technical details, you can refer to the full research paper: Beyond Internal Data: Constructing Complete Datasets for Fairness Testing.


