Unmasking Privacy Risks in Synthetic Tabular Data: A New Attack Method Revealed

TLDR: MIA-EPT is a novel black-box membership inference attack designed for tabular diffusion models. It identifies whether a record was used in a model’s training by analyzing prediction errors when reconstructing masked attributes from synthetic data. The attack achieved strong results in the MIDST 2025 competition, demonstrating significant privacy leakage in state-of-the-art synthetic tabular data and highlighting the need for improved privacy defenses.

Synthetic data generation has emerged as a powerful tool, especially in sensitive sectors like healthcare and finance, allowing organizations to share and use data while aiming to protect individual privacy. A critical question remains, however: how private is this synthetic data in practice? Recent research highlights a significant vulnerability. Even data generated by advanced models can inadvertently “memorize” parts of the original training data, potentially leaking sensitive information about individuals.

This concern is particularly relevant for diffusion models, a cutting-edge type of generative AI that has shown impressive capabilities in creating realistic and high-quality tabular data. While these models are celebrated for their ability to mimic complex data distributions, they are not immune to privacy risks. This is where Membership Inference Attacks (MIAs) come into play. MIAs are designed to determine whether a specific record was part of a model’s training dataset, thereby exposing potential privacy breaches.

A new black-box attack, called MIA-EPT (Membership Inference Attack via Error Prediction for Tabular Data), has been introduced to specifically target these tabular diffusion models. Developed by Eyal German, Daniel Samira, Yuval Elovici, and Asaf Shabtai, MIA-EPT operates without needing access to the internal workings of the generative model. Instead, it relies solely on the synthetic data produced by the model. The core idea behind MIA-EPT is simple yet effective: if a generative model has memorized a training record, it will be easier to predict the attributes of that record from the synthetic data it generates, leading to lower prediction errors.

How MIA-EPT Works

MIA-EPT constructs “error-based feature vectors” by masking and then reconstructing attributes (columns) of target records. It then observes how accurately these attributes are predicted. Records that were part of the original training data are expected to yield lower prediction errors, providing a signal of their “membership.” The attack follows a five-step pipeline:

1. Shadow Model Training: Auxiliary data, similar to the target model’s training data, is used to train “shadow” diffusion models. These models simulate the target’s generative process.

2. Attribute Prediction Model Training: Separate prediction models are trained on the synthetic data generated by these shadow models. Each model learns to predict a specific column’s value based on the other columns.

3. Feature Extraction (Error Profiles): For both “member” (used in training) and “non-member” (not used) records, the attribute prediction models are used to predict masked column values. The prediction errors (or accuracy for categorical data) are then aggregated into a unique “error profile” for each record.

4. Attack Classifier Training: An attack classifier is trained using these error profiles, learning to distinguish between members and non-members based on their error patterns.

5. Membership Prediction: Finally, this trained attack classifier is applied to a “challenge dataset” of unknown records to determine their membership status, providing a score indicating the likelihood of a record being part of the original training data.

You can find more details about this approach in the full research paper: MIA-EPT: Membership Inference Attack via Error Prediction for Tabular Data.
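Steps 2 and 3 of the pipeline can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes all-numeric columns, swaps in plain least-squares regressors as the attribute predictors, and replaces the diffusion model with a toy generator that nearly copies its training rows, so the memorization signal is exaggerated. The function names `fit_column_predictors` and `error_profile` are hypothetical.

```python
import numpy as np


def fit_column_predictors(synthetic):
    """Fit one least-squares linear model per column, using the other columns as inputs."""
    n_rows, n_cols = synthetic.shape
    weights = []
    for j in range(n_cols):
        others = np.delete(synthetic, j, axis=1)
        design = np.hstack([others, np.ones((n_rows, 1))])  # append intercept term
        w, *_ = np.linalg.lstsq(design, synthetic[:, j], rcond=None)
        weights.append(w)
    return weights


def error_profile(records, weights):
    """Mask each column in turn, predict it from the rest, and return absolute errors.

    The result has one row per record and one error value per column.
    """
    n_rows, n_cols = records.shape
    errors = np.empty((n_rows, n_cols))
    for j, w in enumerate(weights):
        others = np.delete(records, j, axis=1)
        design = np.hstack([others, np.ones((n_rows, 1))])
        errors[:, j] = np.abs(design @ w - records[:, j])
    return errors


rng = np.random.default_rng(0)
members = rng.normal(size=(12, 8))       # records the toy generator was "trained" on
non_members = rng.normal(size=(200, 8))  # records from the same distribution, never seen

# Assumption: a toy generator that memorizes heavily (training rows plus small noise).
synthetic = members + rng.normal(scale=0.05, size=members.shape)

predictors = fit_column_predictors(synthetic)
member_errors = error_profile(members, predictors)
non_member_errors = error_profile(non_members, predictors)

# Crude membership score: lower mean error means "more likely a member".
member_scores = -member_errors.mean(axis=1)
non_member_scores = -non_member_errors.mean(axis=1)

print("mean error, members:    ", member_errors.mean())
print("mean error, non-members:", non_member_errors.mean())
```

In the full attack, profiles like `member_errors` and `non_member_errors`, computed against shadow models, would serve as training input for the attack classifier in step 4. Here their row means only act as a crude stand-in for that classifier's score.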

Key Findings and Implications

MIA-EPT was evaluated on three state-of-the-art tabular diffusion models: TabDDPM, TabSyn, and ClavaDDPM. In internal tests, it achieved AUC-ROC scores of up to 0.599 and True Positive Rate at 10% False Positive Rate (TPR@10% FPR) values of 22.0%. For reference, random guessing yields an AUC-ROC of 0.5 and a TPR of 10% at a 10% FPR. Under the challenging conditions of the MIDST 2025 competition, MIA-EPT secured second place in the Black-box Multi-Table track, with a TPR@10% FPR of 20.0%.
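To make the TPR@10% FPR metric concrete, here is a small sketch of one way to compute it from attack scores. The helper `tpr_at_fpr` is a hypothetical name, and the scores and labels are made-up illustration values, not results from the paper. The competition's exact scoring procedure (for example, how it interpolates between thresholds) may differ.

```python
def tpr_at_fpr(scores, labels, max_fpr=0.10):
    """Return the best TPR over all score thresholds whose FPR is at most max_fpr.

    scores: higher means "more likely a member".
    labels: 1 for true members, 0 for non-members.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    best_tpr = 0.0
    for threshold in sorted(set(scores), reverse=True):
        # Everything scoring at or above the threshold is predicted "member".
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        if fp / negatives <= max_fpr:
            best_tpr = max(best_tpr, tp / positives)
    return best_tpr


# Illustrative data only: 5 members followed by 10 non-members.
scores = [0.9, 0.8, 0.6, 0.4, 0.2,
          0.7, 0.5, 0.45, 0.3, 0.25, 0.15, 0.1, 0.05, 0.03, 0.01]
labels = [1] * 5 + [0] * 10

result = tpr_at_fpr(scores, labels)
print(result)
```

With 10 non-members, a 10% FPR budget allows exactly one false positive, so the threshold can drop to 0.6 and catch three of the five members before a second non-member is flagged.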

These results are significant because they demonstrate that substantial membership leakage can be uncovered in synthetic tabular data, even when the attacker has limited information (a black-box setting). This challenges the common assumption that synthetic data is inherently privacy-preserving. The success of MIA-EPT highlights a crucial trade-off: maximizing the utility and realism of synthetic data by preserving important patterns can inadvertently increase the risk of retaining traces of individual data points from the original training set.

The research emphasizes the need for organizations using diffusion-based data synthesis to rigorously evaluate their outputs for such leaks. It also motivates the development and implementation of robust privacy defenses, such as noise injection, stronger regularization techniques, or differential privacy, to better balance data utility with individual privacy in the future.
