Safeguarding Sensitive Data in AI: Introducing DELTA for Privacy-Aware Feature Engineering

TLDR: DELTA is a two-phase AI framework designed for Privacy-Preserving Data Reprogramming (PPDR). It transforms raw data features to improve prediction accuracy for target attributes while simultaneously minimizing the ability to infer sensitive attributes. Phase I uses reinforcement learning to discover useful feature transformations. Phase II then employs a variational generative model with a disentangled latent space and regularization techniques to generate new features that retain high utility but actively suppress sensitive information. Experiments show DELTA significantly boosts predictive performance (∼9.3%) and reduces privacy leakage (∼35%) across various datasets, demonstrating a robust approach to balancing utility and privacy in AI.

In the rapidly evolving landscape of Artificial Intelligence, data is king. With great data, however, comes great responsibility, especially when sensitive information is involved. Traditional data engineering, which transforms and optimizes data features to boost AI performance, can inadvertently create privacy leaks. The 2018 Strava heatmap incident is a well-known example: aggregated, anonymized fitness data ended up revealing the locations of secret military bases. This highlights a critical challenge: how can we enhance AI’s predictive power without compromising individual privacy?

This is precisely the problem that a new research paper, titled DELTA: Variational Disentangled Learning for Privacy-Preserving Data Reprogramming, addresses. Authored by Arun Vignesh Malarkkan, Haoyue Bai, Anjali Kaushik, and Yanjie Fu from Arizona State University, the paper introduces a novel framework called DELTA, designed for Privacy-Preserving Data Reprogramming (PPDR). The goal of PPDR is to transform raw data features in a way that maximizes the accuracy of predicting target attributes while simultaneously minimizing the accuracy of predicting sensitive attributes.

The Dual Challenge of PPDR

Solving PPDR presents two major hurdles. First, the sheer number of possible feature transformations in high-dimensional data creates an enormous search space, making it difficult to find optimal transformations that are highly useful for downstream tasks. Second, even if useful features are found, the challenge lies in disentangling and eliminating sensitive information from these utility-oriented features to prevent privacy inference.
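
To get a feel for the first hurdle, here is a rough back-of-the-envelope sketch of how quickly the space of candidate features grows. The function name, the operation counts, and the counting rule (every existing feature can feed any unary operation, and every ordered pair can feed any binary operation) are illustrative assumptions, not figures from the paper. The count is an upper bound because equivalent expressions are not merged.

```python
def count_candidate_features(num_features: int, num_unary: int,
                             num_binary: int, depth: int) -> int:
    """Upper bound on features available after `depth` rounds of composition.

    Hypothetical counting rule: in each round, every existing feature can be
    passed through any unary operation, and every ordered pair of existing
    features can be passed through any binary operation.
    """
    total = num_features
    for _ in range(depth):
        unary = total * num_unary
        binary = total * total * num_binary
        total = total + unary + binary
    return total


# 10 raw features, 2 unary operations, 2 binary operations (all made-up numbers).
for depth in range(1, 4):
    print(depth, count_candidate_features(10, 2, 2, depth))
```

Even with these small numbers the count grows roughly quadratically per round, which is why exhaustive search is not practical on high-dimensional data.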

Introducing DELTA: A Two-Phase Solution

DELTA tackles these challenges with a sophisticated two-phase variational disentangled generative learning framework:

Phase I: Policy-Guided Feature Transformation Discovery

This initial phase focuses on intelligently exploring the vast space of feature transformations. DELTA employs a multi-agent Reinforcement Learning (RL) system. Think of it as a team of AI agents that learn to select features and mathematical operations (like addition, division, or logarithms) to construct new, transformed features. These agents are guided by an “information bottleneck” principle, which rewards transformations that are highly relevant for the target prediction task. This phase generates a comprehensive “knowledge base” of various feature transformations, each annotated with its utility score (how well it predicts the target) and privacy score (how much sensitive information it leaks).
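
The shape of such a knowledge base can be sketched in a few lines. This is a simplified illustration, not the paper's method: random sampling stands in for the multi-agent RL policy, absolute Pearson correlation stands in for the utility and privacy scores, and the names `build_knowledge_base`, `make_toy_data`, and `BINARY_OPS` are hypothetical.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


# Hypothetical operation set; the article mentions addition, division, logarithms.
BINARY_OPS = {
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
    "mul": lambda a, b: a * b,
    "div": lambda a, b: a / b if abs(b) > 1e-9 else 0.0,
}


def build_knowledge_base(columns, target, sensitive, n_samples=20, seed=0):
    """Sample transformations at random (a stand-in for the RL agents) and
    annotate each one with a utility score and a privacy score."""
    rng = random.Random(seed)
    names = sorted(columns)
    records = []
    for _ in range(n_samples):
        left, right = rng.choice(names), rng.choice(names)
        op = rng.choice(sorted(BINARY_OPS))
        values = [BINARY_OPS[op](a, b)
                  for a, b in zip(columns[left], columns[right])]
        records.append({
            "expression": f"{op}({left}, {right})",
            "utility": abs(pearson(values, target)),     # proxy: relevance to target
            "privacy": abs(pearson(values, sensitive)),  # proxy: leakage of sensitive attribute
        })
    return records


def make_toy_data(n_rows=200, seed=1):
    """Synthetic data: the target depends on x0 and x1, the sensitive attribute on x2."""
    rng = random.Random(seed)
    columns = {f"x{i}": [rng.uniform(1.0, 2.0) for _ in range(n_rows)]
               for i in range(3)}
    target = [a * b for a, b in zip(columns["x0"], columns["x1"])]
    sensitive = [c + rng.gauss(0, 0.05) for c in columns["x2"]]
    return columns, target, sensitive


columns, target, sensitive = make_toy_data()
knowledge_base = build_knowledge_base(columns, target, sensitive)
best = max(knowledge_base, key=lambda r: r["utility"] - r["privacy"])
print(best)
```

Each record pairs a transformation with both scores, which is the information the second phase needs in order to learn which transformations are useful and which ones leak.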

Phase II: Privacy-Aware Generative Data Reprogramming

Phase II uses the knowledge base from Phase I to enforce privacy. DELTA trains a variational autoencoder (VAE) whose latent space is split into two parts: a “utility-oriented” subspace, which captures information crucial for target prediction, and a “privacy-oriented” subspace, which holds sensitive information. When generating new features, the system decodes only from the utility-oriented embedding, so signals from the privacy-oriented part are suppressed. Adversarial and causal regularization losses further enforce this disentanglement, making it harder to infer sensitive information from the generated utility features.
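
The split-latent idea can be illustrated with a tiny, untrained forward pass in plain Python. This is a minimal sketch under stated assumptions: the class name `SplitLatentVAE`, the single linear layers, and the layer sizes are hypothetical, and the adversarial and causal regularization losses are not implemented. It only shows the structural point that the decoder never receives the privacy-oriented part of the latent vector.

```python
import math
import random


def linear(weights, bias, x):
    """Apply a dense layer: one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def init_layer(rng, n_out, n_in):
    """Small random weights and zero biases (untrained, for illustration)."""
    weights = [[rng.gauss(0, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


class SplitLatentVAE:
    """Hypothetical VAE skeleton with utility and privacy latent subspaces."""

    def __init__(self, n_in, n_utility, n_privacy, seed=0):
        self.rng = random.Random(seed)
        self.n_utility = n_utility
        n_latent = n_utility + n_privacy
        self.enc_mu = init_layer(self.rng, n_latent, n_in)
        self.enc_logvar = init_layer(self.rng, n_latent, n_in)
        # The decoder input size is n_utility: it cannot see the privacy part.
        self.dec = init_layer(self.rng, n_in, n_utility)

    def encode(self, x):
        return linear(*self.enc_mu, x), linear(*self.enc_logvar, x)

    def sample(self, mu, logvar):
        # Reparameterization: z = mu + sigma * eps, with eps ~ N(0, 1).
        return [m + math.exp(0.5 * lv) * self.rng.gauss(0, 1)
                for m, lv in zip(mu, logvar)]

    def split(self, z):
        return z[:self.n_utility], z[self.n_utility:]

    def decode(self, z_utility):
        return linear(*self.dec, z_utility)


def kl_to_standard_normal(mu, logvar):
    """KL divergence between N(mu, diag(exp(logvar))) and N(0, I)."""
    return -0.5 * sum(1 + lv - m * m - math.exp(lv)
                      for m, lv in zip(mu, logvar))


vae = SplitLatentVAE(n_in=4, n_utility=2, n_privacy=2)
x = [0.5, -1.0, 2.0, 0.1]
mu, logvar = vae.encode(x)
z_utility, z_privacy = vae.split(vae.sample(mu, logvar))
reconstruction = vae.decode(z_utility)  # the privacy part is simply dropped
print(reconstruction, kl_to_standard_normal(mu, logvar))
```

Dropping the privacy subspace at decode time only helps if sensitive information does not also end up in the utility subspace, which is the role the article assigns to the adversarial and causal regularization terms.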

Impressive Results and Robustness

The researchers ran experiments on eight datasets covering regression, binary classification, and multi-class classification. DELTA improved predictive performance by approximately 9.3% on average while reducing privacy leakage by about 35%, compared with using the original dataset features. These gains held across different data modalities, dimensionalities (from low-dimensional data to over 10,000 features), and types of sensitive attributes (explicitly defined or randomly selected).
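
For readers who want to see how percentages like these are typically computed, here is a small sketch. It assumes the figures are relative changes against a baseline trained on the original features, which the article does not state explicitly. The input scores are hypothetical values chosen only to reproduce changes of a similar size; they are not taken from the paper.

```python
def relative_change_percent(baseline: float, new: float) -> float:
    """Relative change from `baseline` to `new`, expressed in percent."""
    return (new - baseline) / baseline * 100.0


# Hypothetical scores, not results from the paper.
baseline_utility, delta_utility = 0.75, 0.82   # e.g. target-prediction accuracy
baseline_leakage, delta_leakage = 0.60, 0.39   # e.g. sensitive-attribute accuracy

utility_gain = relative_change_percent(baseline_utility, delta_utility)
leakage_change = relative_change_percent(baseline_leakage, delta_leakage)
print(f"utility change: {utility_gain:+.1f}%")
print(f"leakage change: {leakage_change:+.1f}%")
```

A positive utility change combined with a negative leakage change is the pattern the experiments report: the target becomes easier to predict while the sensitive attribute becomes harder to infer.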

An ablation study further confirmed the importance of each component within DELTA, showing that removing any part, such as causal regularization or adversarial losses, led to increased privacy leakage. The framework also demonstrated excellent cross-model generalization, meaning its generated features performed well across various popular machine learning classifiers like Random Forest, Logistic Regression, and XGBoost, without compromising privacy. Furthermore, DELTA proved scalable, maintaining its privacy-utility advantages and computational efficiency even with larger datasets.

A Step Towards Trustworthy AI

DELTA represents a significant advancement in data-centric AI, offering a principled way to engineer features that are both highly effective for AI tasks and rigorously protective of sensitive information. By explicitly decoupling utility-driven transformation discovery from privacy-enforced generation, DELTA lays crucial groundwork for building safer, more accountable, and human-centered AI systems, especially in sensitive domains like healthcare and finance.
