
Safeguarding Sensitive Data in AI: Introducing DELTA for Privacy-Aware Feature Engineering

TLDR: DELTA is a two-phase AI framework for Privacy-Preserving Data Reprogramming (PPDR). It transforms raw data features to improve prediction accuracy for target attributes while minimizing the ability to infer sensitive attributes. Phase I uses reinforcement learning to discover useful feature transformations. Phase II then employs a variational generative model with a disentangled latent space and regularization techniques to generate new features that retain high utility but actively suppress sensitive information. Across eight datasets, DELTA improved predictive performance by about 9.3% on average and reduced privacy leakage by about 35%, demonstrating a robust approach to balancing utility and privacy in AI.

In the rapidly evolving landscape of Artificial Intelligence, data is king. However, with great data comes great responsibility, especially when dealing with sensitive information. Traditional feature engineering, which transforms and optimizes data features to boost AI performance, can inadvertently open new channels for privacy leakage. In the 2018 Strava heatmap incident, for example, anonymized fitness data ended up revealing the locations of secret military bases. This highlights a critical challenge: how can we enhance AI’s predictive power without compromising individual privacy?

This is precisely the problem that a new research paper, titled DELTA: Variational Disentangled Learning for Privacy-Preserving Data Reprogramming, addresses. Authored by Arun Vignesh Malarkkan, Haoyue Bai, Anjali Kaushik, and Yanjie Fu from Arizona State University, the paper introduces a novel framework called DELTA, designed for Privacy-Preserving Data Reprogramming (PPDR). The goal of PPDR is to transform raw data features in a way that maximizes the accuracy of predicting target attributes while simultaneously minimizing the accuracy of predicting sensitive attributes.
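To make the PPDR goal concrete, here is a minimal Python sketch that scores a candidate feature set by its target accuracy minus a weighted penalty on how accurately an attacker can recover the sensitive attribute. The names `accuracy`, `ppdr_score`, and the trade-off weight `lam` are hypothetical illustrations, not the paper's formulation, and the accuracy figures below are made-up toy values.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the true labels."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def ppdr_score(target_acc: float, sensitive_acc: float, lam: float = 1.0) -> float:
    """Hypothetical scalar objective: reward target accuracy, penalize sensitive-attribute accuracy."""
    return target_acc - lam * sensitive_acc


# Toy (made-up) accuracies for two candidate feature sets: (target_acc, sensitive_acc).
candidates = {"A": (0.90, 0.80), "B": (0.85, 0.55)}
best = max(candidates, key=lambda name: ppdr_score(*candidates[name]))
print(best)  # B: slightly less useful than A, but it leaks far less
```

Raising `lam` shifts the balance toward privacy. In practice, leakage is often judged against a trivial baseline, such as always guessing the majority sensitive class, rather than by raw attacker accuracy.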

The Dual Challenge of PPDR

Solving PPDR presents two major hurdles. First, the sheer number of possible feature transformations in high-dimensional data creates an enormous search space, making it difficult to find optimal transformations that are highly useful for downstream tasks. Second, even if useful features are found, the challenge lies in disentangling and eliminating sensitive information from these utility-oriented features to prevent privacy inference.
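A quick count shows why the search space explodes. The sketch below assumes, hypothetically, 3 unary operations (for example a logarithm) and 4 binary operations (for example addition and division) applied once to raw features, and then counts how many ways there are to pick k new features from the resulting pool. Real transformation sequences can nest operations, so the true space is larger still. The function names are illustrative.

```python
from math import comb


def one_step_candidates(n_features: int, n_unary: int, n_binary: int) -> int:
    """Features creatable by one operation: a unary op on one feature, or a
    binary op on an ordered pair of distinct features (order matters for division)."""
    return n_features * n_unary + n_features * (n_features - 1) * n_binary


def feature_sets(n_features: int, n_unary: int, n_binary: int, k: int) -> int:
    """Number of distinct sets of k new features drawn from the one-step pool."""
    return comb(one_step_candidates(n_features, n_unary, n_binary), k)


for n in (10, 100, 10_000):
    pool = one_step_candidates(n, n_unary=3, n_binary=4)
    print(f"{n:>6} raw features -> {pool:,} candidates, {feature_sets(n, 3, 4, k=5):.2e} sets of 5")
```

With 10 raw features the one-step pool already holds 390 candidates; at 10,000 features it reaches 399,990,000, before any nesting or multi-feature selection.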

Introducing DELTA: A Two-Phase Solution

DELTA tackles these challenges with a sophisticated two-phase variational disentangled generative learning framework:

Phase I: Policy-Guided Feature Transformation Discovery

This initial phase focuses on intelligently exploring the vast space of feature transformations. DELTA employs a multi-agent Reinforcement Learning (RL) system. Think of it as a team of AI agents that learn to select features and mathematical operations (like addition, division, or logarithms) to construct new, transformed features. These agents are guided by an “information bottleneck” principle, which rewards transformations that are highly relevant for the target prediction task. This phase generates a comprehensive “knowledge base” of various feature transformations, each annotated with its utility score (how well it predicts the target) and privacy score (how much sensitive information it leaks).
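The Phase I knowledge base can be pictured as a list of transformed features, each tagged with a utility score and a privacy score. The sketch below is a deliberately simplified stand-in: it replaces the multi-agent RL search with exhaustive one-step enumeration, uses squared Pearson correlation with the target (utility) and with the sensitive attribute (privacy) instead of the paper's scoring, and ranks entries by utility minus privacy, which is a hypothetical choice. `build_knowledge_base`, `UNARY`, and `BINARY` are illustrative names.

```python
import math
from itertools import permutations


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either column is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


# Hypothetical operation sets, echoing the article's examples (log, addition, division).
UNARY = {"log": lambda a: math.log1p(abs(a))}  # log1p(|a|) is defined for any real a
BINARY = {  # name -> (function, is_commutative)
    "add": (lambda a, b: a + b, True),
    "div": (lambda a, b: a / b if b != 0 else 0.0, False),
}


def build_knowledge_base(features, target, sensitive):
    """Enumerate one-step transformations and tag each with utility/privacy proxies."""
    candidates = {}
    for name, col in features.items():
        for op, f in UNARY.items():
            candidates[f"{op}({name})"] = [f(a) for a in col]
    for (n1, c1), (n2, c2) in permutations(features.items(), 2):
        for op, (f, commutative) in BINARY.items():
            if commutative and n1 > n2:
                continue  # add(x2,x1) would duplicate add(x1,x2)
            candidates[f"{op}({n1},{n2})"] = [f(a, b) for a, b in zip(c1, c2)]
    kb = [
        {"expr": expr,
         "utility": pearson(col, target) ** 2,     # proxy: how predictive of the target
         "privacy": pearson(col, sensitive) ** 2}  # proxy: how much it reveals the sensitive attribute
        for expr, col in candidates.items()
    ]
    return sorted(kb, key=lambda e: e["utility"] - e["privacy"], reverse=True)


features = {"x1": [1, 2, 3, 4, 5], "x2": [2, 1, 2, 1, 2]}
target = [3, 3, 5, 5, 7]     # toy target, equal to x1 + x2
sensitive = [2, 1, 2, 1, 2]  # toy sensitive attribute, equal to x2
for entry in build_knowledge_base(features, target, sensitive):
    print(entry)
```

In this toy data, `add(x1,x2)` reproduces the target exactly while `log(x2)` is a pure proxy for the sensitive attribute, so the two land near opposite ends of the ranking.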

Phase II: Privacy-Aware Generative Data Reprogramming

Phase II builds on the knowledge base from Phase I, and it is where the magic of privacy preservation happens. DELTA uses a variational autoencoder (VAE) with a specially designed latent space. This latent space is split into two distinct parts: a “utility-oriented” subspace, which captures information crucial for target prediction, and a “privacy-oriented” subspace, which holds sensitive information. When generating new features, the system decodes only from the utility-oriented embedding, actively suppressing signals from the privacy-oriented part. Adversarial and causal regularization losses further enforce this disentanglement; they are designed to make sensitive information hard to infer from the generated utility features.
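The latent split can be illustrated with a toy, pure-Python sketch under several assumptions: a 4-dimensional latent whose first two dimensions form the utility subspace and last two the privacy subspace, a linear decoder that zeroes the privacy subspace before decoding, and a logistic adversary that tries to recover a binary sensitive attribute from the utility subspace. The loss at the end (reconstruction plus KL, minus the adversary's loss) is a common adversarial-debiasing pattern, not DELTA's exact objective, and the causal regularizer is not sketched because the article gives no details. All function names are hypothetical.

```python
import math
import random

UTIL = slice(0, 2)  # hypothetical: latent dims 0-1 = utility-oriented subspace
PRIV = slice(2, 4)  # hypothetical: latent dims 2-3 = privacy-oriented subspace


def reparameterize(mu, logvar, rng):
    """Standard VAE reparameterization: z = mu + sigma * eps, eps ~ N(0, 1)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, logvar)]


def kl_to_standard_normal(mu, logvar):
    """KL(N(mu, diag(sigma^2)) || N(0, I)), summed over latent dimensions."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, logvar))


def decode(z, weights, bias):
    """Toy linear decoder that zeroes the privacy subspace, so only utility dims reach the output."""
    z_in = list(z)
    z_in[PRIV] = [0.0] * len(z_in[PRIV])
    return [sum(w * v for w, v in zip(row, z_in)) + b for row, b in zip(weights, bias)]


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def adversary_bce(z, adv_w, adv_b, s):
    """Cross-entropy of a logistic adversary predicting sensitive bit s from the utility dims."""
    p = sigmoid(sum(w * v for w, v in zip(adv_w, z[UTIL])) + adv_b)
    p = min(max(p, 1e-12), 1.0 - 1e-12)  # avoid log(0)
    return -(s * math.log(p) + (1 - s) * math.log(1.0 - p))


def generator_loss(recon, kl, adv, beta=1.0, gamma=1.0):
    """Generic adversarial objective: minimizing it pushes the adversary's loss up."""
    return recon + beta * kl - gamma * adv


rng = random.Random(0)
mu, logvar = [0.3, -0.1, 0.8, 0.5], [0.0, 0.0, 0.0, 0.0]
z = reparameterize(mu, logvar, rng)
W, b = [[1.0, 0.5, 2.0, -1.0], [0.0, 1.0, 3.0, 3.0]], [0.0, 0.1]
x_hat = decode(z, W, b)
# Changing only the privacy dims leaves the decoded features unchanged.
print(decode(z[:2] + [99.0, -99.0], W, b) == x_hat)  # True
```

In a real model the encoder and decoder are neural networks trained jointly, and the adversarial term is typically implemented with alternating updates or a gradient-reversal layer; the masking in `decode` only mirrors the article's description of decoding from the utility-oriented embedding.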

Impressive Results and Robustness

The researchers conducted extensive experiments across eight diverse datasets, covering various tasks like regression, binary, and multi-class classification. The results are compelling: DELTA consistently improved predictive performance by an average of approximately 9.3% while simultaneously reducing privacy leakage by about 35% compared to using original dataset features. This robust performance was observed across different data modalities, dimensionalities (from low to over 10,000 features), and types of sensitive attributes (explicitly defined or randomly selected).
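The headline figures are averages across datasets. The article does not say whether they are relative or absolute changes, or exactly how leakage is measured, so the sketch below assumes relative changes and treats leakage as any score where lower is better (for example, an attacker's accuracy on the sensitive attribute). `relative_change` and `summarize` are hypothetical helpers, and the numbers in the example are made up.

```python
from statistics import mean


def relative_change(before: float, after: float) -> float:
    """Signed relative change from a non-zero baseline, e.g. 0.25 means +25%."""
    if before == 0:
        raise ValueError("baseline must be non-zero")
    return (after - before) / before


def summarize(per_dataset):
    """per_dataset maps name -> (base_utility, new_utility, base_leakage, new_leakage).

    Returns (mean relative utility gain, mean relative leakage reduction).
    """
    gains = [relative_change(bu, nu) for bu, nu, _, _ in per_dataset.values()]
    reductions = [-relative_change(bl, nl) for _, _, bl, nl in per_dataset.values()]
    return mean(gains), mean(reductions)


toy = {"dataset_a": (0.50, 0.60, 0.80, 0.40), "dataset_b": (0.40, 0.40, 0.60, 0.60)}
gain, reduction = summarize(toy)
print(f"utility {gain:+.1%}, leakage reduced by {reduction:.1%}")
```

Averaging per-dataset relative changes, as here, weights every dataset equally; pooling raw scores first would give a different figure.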

An ablation study further confirmed the importance of each component within DELTA, showing that removing any part, such as causal regularization or adversarial losses, led to increased privacy leakage. The framework also demonstrated excellent cross-model generalization, meaning its generated features performed well across various popular machine learning classifiers like Random Forest, Logistic Regression, and XGBoost, without compromising privacy. Furthermore, DELTA proved scalable, maintaining its privacy-utility advantages and computational efficiency even with larger datasets.
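A cross-model check can be written as a small harness that trains each classifier twice on the generated features: once to predict the target, and once as an attacker predicting the sensitive attribute. The sketch assumes scikit-learn-style estimators whose `fit(X, y)` returns the estimator and which expose `predict(X)`; scikit-learn's `RandomForestClassifier` and `LogisticRegression` and xgboost's `XGBClassifier` follow that convention and could be passed in as factories. To stay dependency-free, the example uses two tiny stand-in models instead. `cross_model_report` and both stand-ins are hypothetical.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of matching labels (accepts lists or array-likes)."""
    y_true, y_pred = list(y_true), list(y_pred)
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


class MajorityClass:
    """Stand-in model: always predicts the most frequent training label."""

    def fit(self, X, y):
        self.label_ = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X):
        return [self.label_ for _ in X]


class NearestCentroid:
    """Stand-in model: predicts the label whose mean feature vector is closest."""

    def fit(self, X, y):
        counts, sums = Counter(y), {}
        for row, label in zip(X, y):
            acc = sums.setdefault(label, [0.0] * len(row))
            for i, v in enumerate(row):
                acc[i] += v
        self.centroids_ = {lab: [v / counts[lab] for v in acc] for lab, acc in sums.items()}
        return self

    def predict(self, X):
        def dist2(a, b):
            return sum((p - q) ** 2 for p, q in zip(a, b))
        return [min(self.centroids_, key=lambda lab: dist2(row, self.centroids_[lab])) for row in X]


def cross_model_report(factories, X_tr, y_tr, s_tr, X_te, y_te, s_te):
    """For each model: (target accuracy, attacker accuracy on the sensitive attribute)."""
    report = {}
    for name, make in factories.items():
        target_acc = accuracy(y_te, make().fit(X_tr, y_tr).predict(X_te))
        leak_acc = accuracy(s_te, make().fit(X_tr, s_tr).predict(X_te))
        report[name] = (target_acc, leak_acc)
    return report


X_tr, y_tr, s_tr = [[0, 0], [0, 1], [5, 5], [5, 6]], [0, 0, 1, 1], [0, 1, 0, 1]
X_te, y_te, s_te = [[0, 0.5], [5, 5.5]], [0, 1], [0, 1]
report = cross_model_report({"majority": MajorityClass, "centroid": NearestCentroid},
                            X_tr, y_tr, s_tr, X_te, y_te, s_te)
print(report)
```

Using a fresh model instance for the attacker keeps the two evaluations independent; a feature set generalizes well in the article's sense when target accuracy stays high and attacker accuracy stays low across every model in the dictionary.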


A Step Towards Trustworthy AI

DELTA represents a significant advancement in data-centric AI, offering a principled way to engineer features that are both highly effective for AI tasks and rigorously protective of sensitive information. By explicitly decoupling utility-driven transformation discovery from privacy-enforced generation, DELTA lays crucial groundwork for building safer, more accountable, and human-centered AI systems, especially in sensitive domains like healthcare and finance.

Ananya Rao
https://blogs.edgentiq.com
Ananya Rao is a tech journalist with a passion for dissecting the fast-moving world of Generative AI. With a background in computer science and a sharp editorial eye, she connects the dots between policy, innovation, and business. Ananya excels in real-time reporting and specializes in uncovering how startups and enterprises in India are navigating the GenAI boom. She brings urgency and clarity to every breaking news piece she writes. You can reach her at: [email protected]
