VTarbel: A New Framework for Targeted Attacks on Vertical Federated Learning Systems

TLDR: VTarbel is a novel two-stage attack framework designed to perform targeted label attacks on Vertical Federated Learning (VFL) systems, even when anomaly detectors are in place and the attacker has minimal knowledge. In the preparation stage, the attacker uses a small set of expressive samples to train an estimated anomaly detector and a surrogate VFL model. In the attack stage, these local models guide gradient-based perturbations to create malicious inputs that force misclassification into a target label while evading detection. Evaluations show VTarbel significantly outperforms existing attacks and remains effective against common defenses, revealing critical security blind spots in VFL.

Vertical Federated Learning (VFL) is a cutting-edge machine learning approach that allows multiple organizations to collaborate on training powerful AI models without directly sharing their sensitive raw data. Imagine banks wanting to build a better credit risk model, or hospitals aiming for more accurate diagnoses – VFL enables them to combine their unique data insights while keeping individual customer or patient information private. This is achieved by sharing only intermediate computational results, like feature embeddings or gradients, rather than the raw data itself.
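
As a rough illustration of what sharing only intermediate results can look like at inference time, here is a minimal sketch assuming two parties, linear bottom models with a tanh activation, and a linear top model. All names, shapes, and weights here (`x_passive`, `w_top`, and so on) are made up for illustration and are not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Each party holds different feature columns for the same four samples.
x_passive = rng.normal(size=(4, 3))   # stays with the passive party
x_active = rng.normal(size=(4, 5))    # stays with the active party

# Hypothetical, randomly initialised weights standing in for trained models.
w_passive = rng.normal(size=(3, 2))   # passive party's bottom model
w_active = rng.normal(size=(5, 2))    # active party's bottom model
w_top = rng.normal(size=(4, 3))       # active party's top model, 3 classes


def bottom_model(x, w):
    """Map raw local features to a low-dimensional embedding."""
    return np.tanh(x @ w)


# Only the embedding leaves the passive party, never x_passive itself.
emb_passive = bottom_model(x_passive, w_passive)
emb_active = bottom_model(x_active, w_active)

# The active party concatenates the embeddings and runs the top model.
logits = np.concatenate([emb_passive, emb_active], axis=1) @ w_top
predictions = logits.argmax(axis=1)
```

In this toy setup the passive party controls `emb_passive` entirely, which is the lever a malicious passive party would use during inference.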

While VFL offers significant privacy benefits, it’s not immune to security threats. One particularly concerning vulnerability is the ‘targeted label attack.’ In this scenario, a malicious participant, often a ‘passive party’ (one that contributes data but doesn’t hold the final labels), subtly alters their input data during the model’s inference phase. The goal is to force the VFL model to misclassify specific inputs into a label chosen by the attacker. For instance, in a credit risk system, an attacker might manipulate data to make a ‘high risk’ loan applicant appear ‘low risk’.

Existing methods for these targeted attacks often rely on unrealistic assumptions, such as the attacker having full access to the VFL model’s internal workings or its outputs. More importantly, they frequently overlook a crucial defense mechanism deployed in real-world VFL systems: anomaly detectors. These detectors are designed to spot unusual or suspicious data inputs, flagging them as anomalies and preventing malicious manipulations from affecting the final prediction. This oversight has rendered many previous attack strategies ineffective in practical settings.

Introducing VTarbel: A Stealthy, Minimal-Knowledge Attack

To address this critical gap, researchers have introduced VTarbel, a novel two-stage attack framework specifically designed to bypass these anomaly detectors in VFL systems, even with minimal knowledge of the system’s internal workings. VTarbel operates under a more realistic threat model, where the attacker doesn’t have access to private labels or detailed model architectures.

The ingenuity of VTarbel lies in its two distinct phases:

The Preparation Stage

In this initial phase, the attacker acts ‘honestly’ by following the standard VFL inference protocol for a small, carefully selected subset of their test data. The key here is to choose ‘highly expressive’ samples, meaning those that best represent the overall data distribution. The selection is guided by Maximum Mean Discrepancy (MMD), a measure of the distance between two distributions: the attacker picks samples that, when added to the selected set, most reduce the MMD between that set’s distribution and the full dataset’s distribution. By submitting these samples, the attacker collects the VFL model’s predicted labels, which are then used as ‘pseudo-labels’. These pseudo-labeled samples let the attacker locally train two components: an ‘estimated detector’ that mimics the defender’s anomaly detection logic, and a ‘surrogate model’ that approximates the behavior of the global VFL model. This stage is about gathering intelligence and building local tools without raising suspicion.
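
The MMD-guided selection can be sketched as a greedy loop. The version below is a minimal illustration, assuming a Gaussian (RBF) kernel and a fixed selection budget; the function names, the kernel choice, and the `gamma` parameter are assumptions for this sketch, and the paper's actual selection procedure may differ.

```python
import numpy as np


def rbf_kernel(a, b, gamma=1.0):
    """Gaussian (RBF) kernel matrix between the rows of a and the rows of b."""
    sq = (a ** 2).sum(1)[:, None] + (b ** 2).sum(1)[None, :] - 2.0 * a @ b.T
    return np.exp(-gamma * np.maximum(sq, 0.0))


def greedy_mmd_select(x, budget, gamma=1.0):
    """Greedily pick `budget` row indices of x so that the selected subset
    has a small squared MMD to the full dataset x."""
    k = rbf_kernel(x, x, gamma)
    n = len(x)
    mean_to_all = k.mean(axis=1)         # average similarity to the full set
    taken = np.zeros(n, dtype=bool)
    selected = []
    sum_within = 0.0                     # kernel sum over selected pairs
    sum_cross = 0.0                      # sum of mean_to_all over selected
    sim_to_selected = np.zeros(n)        # sum of k(i, s) over selected s
    for _ in range(budget):
        m = len(selected) + 1
        # Squared MMD after adding each candidate i, dropping the term
        # that depends only on the full dataset (constant across i).
        within = sum_within + 2.0 * sim_to_selected + np.diag(k)
        cross = sum_cross + mean_to_all
        score = within / m ** 2 - 2.0 * cross / m
        score[taken] = np.inf            # never pick the same sample twice
        best = int(np.argmin(score))
        selected.append(best)
        taken[best] = True
        sum_within = within[best]
        sum_cross = cross[best]
        sim_to_selected += k[:, best]
    return selected


# Toy data: two well-separated clusters of five points each.
rng = np.random.default_rng(0)
data = np.vstack([rng.normal(0.0, 0.1, (5, 2)), rng.normal(10.0, 0.1, (5, 2))])
selected = greedy_mmd_select(data, budget=2)
```

With a budget of two on this toy data, the loop picks one point from each cluster, since a second point from the same cluster would add redundancy without bringing the subset closer to the full distribution.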

The Attack Stage

Once the estimated detector and surrogate model are sufficiently trained, the attacker moves to the second phase. Here, they take the remaining test samples, which were not submitted during preparation, and apply ‘gradient-based perturbations’. This means they subtly modify the input data using an optimization process guided by their locally trained models. The objective is twofold: first, to ensure the surrogate model predicts the modified sample as the attacker’s chosen target label with high confidence; and second, to ensure that the modified sample’s ‘anomaly score’ (as calculated by the estimated detector) remains below the detection threshold. This balancing act allows the attacker to craft malicious inputs that are both effective at misclassification and stealthy enough to evade the defender’s real anomaly detector. The success of this stage relies on the ‘transferability’ of adversarial examples: an input designed to fool one model can often fool another, similar model.
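
The two-part objective can be illustrated with a small gradient-descent loop. This is a toy sketch under strong assumptions, not the paper's method: the surrogate is a linear softmax classifier, the estimated detector scores a sample by its squared distance to a center point, and the detector constraint is handled with a simple penalty term. All names and parameters (`craft_targeted_input`, `penalty`, `center`, and so on) are hypothetical.

```python
import numpy as np


def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()


def anomaly_score(x, center):
    """Toy estimated detector: squared distance to a reference point."""
    return float(np.sum((x - center) ** 2))


def craft_targeted_input(x, w, b, target, center, threshold,
                         steps=200, lr=0.05, penalty=1.0):
    """Descend on cross-entropy(surrogate(x), target), adding a penalty
    gradient whenever the estimated anomaly score exceeds the threshold.
    Returns the visited input with the highest target probability among
    those scoring at or below the threshold, or None if there is none."""
    x_adv = np.asarray(x, dtype=float).copy()
    onehot = np.eye(len(b))[target]
    best, best_prob = None, -1.0
    for _ in range(steps + 1):
        probs = softmax(w @ x_adv + b)
        score = anomaly_score(x_adv, center)
        if score <= threshold and probs[target] > best_prob:
            best, best_prob = x_adv.copy(), probs[target]
        grad = w.T @ (probs - onehot)            # gradient of cross-entropy
        if score > threshold:
            grad = grad + penalty * 2.0 * (x_adv - center)
        x_adv = x_adv - lr * grad
    return best


# Toy surrogate: class 0 when the first feature is positive, class 1 otherwise.
w = np.array([[1.0, 0.0], [-1.0, 0.0]])
b = np.zeros(2)
center = np.zeros(2)
x_clean = np.array([1.0, 0.0])                   # surrogate predicts class 0
x_adv = craft_targeted_input(x_clean, w, b, target=1,
                             center=center, threshold=1.5)
```

Here the perturbed input crosses the surrogate's decision boundary toward the target class but stops short of the region the estimated detector would flag. Whether such an input also fools the real VFL model and the real detector depends on how well the local models approximate them.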

Evaluation and Impact

Extensive evaluations of VTarbel were conducted across various model architectures (including MLP3, VGG13, ResNet18, and DistilBERT), seven multimodal datasets, and two types of anomaly detectors. VTarbel consistently outperformed four state-of-the-art baseline attacks, achieving significantly higher attack success rates (ASR), e.g., 90.39% ASR on VGG13 compared to 49.62% or lower for the baselines. VTarbel also evaded the anomaly detectors and remained effective against several privacy-preserving defense mechanisms: these defenses often reduced attack success by less than 81.8%, and did so only at the cost of over 82.3% degradation in inference accuracy on the main task.

These findings, detailed in the research paper available at https://arxiv.org/pdf/2507.14625, highlight critical security vulnerabilities in current VFL deployments. They underscore an urgent need for the development of more robust, attack-aware defenses to safeguard the integrity of collaborative AI systems.
