
VTarbel: A New Framework for Targeted Attacks on Vertical Federated Learning Systems

TLDR: VTarbel is a novel two-stage attack framework designed to perform targeted label attacks on Vertical Federated Learning (VFL) systems, even when anomaly detectors are in place and the attacker has minimal knowledge. In the preparation stage, the attacker uses a small set of expressive samples to train an estimated anomaly detector and a surrogate VFL model. In the attack stage, these local models guide gradient-based perturbations to create malicious inputs that force misclassification into a target label while evading detection. Evaluations show VTarbel significantly outperforms existing attacks and remains effective against common defenses, revealing critical security blind spots in VFL.

Vertical Federated Learning (VFL) is a machine learning approach that lets multiple organizations, each holding different features about the same set of individuals, collaborate on training a model without directly sharing their sensitive raw data. Imagine banks wanting to build a better credit risk model, or hospitals aiming for more accurate diagnoses: VFL enables them to combine their unique data insights while keeping individual customer or patient information private. This is achieved by sharing only intermediate computational results, such as feature embeddings or gradients, rather than the raw data itself.
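The following is a minimal sketch of how this data flow can look at inference time. It assumes a common split-model setup that the article does not spell out:

- Each party runs its own ‘bottom model’ on its local features.
- The party holding the labels (the active party) concatenates the resulting embeddings.
- The active party feeds the concatenated embeddings to a ‘top model’.

The class names, single-layer models and weights are hypothetical, chosen only for illustration.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def linear(weights: Matrix, x: Vector) -> Vector:
    """Dense layer without bias: returns weights @ x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def relu(v: Vector) -> Vector:
    return [max(0.0, z) for z in v]


class BottomModel:
    """Hypothetical one-layer model a party runs on its own private features."""

    def __init__(self, weights: Matrix) -> None:
        self.weights = weights

    def embed(self, features: Vector) -> Vector:
        return relu(linear(self.weights, features))


class TopModel:
    """Hypothetical head held by the active party: embeddings -> class scores."""

    def __init__(self, weights: Matrix) -> None:
        self.weights = weights

    def predict(self, embeddings: List[Vector]) -> int:
        joined = [z for emb in embeddings for z in emb]  # concatenate embeddings
        scores = linear(self.weights, joined)
        return max(range(len(scores)), key=scores.__getitem__)  # argmax


def vfl_inference(passive: BottomModel, active: BottomModel, top: TopModel,
                  x_passive: Vector, x_active: Vector) -> int:
    # Raw features stay with their owner; only embeddings are exchanged.
    h_passive = passive.embed(x_passive)  # sent by the passive party
    h_active = active.embed(x_active)     # computed locally by the active party
    return top.predict([h_passive, h_active])
```

For example, take these toy models:

- Passive bottom model: `BottomModel([[1.0, 0.0], [0.0, 1.0]])`.
- Active bottom model: `BottomModel([[1.0, -1.0]])`.
- Top model: `TopModel([[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]])`.

With the active input fixed at `[1.0, 0.0]`, the prediction is label 0 when the passive party sends `[2.0, 0.5]` and label 1 when it sends `[0.0, 3.0]`. The passive party's input alone decides the outcome here, and that is the lever a targeted label attack pulls.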

While VFL offers significant privacy benefits, it’s not immune to security threats. One particularly concerning vulnerability is the ‘targeted label attack.’ In this scenario, a malicious participant, often a ‘passive party’ (one that contributes data but doesn’t hold the final labels), subtly alters their input data during the model’s inference phase. The goal is to force the VFL model to misclassify specific inputs into a label chosen by the attacker. For instance, in a credit risk system, an attacker might manipulate data to make a ‘high risk’ loan applicant appear ‘low risk’.

Existing methods for these targeted attacks often rely on unrealistic assumptions, such as the attacker having full access to the VFL model’s internal workings or its outputs. More importantly, they frequently overlook a crucial defense mechanism deployed in real-world VFL systems: anomaly detectors. These detectors are designed to spot unusual or suspicious data inputs, flagging them as anomalies and preventing malicious manipulations from affecting the final prediction. This oversight has rendered many previous attack strategies ineffective in practical settings.

Introducing VTarbel: A Stealthy, Minimal-Knowledge Attack

To address this critical gap, researchers have introduced VTarbel, a novel two-stage attack framework specifically designed to bypass these anomaly detectors in VFL systems, even with minimal knowledge of the system’s internal workings. VTarbel operates under a more realistic threat model, where the attacker doesn’t have access to private labels or detailed model architectures.

The ingenuity of VTarbel lies in its two distinct phases:

The Preparation Stage

In this initial phase, the attacker behaves ‘honestly’, following the standard VFL inference protocol for a small, carefully selected subset of its test data. The key is to choose ‘highly expressive’ samples, meaning those that best represent the overall data distribution. The selection is guided by Maximum Mean Discrepancy (MMD), a kernel-based measure of how far apart two distributions are. The attacker picks the samples that, when added to the selected set, most reduce the MMD between that set and the full test set. By submitting these samples, the attacker collects the VFL model’s predicted labels and uses them as ‘pseudo-labels’. With these pseudo-labeled samples, the attacker locally trains two components on its own features:

- an ‘estimated detector’ that mimics the defender’s anomaly detection logic;
- a ‘surrogate model’ that approximates the behavior of the global VFL model.

This stage is about gathering intelligence and building local tools without raising suspicion.
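The MMD-guided selection can be sketched as a greedy loop. This is an illustrative version under several assumptions:

- a Gaussian (RBF) kernel;
- the simple biased MMD² estimator;
- one sample added per step.

The paper's exact kernel, bandwidth and selection procedure may differ, and `select_expressive` is a hypothetical name.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def rbf(a: Point, b: Point, gamma: float = 1.0) -> float:
    """Gaussian (RBF) kernel k(a, b) = exp(-gamma * ||a - b||^2)."""
    return math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(a, b)))


def mmd2(subset: List[Point], full: List[Point], gamma: float = 1.0) -> float:
    """Biased estimate of squared MMD between a subset and the full dataset."""
    k_ss = sum(rbf(a, b, gamma) for a in subset for b in subset) / len(subset) ** 2
    k_sf = sum(rbf(a, b, gamma) for a in subset for b in full) / (len(subset) * len(full))
    # k_ff is constant across candidates, so it never changes which sample wins.
    k_ff = sum(rbf(a, b, gamma) for a in full for b in full) / len(full) ** 2
    return k_ss - 2.0 * k_sf + k_ff


def select_expressive(data: List[Point], budget: int, gamma: float = 1.0) -> List[int]:
    """Greedily pick `budget` indices whose subset has the smallest MMD to `data`."""
    chosen: List[int] = []
    for _ in range(budget):
        remaining = [i for i in range(len(data)) if i not in chosen]
        # Add the candidate that brings the selected set closest to the full set.
        best = min(
            remaining,
            key=lambda i: mmd2([data[j] for j in chosen + [i]], data, gamma),
        )
        chosen.append(best)
    return chosen
```

Consider two well-separated clusters and a budget of two. The greedy loop picks one point from each cluster, because a second point from the same cluster would leave the other half of the distribution unrepresented. That is the sense in which the selected samples are ‘expressive’.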

The Attack Stage

Once the estimated detector and surrogate model are sufficiently trained, the attacker moves to the second phase. It takes the remaining test samples, those not submitted during the preparation stage, and applies ‘gradient-based perturbations’. In other words, it modifies the input data through an optimization process guided by its locally trained models. The objective is twofold:

- First, the surrogate model should predict the modified sample as the attacker’s chosen target label with high confidence.
- Second, the modified sample’s ‘anomaly score’ (as calculated by the estimated detector) should stay below the detection threshold.

Balancing these two goals lets the attacker craft malicious inputs that both cause the desired misclassification and stay stealthy enough to evade the defender’s real anomaly detector. The success of this stage relies on the ‘transferability’ of adversarial examples: an input designed to fool one model can often fool another, similar model.
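Below is a toy version of the attack-stage optimization. Purely for illustration, it assumes the following, none of which are the paper's settings:

- a linear softmax surrogate over the attacker's local features;
- a hypothetical estimated detector whose anomaly score is the squared distance to the mean of benign samples;
- a hinge-style penalty with weight `lam`, plus a step size and stopping rule chosen arbitrarily.

The loop descends the cross-entropy toward the target label. It adds the penalty gradient only when the score reaches the threshold `tau`.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def softmax(z: Vector) -> Vector:
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    total = sum(e)
    return [v / total for v in e]


def surrogate_logits(W: Matrix, x: Vector) -> Vector:
    """Hypothetical linear surrogate: one row of W per class."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def anomaly_score(x: Vector, mu: Vector) -> float:
    """Hypothetical estimated detector: squared distance to the benign mean mu."""
    return sum((a - b) ** 2 for a, b in zip(x, mu))


def craft_targeted(x: Vector, W: Matrix, mu: Vector, target: int, tau: float,
                   lam: float = 1.0, lr: float = 0.1, min_conf: float = 0.9,
                   steps: int = 200) -> Tuple[Vector, bool]:
    """Perturb x until the surrogate predicts `target` and the score is below tau."""
    x = list(x)
    for _ in range(steps):
        p = softmax(surrogate_logits(W, x))
        score = anomaly_score(x, mu)
        if p[target] >= min_conf and score < tau:
            return x, True  # both goals met on the local models
        # Cross-entropy gradient for a linear softmax model: W^T (p - onehot(target)).
        err = [pk - (1.0 if k == target else 0.0) for k, pk in enumerate(p)]
        grad = [sum(W[k][j] * err[k] for k in range(len(W))) for j in range(len(x))]
        if score >= tau:
            # Gradient of the hinge penalty lam * max(0, score - tau) when active.
            grad = [g + lam * 2.0 * (xj - mj) for g, xj, mj in zip(grad, x, mu)]
        x = [xj - lr * g for xj, g in zip(x, grad)]
    return x, False
```

The loop stops as soon as both conditions hold on the local models. Whether the resulting input also fools the real VFL model and the defender's detector depends on transferability, which this sketch does not model.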


Evaluation and Impact

VTarbel was evaluated on:

- four model architectures (MLP3, VGG13, ResNet18 and DistilBERT);
- seven multimodal datasets;
- two types of anomaly detectors.

The results were striking. VTarbel consistently outperformed four state-of-the-art baseline attacks and achieved significantly higher attack success rates (ASR): for example, 90.39% ASR on VGG13, compared with 49.62% or lower for the baselines. Crucially, VTarbel evaded the anomaly detectors and stayed robust against several privacy-preserving defense mechanisms. In the reported results, those defenses often reduced its attack success by less than 81.8%, while degrading the main task's inference accuracy by more than 82.3%.
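The sketch below shows one plausible way to compute an attack success rate that also accounts for detection. An attacked sample counts as a success only if the model predicts the target label and the detector does not flag it. This definition is an assumption for illustration, and the paper's exact metric may differ.

```python
from typing import Sequence


def attack_success_rate(preds: Sequence[int], flagged: Sequence[bool], target: int) -> float:
    """Share of attacked samples predicted as `target` AND not flagged.

    Assumed definition for illustration; the paper may define ASR differently.
    """
    if len(preds) != len(flagged):
        raise ValueError("preds and flagged must have the same length")
    if not preds:
        raise ValueError("need at least one attacked sample")
    hits = sum(1 for p, f in zip(preds, flagged) if p == target and not f)
    return hits / len(preds)
```

For example, suppose four attacked samples are predicted as `[1, 1, 0, 1]`, the target label is 1, and the detector flags only the second. Two samples count as successes, so the ASR is 2/4 = 0.5.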

These findings, detailed in the research paper available at https://arxiv.org/pdf/2507.14625, highlight critical security vulnerabilities in current VFL deployments. They underscore an urgent need for the development of more robust, attack-aware defenses to safeguard the integrity of collaborative AI systems.

Karthik Mehta
https://blogs.edgentiq.com
Karthik Mehta is a data journalist known for his data-rich, insightful coverage of AI news and developments. Armed with a degree in Data Science from IIT Bombay and years of newsroom experience, Karthik merges storytelling with metrics to surface deeper narratives in AI-related events. His writing cuts through hype, revealing the real-world impact of Generative AI on industries, policy, and society. You can reach him at: [email protected]
