
LeFCert: Securing Language Models Against Data Poisoning Attacks

TLDR: LeFCert is a new framework that provides provable robustness for language-empowered foundation models (LeFMs) like CLIP against poisoning attacks in few-shot learning. It integrates textual and feature embeddings with an adaptive blending mechanism and uses a ‘twofold trimmed mean prototype’ to discard outliers, offering mathematical guarantees against worst-case attacks. Variants like LeFCert-L and LeFCert-C extend this to handle imperceptible perturbations and collective attack budgets, demonstrating superior certified accuracy and computational efficiency compared to existing methods.

Language-empowered Foundation Models (LeFMs) like CLIP and GraphCLIP have become incredibly powerful tools in artificial intelligence, especially for tasks that involve understanding different types of data, such as images and text, or graphs and text. These models learn by aligning visual or graph features with textual descriptions, enabling them to perform well even with very few examples (known as few-shot learning).

However, this reliance on small, task-specific datasets, often collected from various sources, makes them vulnerable to a serious threat: poisoning attacks. In such attacks, malicious actors can subtly alter a few training examples to degrade the model’s performance or cause it to make incorrect predictions. Current defenses against these attacks often rely on empirical strategies, which means they work against known attack types but lack formal guarantees, leaving models exposed to new and sophisticated adversarial tactics.

Introducing LeFCert: A Provably Robust Solution

A new research paper, Provably Robust Adaptation for Language-Empowered Foundation Models, by Yuni Lai, Xiaoyu Xue, Linghui Shen, Yulun Wu, Gaolei Li, Song Guo, Kai Zhou, and Bin Xiao, addresses this critical vulnerability. They propose Language-empowered Few-shot Certification (LeFCert), the first provably robust few-shot classifier designed specifically for LeFMs.

LeFCert’s strength lies in its ability to integrate both the visual (or graph) features from support samples and the semantic information from label text embeddings. It uses an adaptive blending mechanism that dynamically adjusts how much weight is given to textual information based on its reliability. If the support samples for a class are very close to their text label, the text information is given more importance, making the classification more accurate.
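To make the blending idea concrete, here is a minimal sketch in NumPy. The function name, the sigmoid weighting, and the cosine-similarity-based reliability signal are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def adaptive_prototype(support_feats, text_emb, temperature=1.0):
    """Blend the mean support-feature prototype with the class text embedding.

    The blend weight grows with the cosine similarity between the two,
    so text information is trusted more when the support samples for a
    class sit close to their label's text embedding.
    """
    feat_proto = support_feats.mean(axis=0)
    # Cosine similarity between the feature prototype and the text embedding
    sim = np.dot(feat_proto, text_emb) / (
        np.linalg.norm(feat_proto) * np.linalg.norm(text_emb) + 1e-12)
    # Map similarity to a blend weight in (0, 1); higher similarity -> more text
    beta = 1.0 / (1.0 + np.exp(-sim / temperature))
    proto = beta * text_emb + (1.0 - beta) * feat_proto
    # Return a unit-norm prototype for cosine-distance classification
    return proto / (np.linalg.norm(proto) + 1e-12)
```

A test sample would then be assigned to the class whose blended prototype is nearest in the embedding space.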

To achieve its provable robustness, LeFCert employs a clever technique called a “twofold trimmed mean prototype.” Imagine you have a set of measurements, and some are extreme outliers. The trimmed mean simply discards a certain number of the highest and lowest values before calculating the average. LeFCert applies this concept to distances in the model’s embedding space, effectively ignoring potentially poisoned or outlier samples. By doing so, it can derive mathematical upper and lower bounds for classification scores, guaranteeing that predictions remain consistent even under worst-case poisoning scenarios within a specified attack budget.
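The underlying trimming operation is easy to state in code. This is a generic sketch of a symmetric trimmed mean, not the paper's exact twofold prototype construction, which applies the idea to distances in the embedding space:

```python
import numpy as np

def trimmed_mean(values, k):
    """Discard the k largest and k smallest values, then average the rest.

    With at most k poisoned samples, the poisoned values either fall in the
    trimmed extremes or are bounded by values that survive the trim, which
    is what makes worst-case bounds on the resulting score tractable.
    """
    v = np.sort(np.asarray(values, dtype=float))
    if 2 * k >= len(v):
        raise ValueError("k too large for the number of values")
    return v[k:len(v) - k].mean()

# A single extreme outlier (e.g. a poisoned distance) is simply dropped:
trimmed_mean([1, 2, 3, 4, 100], 1)  # -> 3.0, the mean of [2, 3, 4]
```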

Enhanced Robustness for Complex Scenarios

The researchers further extended LeFCert with two variants to tackle more realistic and challenging attack scenarios:

  • LeFCert-L: This variant is designed for situations where attackers not only poison samples but also ensure their perturbations are imperceptible, constrained within a small l2-norm ball. LeFCert-L uses randomized smoothing to achieve Lipschitz continuity, ensuring that small changes in input lead to bounded changes in the model’s internal representations, thereby providing robustness under these dual constraints. An even more advanced version, LeFCert-LD, incorporates diffusion denoise smoothing to improve accuracy while maintaining robustness.
  • LeFCert-C: Traditional certification often evaluates each test sample independently, assuming an attacker can use their entire budget on each one. LeFCert-C, however, provides “collective certification.” It considers scenarios where an attacker has a shared poisoning budget that must be distributed across multiple samples. By analyzing the worst-case allocation of this budget, LeFCert-C offers tighter and more realistic robustness guarantees for a set of test samples.
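The intuition behind collective certification can be sketched with a simple greedy allocation. Assume each test sample i is individually robust to up to `radii[i]` poisoned training points; a worst-case attacker with a shared budget then spends it on the cheapest samples first. The function below is a simplified illustration under those assumptions, not the paper's certification procedure:

```python
def collective_certified_count(radii, budget):
    """Lower-bound how many test samples stay correctly classified when an
    attacker must split one shared poisoning budget across all of them.

    Flipping sample i requires exceeding its certified radius, i.e. at
    least radii[i] + 1 poisoned points; the worst case flips the cheapest
    samples first.
    """
    cost_to_flip = sorted(r + 1 for r in radii)
    flipped = 0
    remaining = budget
    for cost in cost_to_flip:
        if remaining >= cost:
            remaining -= cost
            flipped += 1
        else:
            break
    return len(radii) - flipped
```

For example, with per-sample radii `[1, 2, 3]` and a shared budget of 3, only one sample can be flipped, so two remain certified; per-sample certification, which lets the attacker spend the full budget on every sample independently, would certify only the sample with radius 3. This is why collective guarantees are tighter.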


Impressive Performance and Efficiency

Extensive experiments on various benchmark datasets, including image classification (CIFAR-FS, Tiered-ImageNet, CUB200-2011) and graph node classification (Cora, CiteSeer), demonstrated LeFCert’s superior performance. It consistently outperformed existing methods like KNN, DPA, and FCert in both clean accuracy (performance on unperturbed data) and certified accuracy (performance with provable robustness against attacks).

For example, on CIFAR-FS, LeFCert achieved a clean accuracy of 98% and a certified accuracy of 96% with a poisoning size of T=3, significantly outperforming FCert’s 72%. LeFCert-LD showed remarkable resilience, achieving 48% certified accuracy on Tiered-ImageNet even when T=7, a scenario where all baselines failed (0% accuracy). LeFCert-C also delivered substantial improvements in collective certification, showcasing its strength in modeling shared adversarial budgets.

Despite its advanced robustness mechanisms, LeFCert is computationally efficient, making it practical for real-world applications. It can verify multiple test samples per episode within seconds, demonstrating a strong balance between security and usability.

This research marks a significant step forward in securing language-empowered foundation models, ensuring their reliability and trustworthiness in critical applications where data integrity is paramount. By providing provable guarantees, LeFCert sets a new standard for robust few-shot learning.

Karthik Mehta
https://blogs.edgentiq.com
Karthik Mehta is a data journalist known for his data-rich, insightful coverage of AI news and developments. Armed with a degree in Data Science from IIT Bombay and years of newsroom experience, Karthik merges storytelling with metrics to surface deeper narratives in AI-related events. His writing cuts through hype, revealing the real-world impact of Generative AI on industries, policy, and society. You can reach him at: [email protected]
