SASER: Unveiling Stealthy Stego Attacks on Open-Source LLMs

TLDR: Researchers have introduced SASER, the first stego attack specifically designed for open-source Large Language Models (LLMs). This attack embeds malicious payloads within LLM parameters, making it highly stealthy and effective, even against quantized models. SASER achieved a 100% attack success rate in experiments on LLaMA2-7B and ChatGLM3-6B, significantly outperforming existing stego attacks for general Deep Neural Networks. The findings highlight a critical new vulnerability in open-source LLMs and call for urgent development of robust countermeasures.

Open-source Large Language Models (LLMs) have become a major driver of AI progress, largely because of their collaborative and transparent nature. However, new research argues that this same transparency, which gives anyone full access to a model's code and parameters, also exposes these models to a sophisticated and underexplored threat: stego attacks.

A recent paper titled “SASER: Stego attacks on open-source LLMs” by Ming Tan, Wei Li, Hu Tao, Hailong Ma, Aodi Liu, Qian Chen, and Zilong Wang introduces the first stego attack specifically designed for these advanced AI models. Traditionally, stego attacks have focused on general Deep Neural Networks (DNNs), where malicious code, known as ‘payloads,’ is concealed within model parameters and activated by ‘triggers’ to compromise user systems.

The researchers argue that open-source LLMs are even more vulnerable than general DNNs for several reasons. Their mature supply chain frameworks can accelerate the spread of malware, as seen with vulnerabilities like CVE-2024-34359. Their broader ecosystems create new attack surfaces, and complex mechanisms like quantization, used to reduce model size and improve inference speed, can inadvertently make them more susceptible to these hidden threats.

Introducing SASER: A Stealthy and Robust Attack

The proposed attack, dubbed SASER (Stego Attack on open-source LLMs with Stealthiness, Effectiveness, and Robustness), operates in three main stages:

First, the TARGET stage identifies specific parameters within the LLM that are least critical to its overall performance. This is done using a “performance-aware importance (PAI)” metric. By targeting these less important parameters, the attack minimizes any noticeable degradation in the model’s functionality, ensuring stealth.
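The selection step can be pictured with a small sketch. The article does not give the PAI formula, so the example below uses absolute weight magnitude purely as a stand-in importance score; the function name `least_important_indices` and the toy tensor are hypothetical and are not taken from the paper.

```python
import numpy as np

def least_important_indices(scores: np.ndarray, k: int) -> np.ndarray:
    """Return the flat indices of the k entries with the lowest score."""
    flat = scores.ravel()
    k = min(k, flat.size)
    if k <= 0:
        return np.empty(0, dtype=np.intp)
    # argpartition places the k smallest scores first without a full sort
    idx = np.argpartition(flat, k - 1)[:k]
    return np.sort(idx)

# Toy 2x2 "layer"; magnitude is an ASSUMED proxy for importance, not PAI
w = np.array([[0.5, -0.01], [0.002, -1.2]], dtype=np.float32)
candidates = least_important_indices(np.abs(w), k=2)
print(candidates)
```

Any real importance metric would slot in where `np.abs(w)` is used here; the ranking and selection logic stays the same.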

Next, the LAUNCH stage embeds the malicious payloads into these identified parameters. SASER offers two modes for this: a “general mode” for models deployed without quantization, which uses a technique called Least Significant Bit (LSB) embedding, and a “robust mode” specifically designed to maintain payload effectiveness even when models undergo quantization. In this robust mode, SASER employs de-quantization operations to ensure the hidden payload remains intact and functional despite the model compression. Triggers, which are small pieces of code that activate the payload, are also injected into the model files during this stage.
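To make the LSB idea concrete, here is a minimal sketch of generic least-significant-bit steganography on float32 values, using a harmless text marker as the hidden data. It is not SASER's implementation: the choice of 8 bits per weight, the function names `embed_lsb` and `extract_lsb`, and the random toy weights are all assumptions made for illustration.

```python
import numpy as np

def embed_lsb(weights: np.ndarray, data: bytes) -> np.ndarray:
    """Store one byte of `data` in the lowest 8 mantissa bits of each float32."""
    payload = np.frombuffer(data, dtype=np.uint8).astype(np.uint32)
    if payload.size > weights.size:
        raise ValueError("not enough weights to hold the data")
    # Reinterpret the float32 bit patterns as integers (on a copy)
    raw = weights.astype(np.float32).ravel().view(np.uint32).copy()
    n = payload.size
    # Clear the low 8 bits, then write the data byte into them
    raw[:n] = (raw[:n] & np.uint32(0xFFFFFF00)) | payload
    return raw.view(np.float32).reshape(weights.shape)

def extract_lsb(weights: np.ndarray, length: int) -> bytes:
    """Read back `length` bytes from the lowest 8 bits of each float32."""
    raw = np.ascontiguousarray(weights, dtype=np.float32).ravel().view(np.uint32)
    return (raw[:length] & np.uint32(0xFF)).astype(np.uint8).tobytes()

rng = np.random.default_rng(0)
weights = rng.standard_normal((4, 8)).astype(np.float32)
marker = b"harmless demo marker"  # inert text, for illustration only
stego = embed_lsb(weights, marker)
print(extract_lsb(stego, len(marker)) == marker)
print(float(np.max(np.abs(stego - weights))))
```

Because a float32 mantissa has 23 bits, overwriting the lowest 8 changes each value by only a tiny relative amount, which is why this kind of embedding has little visible effect on model behavior.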

Finally, the EXPLODE stage occurs during model deployment. When a user loads the compromised LLM, the injected trigger is activated. It then reverses the embedding process to recover the hidden payload and executes it, completing the stego attack.


Experimental Validation and Implications

Extensive experiments were conducted on popular open-source LLMs, LLaMA2-7B and ChatGLM3-6B. The results are striking: SASER achieved a 100% attack success rate (ASR) across all tested scenarios, including models without quantization, and crucially, models with 8-bit and 4-bit quantization. This is a significant improvement over existing stego attacks for general DNNs, which saw their ASR drop to 0% when faced with quantized models. Furthermore, SASER demonstrated superior stealth, outperforming baselines by up to 98.1% in terms of stealth rate, meaning it caused minimal performance degradation to the LLMs.
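The drop to 0% for plain LSB-style embedding under quantization is easy to see in a toy setting. The sketch below assumes a simple symmetric per-tensor int8 scheme, which is not necessarily what the paper or any particular LLM toolkit uses, and hides a harmless marker in the low mantissa bits before quantizing. All names are hypothetical.

```python
import numpy as np

def quantize_int8(w: np.ndarray):
    """Symmetric per-tensor int8 quantization (an assumed, simplified scheme)."""
    scale = float(np.abs(w).max()) / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * np.float32(scale)

rng = np.random.default_rng(0)
weights = rng.standard_normal(64).astype(np.float32)

# Hide an inert marker in the low 8 mantissa bits of the first weights
marker = b"harmless marker"
data = np.frombuffer(marker, dtype=np.uint8).astype(np.uint32)
raw = weights.view(np.uint32).copy()
raw[:data.size] = (raw[:data.size] & np.uint32(0xFFFFFF00)) | data
stego = raw.view(np.float32)

# Round-trip through int8 and try to read the marker back
q, scale = quantize_int8(stego)
restored = dequantize(q, scale)
low_bits = restored.view(np.uint32)[:data.size] & np.uint32(0xFF)
survived = low_bits.astype(np.uint8).tobytes() == marker
print("marker survived quantization:", survived)
```

Rounding every weight to one of a few hundred levels rewrites the low mantissa bits, so data stored there is not expected to survive. This is the gap that, according to the article, SASER's robust mode is designed to close.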

The research also found that parameter-efficient fine-tuning (PEFT), a common step before deployment, does not weaken SASER’s attack performance; the methods tested include LoRA and P-tuning. This highlights the broad applicability and danger of the attack.

The authors emphasize the ethical considerations of their work, stating that the research aims to draw urgent attention to these vulnerabilities and encourage the development of robust defenses. They note that existing defense mechanisms, such as traditional steganalysis and parameter reconstruction methods, are currently insufficient to detect or mitigate SASER. Trigger detection tools also proved ineffective, as they often rely on static analysis and prior knowledge of triggers, which can be circumvented.

This research underscores a critical security gap in the rapidly evolving landscape of open-source LLMs. It serves as a strong call to action for the AI security community to investigate and develop effective countermeasures against these stealthy and potent stego attacks.
