SASER: Unveiling Stealthy Stego Attacks on Open-Source LLMs

TLDR: Researchers have introduced SASER, the first stego attack specifically designed for open-source Large Language Models (LLMs). This attack embeds malicious payloads within LLM parameters, making it highly stealthy and effective, even against quantized models. SASER achieved a 100% attack success rate in experiments on LLaMA2-7B and ChatGLM3-6B, significantly outperforming existing stego attacks for general Deep Neural Networks. The findings highlight a critical new vulnerability in open-source LLMs and call for urgent development of robust countermeasures.

Open-source Large Language Models (LLMs) have become incredibly powerful, driving advancements in AI thanks to their collaborative and transparent nature. However, new research reveals that this very transparency, which allows full access to their code and parameters, also exposes them to a sophisticated and underexplored threat: stego attacks.

A recent paper titled “SASER: Stego attacks on open-source LLMs” by Ming Tan, Wei Li, Hu Tao, Hailong Ma, Aodi Liu, Qian Chen, and Zilong Wang introduces the first stego attack specifically designed for these advanced AI models. Traditionally, stego attacks have focused on general Deep Neural Networks (DNNs), where malicious code, known as ‘payloads,’ is concealed within model parameters and activated by ‘triggers’ to compromise user systems.

The researchers argue that open-source LLMs are even more vulnerable than general DNNs for several reasons. Their mature supply chain frameworks can accelerate the spread of malware, as seen with vulnerabilities like CVE-2024-34359. Their broader ecosystems create new attack surfaces, and complex mechanisms like quantization, used to reduce model size and improve inference speed, can inadvertently make them more susceptible to these hidden threats.

Introducing SASER: A Stealthy and Robust Attack

The proposed attack, dubbed SASER (Stego Attack on open-source LLMs with Stealthiness, Effectiveness, and Robustness), operates in three main stages:

First, the TARGET stage identifies specific parameters within the LLM that are least critical to its overall performance. This is done using a “performance-aware importance (PAI)” metric. By targeting these less important parameters, the attack minimizes any noticeable degradation in the model’s functionality, ensuring stealth.
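The article does not define how PAI is computed, so the following is only a stand-in to make the idea of "least critical parameters" concrete. It scores each weight of a toy linear model by how much a small relative perturbation changes the loss on a calibration set, then lists the weights from least to most sensitive. The function names, the `rel_eps` value, and the toy data are all assumptions made for illustration, not the paper's metric.

```python
def mse(weights, xs, ys):
    """Mean squared error of a toy linear model y = sum(w_i * x_i)."""
    total = 0.0
    for x, y in zip(xs, ys):
        pred = sum(w * xi for w, xi in zip(weights, x))
        total += (pred - y) ** 2
    return total / len(xs)


def rank_by_sensitivity(weights, xs, ys, rel_eps=1e-3):
    """Return weight indices ordered from least to most loss-sensitive.

    This is a simple perturbation score used as a stand-in; it is NOT
    the PAI metric from the paper, which the article does not specify.
    """
    base = mse(weights, xs, ys)
    scores = []
    for i, w in enumerate(weights):
        perturbed = list(weights)
        perturbed[i] = w * (1 + rel_eps)  # nudge one weight at a time
        scores.append((abs(mse(perturbed, xs, ys) - base), i))
    return [i for _, i in sorted(scores)]


# Hypothetical calibration data: the second feature is always zero,
# so the second weight cannot affect the output at all.
weights = [0.5, 4.0, -1.5]
xs = [[1.0, 0.0, 2.0], [2.0, 0.0, 1.0], [3.0, 0.0, -1.0]]
ys = [sum(w * xi for w, xi in zip(weights, x)) for x in xs]

order = rank_by_sensitivity(weights, xs, ys)
```

In this toy case the weight attached to the always-zero feature ranks first, which is the kind of parameter an importance-based selection would consider safest to modify. The same scoring could in principle help defenders decide which parameters deserve closer integrity checks.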

Next, the LAUNCH stage embeds the malicious payloads into these identified parameters. SASER offers two modes for this: a “general mode” for models deployed without quantization, which uses a technique called Least Significant Bit (LSB) embedding, and a “robust mode” specifically designed to maintain payload effectiveness even when models undergo quantization. In this robust mode, SASER employs de-quantization operations to ensure the hidden payload remains intact and functional despite the model compression. Triggers, which are small pieces of code that activate the payload, are also injected into the model files during this stage.
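As a rough illustration of the LSB idea behind the general mode, the sketch below hides a harmless byte string in the low mantissa bits of float32 values and reads it back. It is a toy under stated assumptions, not the paper's implementation: the function names, the choice of 8 bits per weight, and the plain Python list standing in for model parameters are all hypothetical, and it contains no trigger or execution logic.

```python
import struct


def float_to_bits(x):
    """Reinterpret a value (rounded to float32) as a 32-bit unsigned int."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bits_to_float(b):
    """Reinterpret a 32-bit unsigned int as a float32 value."""
    return struct.unpack("<f", struct.pack("<I", b))[0]


def embed_lsb(weights, data, n_bits=8):
    """Overwrite the n_bits lowest mantissa bits of leading weights with data."""
    mask = (1 << n_bits) - 1
    bitstring = "".join(f"{byte:08b}" for byte in data)
    bitstring += "0" * (-len(bitstring) % n_bits)  # pad to a whole chunk
    chunks = [int(bitstring[i:i + n_bits], 2)
              for i in range(0, len(bitstring), n_bits)]
    if len(chunks) > len(weights):
        raise ValueError("not enough weights to hold the data")
    out = list(weights)
    for i, chunk in enumerate(chunks):
        out[i] = bits_to_float((float_to_bits(out[i]) & ~mask) | chunk)
    return out


def extract_lsb(weights, n_bytes, n_bits=8):
    """Read n_bytes back out of the low mantissa bits."""
    mask = (1 << n_bits) - 1
    n_chunks = -(-n_bytes * 8 // n_bits)  # ceiling division
    bitstring = "".join(f"{float_to_bits(w) & mask:0{n_bits}b}"
                        for w in weights[:n_chunks])
    return bytes(int(bitstring[i:i + 8], 2) for i in range(0, n_bytes * 8, 8))


# Hypothetical weights and a benign message.
weights = [0.12, -0.5, 0.033, 1.7, -0.0091, 0.25]
message = b"hello"
stego = embed_lsb(weights, message)
recovered = extract_lsb(stego, len(message))
max_rel_change = max(abs(s - w) / abs(w) for s, w in zip(stego, weights))
```

With 8 of the 23 float32 mantissa bits overwritten, each normal-range weight moves by at most roughly 3e-5 in relative terms, which suggests why this kind of embedding can be hard to notice from model behavior alone.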

Finally, the EXPLODE stage occurs during model deployment. When a user loads the compromised LLM, the injected trigger is activated. It then reverses the embedding process to recover the hidden payload and executes it, completing the stego attack.

Experimental Validation and Implications

Extensive experiments were conducted on popular open-source LLMs, LLaMA2-7B and ChatGLM3-6B. The results are striking: SASER achieved a 100% attack success rate (ASR) across all tested scenarios, including models without quantization, and crucially, models with 8-bit and 4-bit quantization. This is a significant improvement over existing stego attacks for general DNNs, which saw their ASR drop to 0% when faced with quantized models. Furthermore, SASER demonstrated superior stealth, outperforming baselines by up to 98.1% in terms of stealth rate, meaning it caused minimal performance degradation to the LLMs.
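One plausible reason a plain LSB payload fails on quantized models is that quantization discards exactly the low-order bits where the payload lives. The sketch below is a simplified illustration under assumptions (symmetric per-tensor int8 quantization, hypothetical helper names, three made-up weights); it is not the quantization scheme used in the paper's experiments.

```python
import struct


def set_low_bits(x, value, n_bits=8):
    """Replace the n_bits lowest float32 mantissa bits of x with value."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    mask = (1 << n_bits) - 1
    bits = (bits & ~mask) | (value & mask)
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def quantize_int8(weights):
    """Symmetric per-tensor int8 quantization: codes in [-127, 127] plus a scale."""
    scale = max(abs(w) for w in weights) / 127.0
    codes = [round(w / scale) for w in weights]
    return codes, scale


# Hypothetical weights, with and without data written into their low bits.
clean = [1.0, 0.3, -0.7]
stego = [set_low_bits(w, 0xA5) for w in clean]

clean_codes, _ = quantize_int8(clean)
stego_codes, _ = quantize_int8(stego)
```

Here the modified weights differ from the clean ones, yet both lists map to the same integer codes, so the per-weight low bits do not survive. An attack that stays effective after quantization would need to place its data where the quantized representation preserves it, which is the role the article attributes to SASER's robust mode.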

The research also found that parameter-efficient fine-tuning (PEFT) methods such as LoRA and P-tuning, which are common in deployment, do not weaken SASER’s attack performance. This suggests the attack remains a threat across typical deployment workflows.

The authors emphasize the ethical considerations of their work, stating that the research aims to draw urgent attention to these vulnerabilities and encourage the development of robust defenses. They note that existing defense mechanisms, such as traditional steganalysis and parameter reconstruction methods, are currently insufficient to detect or mitigate SASER. Trigger detection tools also proved ineffective, as they often rely on static analysis and prior knowledge of triggers, which can be circumvented.

The research points to a critical security gap in the rapidly evolving landscape of open-source LLMs. It is a call to action for the AI security community to investigate and develop effective countermeasures against stealthy stego attacks of this kind. Full details are available in the research paper.

Dev Sundaram
https://blogs.edgentiq.com
Dev Sundaram is an investigative tech journalist with a nose for exclusives and leaks. With stints in cybersecurity and enterprise AI reporting, Dev thrives on breaking big stories (product launches, funding rounds, regulatory shifts) and giving them context. He believes journalism should push the AI industry toward transparency and accountability, especially as Generative AI becomes mainstream. You can reach him at: [email protected]
