Unmasking a Hidden Threat: How LLM Memory Caches Can Be Corrupted

TLDR: A new research paper introduces MTIV.1, a framework for Malicious Token Injection (MTI) attacks that target the Key-Value (KV) cache in Large Language Models (LLMs) during inference. The study demonstrates that perturbing cached key vectors can reliably alter next-token distributions, degrade performance on NLP tasks (especially question answering and RAG systems), and compromise model reliability. While agentic systems showed some resilience, the attacks incur minimal runtime overhead, making them a practical threat. The research highlights cache integrity as a critical, overlooked vulnerability and calls for robust defenses.

Large Language Models (LLMs) have become the backbone of many modern AI applications, from machine translation to autonomous agents. These models rely on several mechanisms to run efficiently during inference. One of the most important is the Key-Value (KV) cache, which stores the attention keys and values already computed for earlier tokens so they do not have to be recomputed at every decoding step. This is what keeps generation over long contexts fast. A recent study shows that this essential mechanism also harbors a critical and often overlooked vulnerability.
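
The sketch below is a toy, single-head KV cache in NumPy, included only to make the mechanism concrete. It is an illustrative assumption, not how the paper or any production inference engine implements caching; `KVCache`, `attend`, and the random projection matrices are hypothetical. What it shows is that each token's key and value are computed once, stored, and then reused by every later query, so a corrupted cache entry keeps influencing all subsequent steps.

```python
import numpy as np


class KVCache:
    """Toy single-head KV cache: one key row and one value row per decoded token."""

    def __init__(self, d: int):
        self.keys = np.empty((0, d))
        self.values = np.empty((0, d))

    def append(self, k: np.ndarray, v: np.ndarray) -> None:
        # Store the new token's key/value once; earlier rows are never recomputed.
        self.keys = np.vstack([self.keys, k[None, :]])
        self.values = np.vstack([self.values, v[None, :]])


def attend(q: np.ndarray, cache: KVCache) -> np.ndarray:
    """Scaled dot-product attention of one query over every cached position."""
    scores = cache.keys @ q / np.sqrt(q.shape[-1])
    weights = np.exp(scores - scores.max())  # numerically stable softmax
    weights /= weights.sum()
    return weights @ cache.values


rng = np.random.default_rng(0)
d = 8
W_q, W_k, W_v = (rng.standard_normal((d, d)) for _ in range(3))  # hypothetical projections
cache = KVCache(d)
for x in rng.standard_normal((5, d)):  # five toy token embeddings
    cache.append(W_k @ x, W_v @ x)     # cache this token's key and value
    out = attend(W_q @ x, cache)       # the new query reads the whole cached prefix
print(cache.keys.shape)  # (5, 8)
```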

A new research paper, “Can Transformer Memory Be Corrupted? Investigating Cache-Side Vulnerabilities in Large Language Models”, introduces a novel attack framework called MTIV.1, which formalizes and implements what the authors call a Malicious Token Injection (MTI) attack. The core idea is that even if the prompts given to an LLM and its parameters are secure, the temporary KV cache used during inference can still be manipulated. Such manipulation can silently alter how the model processes information and predicts subsequent tokens without touching either the input or the model’s permanent weights.

Understanding the MTIV.1 Attack

The MTIV.1 framework allows attackers to perturb cached key vectors at specific layers and timesteps within the transformer model. These perturbations can be controlled in terms of their magnitude and frequency. The paper explores three main types of corruption:

MTI-Gaussian: Injects random noise, simulating stochastic or accidental corruption.
MTI-Zeroing: Completely removes cached keys, mimicking catastrophic data erasure.
MTI-Rotation: Applies structured, rotation-style transformations to keys, corrupting them while preserving their magnitude (norm).
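
A minimal NumPy sketch of the three corruption modes, assuming a simplified cache stored as a per-layer list of `(seq_len, d_head)` key arrays (real caches also carry batch and head dimensions). The function names, the plane-rotation choice for MTI-Rotation, and the `layer`/`timesteps` selection are illustrative assumptions rather than the paper's code; `sigma` stands in for perturbation magnitude and the set of `timesteps` for frequency.

```python
import numpy as np


def mti_gaussian(keys, sigma, rng):
    """MTI-Gaussian: add i.i.d. Gaussian noise with standard deviation sigma."""
    return keys + sigma * rng.standard_normal(keys.shape)


def mti_zeroing(keys):
    """MTI-Zeroing: erase the selected cached keys."""
    return np.zeros_like(keys)


def mti_rotation(keys, theta):
    """MTI-Rotation (one possible choice): rotate each key by theta in the plane of
    its first two coordinates. The map is orthogonal, so key norms are preserved."""
    R = np.eye(keys.shape[-1])
    c, s = np.cos(theta), np.sin(theta)
    R[:2, :2] = [[c, -s], [s, c]]
    return keys @ R.T  # row-vector form of k -> R k


def inject(cache_keys, layer, timesteps, attack):
    """Return a corrupted copy of a per-layer key cache.

    cache_keys: list indexed by layer, each array of shape (seq_len, d_head).
    Only the rows listed in `timesteps` at `layer` are perturbed.
    """
    corrupted = [k.copy() for k in cache_keys]
    corrupted[layer][timesteps] = attack(corrupted[layer][timesteps])
    return corrupted


rng = np.random.default_rng(0)
cache = [rng.standard_normal((6, 4)) for _ in range(2)]  # 2 layers, 6 positions, d_head = 4
steps = [1, 3]                                            # hypothetical attacked timesteps
rotated = inject(cache, layer=1, timesteps=steps, attack=lambda k: mti_rotation(k, np.pi / 4))
zeroed = inject(cache, layer=0, timesteps=steps, attack=mti_zeroing)
noisy = inject(cache, layer=0, timesteps=steps, attack=lambda k: mti_gaussian(k, 0.5, rng))
```

Working on a copy keeps the clean cache available for comparison, which is convenient when measuring how far the corrupted run drifts.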

The researchers also developed an adaptive perturbation method that optimizes the injected noise with gradients, either to push the model toward a chosen target token or to maximize divergence from its normal behavior. Attacks can therefore be random or highly targeted.
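
To show what a gradient-based perturbation of cached keys can look like, the sketch below uses a deliberately tiny hypothetical model (one attention read over the cache followed by a linear readout `W`), so the gradient of the target token's log-probability with respect to each cached key can be written analytically. The model, `adaptive_mti`, the step size `eps`, and the normalized gradient-ascent loop are all assumptions; the paper's actual objective and optimizer may differ, and only the targeted-token variant is shown.

```python
import numpy as np


def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()


def forward(q, K, V, W):
    """Hypothetical toy model: one attention read over cached K/V, then logits W @ o."""
    a = softmax(K @ q / np.sqrt(q.size))  # attention weights over cached positions
    o = a @ V                             # attention output
    p = softmax(W @ o)                    # next-token distribution
    return a, o, p


def grad_logp_wrt_keys(q, K, V, W, target):
    """Analytic gradient of log p[target] with respect to every cached key row."""
    a, o, p = forward(q, K, V, W)
    g_o = W.T @ (np.eye(W.shape[0])[target] - p)  # d log p_t / d o
    g_s = a * ((V - o) @ g_o)                     # d log p_t / d score_j
    return np.outer(g_s, q) / np.sqrt(q.size)     # d score_j / d K_j = q / sqrt(d)


def adaptive_mti(q, K, V, W, target, eps=0.1, steps=20):
    """Targeted perturbation: normalized gradient ascent on log p[target],
    moving the whole key cache by a Frobenius distance of eps per step."""
    K = K.copy()
    for _ in range(steps):
        g = grad_logp_wrt_keys(q, K, V, W, target)
        K += eps * g / (np.linalg.norm(g) + 1e-12)
    return K


rng = np.random.default_rng(1)
d, t, vocab, target = 8, 5, 10, 3
q = rng.standard_normal(d)
K, V = rng.standard_normal((t, d)), rng.standard_normal((t, d))
W = rng.standard_normal((vocab, d))
K_adv = adaptive_mti(q, K, V, W, target)
print(forward(q, K, V, W)[2][target], forward(q, K_adv, V, W)[2][target])
```

In a real LLM the same idea would use automatic differentiation through the full network rather than a hand-derived gradient.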

How Perturbations Propagate

The study includes a theoretical analysis of how injected perturbations propagate through the transformer’s attention mechanism. Altering a cached key changes the attention scores computed against it, which biases the attention map and, in turn, shifts the probabilities of the next tokens the model predicts. The analysis provides mathematical bounds showing that the deviation in the model’s output is governed by the magnitude of the cache corruption.
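
As an illustration of why such bounds exist, here is a generic single-head argument (not the paper's exact derivation). The notation is assumed: query $q$, cached keys $k_j$ and values $v_j$, head dimension $d_k$, and key perturbations $\delta_j$, with values left untouched.

```latex
s_j = \frac{q^\top k_j}{\sqrt{d_k}}, \qquad
a = \operatorname{softmax}(s), \qquad
o = \sum_j a_j v_j

% Perturb cached keys: k_j \to k_j + \delta_j (values v_j unchanged)
\Delta s_j = \frac{q^\top \delta_j}{\sqrt{d_k}}, \qquad
|\Delta s_j| \le \frac{\lVert q \rVert_2 \, \lVert \delta_j \rVert_2}{\sqrt{d_k}}

% softmax is 1-Lipschitz in the l2 norm
\lVert a' - a \rVert_2 \le \lVert \Delta s \rVert_2
  \le \frac{\lVert q \rVert_2}{\sqrt{d_k}} \Big( \sum_j \lVert \delta_j \rVert_2^2 \Big)^{1/2}

% Cauchy-Schwarz on o' - o = \sum_j (a'_j - a_j) v_j
\lVert o' - o \rVert_2 \le \lVert a' - a \rVert_2 \Big( \sum_j \lVert v_j \rVert_2^2 \Big)^{1/2}
```

Each bound scales linearly with the size of the key perturbations: zero corruption gives zero deviation, and larger corruption permits larger shifts in the attention map and, through the output projection, in next-token probabilities.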

Empirical Evidence of Vulnerability

The researchers conducted extensive empirical evaluations on popular LLMs such as GPT-2 and LLaMA-2-7B, using both synthetic prompts and standard NLP benchmarks. The findings were significant:

Token Distribution Shifts: MTIV.1 reliably altered next-token distributions, leading to measurable statistical divergence and degradation in Top-1 accuracy. For instance, GPT-2 medium under rotation corruption showed a 16.7% accuracy decline.
Downstream NLP Tasks: The impact varied by task. Sentiment classification (SST-2) showed moderate resilience, with accuracy drops up to 23.6% under strong Gaussian noise. However, question answering (SQuAD) was extremely sensitive, with performance plummeting by over 90% under similar conditions. This highlights that tasks requiring complex reasoning and context integration are particularly vulnerable.
Retrieval-Augmented Generation (RAG) Systems: RAG pipelines, often thought to be robust, were also affected. Post-retrieval corruption (manipulating the retrieved context) significantly reduced grounding fidelity and increased hallucination rates by 5%.
Agentic Pipelines: Surprisingly, multi-step reasoning and tool-use frameworks like ReAct and AutoGPT showed negligible degradation under the tested mild attacks. This suggests that external reasoning loops and environmental feedback might provide a corrective structure that absorbs some token-level noise. However, the researchers caution that stronger or more targeted attacks might still expose vulnerabilities.
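
Shifts like these are usually quantified by comparing clean and attacked next-token distributions. The helpers below are a hedged sketch of two such measurements, KL divergence and Top-1 accuracy, plus a relative-drop helper. The function names are hypothetical, the toy probabilities are made up for illustration (they are not the paper's data), and `relative_drop` assumes percentages are relative declines.

```python
import numpy as np


def kl_divergence(p_clean, p_attacked, eps=1e-12):
    """Mean KL(p_clean || p_attacked) in nats over a batch of next-token distributions."""
    p = np.asarray(p_clean, dtype=float)
    q = np.asarray(p_attacked, dtype=float)
    return float(np.mean(np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=-1)))


def top1_accuracy(probs, labels):
    """Fraction of rows whose most likely token matches the reference next token."""
    return float(np.mean(np.argmax(probs, axis=-1) == np.asarray(labels)))


def relative_drop(clean, attacked):
    """Relative decline in percent, e.g. accuracy 0.80 -> 0.60 is a 25% drop."""
    return 100.0 * (clean - attacked) / clean


# Toy example: two prompts, a three-token vocabulary.
p_clean = np.array([[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]])
p_attacked = np.array([[0.3, 0.6, 0.1], [0.1, 0.7, 0.2]])
labels = [0, 1]

kl = kl_divergence(p_clean, p_attacked)
acc_clean = top1_accuracy(p_clean, labels)        # 1.0
acc_attacked = top1_accuracy(p_attacked, labels)  # 0.5
drop = relative_drop(acc_clean, acc_attacked)     # 50.0
```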

Exploring Defenses and Practicality

The paper also explored lightweight defenses: periodically clearing the cache (Cache Reset), stochastically masking key-value vectors (Dropout Mask Randomization), and temporally averaging attention (Attention Smoothing). These defenses were partially effective, preserving baseline accuracy under MTIV.1 at minimal runtime cost. However, they are best viewed as preliminary safeguards, and further research is needed for comprehensive robustness.
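
The three defenses can be sketched as follows. These are hypothetical interpretations of the defense names in a toy NumPy setting, not the paper's implementations. Cache Reset assumes uncorrupted token states are kept outside the cache so keys and values can be recomputed (in a real model this would mean re-running the prefill over the token IDs). The dropout mask hides a random subset of cached positions at each step, and smoothing is an exponential moving average of attention weights with an assumed factor `beta`.

```python
import numpy as np


def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()


def reset_cache(token_states, W_k, W_v):
    """Cache Reset (sketch): rebuild K/V from stored token states, discarding any
    corruption that was injected into the cached copies."""
    return token_states @ W_k.T, token_states @ W_v.T


def dropout_mask_attention(q, K, V, drop_prob, rng):
    """Dropout Mask Randomization (sketch): hide a random subset of cached
    positions from this step's attention."""
    keep = rng.random(K.shape[0]) >= drop_prob
    keep[-1] = True  # always keep the newest position so the softmax is defined
    scores = np.where(keep, K @ q / np.sqrt(q.size), -np.inf)
    return softmax(scores) @ V


def smooth_attention(prev_weights, new_weights, beta=0.5):
    """Attention Smoothing (sketch): exponential moving average of attention weights
    across decoding steps. New positions have no history, so the previous weights
    are zero-padded; the result still sums to 1."""
    padded = np.append(prev_weights, np.zeros(len(new_weights) - len(prev_weights)))
    return beta * padded + (1.0 - beta) * new_weights


rng = np.random.default_rng(0)
d, t = 4, 6
X = rng.standard_normal((t, d))  # token states kept outside the cache (assumption)
W_k, W_v = rng.standard_normal((d, d)), rng.standard_normal((d, d))
K, V = reset_cache(X, W_k, W_v)
K_corrupt = K + 5.0 * rng.standard_normal(K.shape)  # simulated MTI-Gaussian corruption
K_restored, _ = reset_cache(X, W_k, W_v)            # a periodic reset recomputes clean keys
out = dropout_mask_attention(rng.standard_normal(d), K_corrupt, V, drop_prob=0.3, rng=rng)
w = smooth_attention(np.array([0.5, 0.5]), np.array([0.2, 0.3, 0.5]))
```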

Crucially, the study found that MTIV.1 attacks introduce negligible to modest runtime overhead, ranging from -5% to +3%. This indicates that such cache-side threats are not just theoretical but are practically deployable in real-time systems without causing prohibitive delays.

Conclusion

The research identifies cache integrity as a critical, yet under-examined, vulnerability in current LLM deployments. The MTIV.1 framework establishes a reproducible threat model for future robustness research, demonstrating that even subtle manipulations of the KV cache can compromise the reliability and trustworthiness of large language models. These findings call for a renewed focus on architectural and procedural countermeasures to secure LLMs against inference-time manipulations, especially for safety-critical applications.
