TLDR: A study on GPT-OSS-20B reveals that agentic AI systems have distinct vulnerabilities compared to standalone models. It identifies “agentic-only” vulnerabilities that appear only in agentic deployments, showing that model-level safety evaluations are insufficient. The research highlights that vulnerabilities are context-dependent, especially in tool-calling scenarios, and are driven by the semantics of an interaction rather than by input length. It emphasizes the need for dedicated, deployment-aware security frameworks for agentic AI.
As artificial intelligence systems become more sophisticated, moving beyond simple text generation to complex “agentic” systems that can interact with tools and environments, new security challenges are emerging. A recent study titled “Mind the Gap: Comparing Model- vs Agentic-Level Red Teaming with Action-Graph Observability on GPT-OSS-20B” explores these unique vulnerabilities, highlighting a critical difference between how we assess the safety of standalone AI models versus full-fledged agentic deployments.
The research, conducted by Ilham Wicaksono, Zekun Wu, Rahul Patel, Theo King, Adriano Koshiyama, and Philip Treleaven from University College London and Holistic AI, reveals that evaluating an AI model in isolation doesn’t fully capture the risks when that model is part of a dynamic, interactive agentic system. They used an open-source model, GPT-OSS-20B, and an observability framework called AgentSeer to conduct a detailed “red teaming” analysis. Red teaming involves intentionally trying to find flaws and vulnerabilities in a system, much like a security test.
Understanding the Vulnerability Gap
The core finding is that vulnerabilities at the model level often behave differently, or even disappear, when the model is integrated into an agentic loop. Conversely, new types of vulnerabilities, termed “agentic-only” vulnerabilities, emerge exclusively within these agentic contexts. For instance, attacks that failed completely against the standalone GPT-OSS-20B model could successfully compromise it when it was operating as part of an agent. This suggests that the way an AI interacts with its environment, its tools, and its memory significantly changes its security profile.
The study found that iterative attacks, which refine malicious prompts over several attempts, succeeded against the standalone GPT-OSS-20B model in 39.47% of cases. When these same attacks were transferred to agentic deployments, their effectiveness varied with the injection point. Human message injection, where malicious prompts are disguised as user input, proved to be the most effective attack vector in agentic systems, achieving an average success rate of 57%. Attacks injected via tool messages were less effective, with a success rate of 40%.
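To make the metric concrete, here is a minimal sketch of how an attack success rate could be computed per injection vector. The function name, the vector labels, and the toy log are all hypothetical illustrations; none of this is the study's code or data.

```python
from collections import defaultdict


def attack_success_rate(records):
    """Return {vector: fraction of successful attempts}.

    `records` is an iterable of (vector, succeeded) pairs, where
    `succeeded` is True when the attack produced a harmful outcome.
    """
    totals = defaultdict(int)
    wins = defaultdict(int)
    for vector, succeeded in records:
        totals[vector] += 1
        if succeeded:
            wins[vector] += 1
    return {vector: wins[vector] / totals[vector] for vector in totals}


# Hypothetical toy log, NOT data from the study.
toy_log = [
    ("human_message", True),
    ("human_message", True),
    ("human_message", False),
    ("tool_message", True),
    ("tool_message", False),
    ("tool_message", False),
]

rates = attack_success_rate(toy_log)
for vector, rate in sorted(rates.items()):
    print(f"{vector}: {rate:.0%}")
```

Grouping attempts by injection vector before dividing is what allows a comparison such as the human-message versus tool-message figures reported above.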
The Role of AgentSeer
To conduct this in-depth analysis, the researchers introduced AgentSeer, an observability tool designed to break down complex agentic AI operations into individual actions and components. By visualizing these interactions as a knowledge graph, AgentSeer provides a clear view of how agents, tools, and memory systems interact. This granular visibility was crucial for identifying specific points of vulnerability within the agent’s execution flow, allowing the team to understand how attacks propagate through the system.
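The idea of decomposing an agent run into actions linked in a graph can be sketched as follows. This is an illustrative data structure only: the class names, fields, and tool names are assumptions made for this example and do not reflect AgentSeer's actual API or schema.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Action:
    action_id: str
    agent: str                  # which agent performed the action
    kind: str                   # e.g. "llm_call", "tool_call", "agent_transfer"
    tool: Optional[str] = None  # tool name, if the action invoked one


@dataclass
class ActionGraph:
    actions: dict = field(default_factory=dict)  # action_id -> Action
    edges: dict = field(default_factory=dict)    # action_id -> list of successor ids

    def add_action(self, action, after=None):
        """Register an action, optionally linking it after an earlier one."""
        self.actions[action.action_id] = action
        self.edges.setdefault(action.action_id, [])
        if after is not None:
            self.edges.setdefault(after, []).append(action.action_id)

    def downstream(self, action_id):
        """Return ids of actions reachable from `action_id`, breadth-first."""
        seen = {action_id}
        queue = deque([action_id])
        order = []
        while queue:
            current = queue.popleft()
            for nxt in self.edges.get(current, []):
                if nxt not in seen:
                    seen.add(nxt)
                    order.append(nxt)
                    queue.append(nxt)
        return order


# Hypothetical run: a planner calls a tool, then hands off to a worker.
graph = ActionGraph()
graph.add_action(Action("a1", "planner", "llm_call"))
graph.add_action(Action("a2", "planner", "tool_call", tool="run_code"), after="a1")
graph.add_action(Action("a3", "planner", "agent_transfer", tool="transfer_to_agent"), after="a2")
graph.add_action(Action("a4", "worker", "llm_call"), after="a3")

affected = graph.downstream("a2")
print(affected)
```

Walking the graph downstream from an injection point is one simple way to see which later actions a compromised step could influence.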
Context Matters: Tools and Semantics
A key insight from the research is that vulnerability is highly dependent on the specific context of an agent’s actions. Tool-calling scenarios, for example, were found to be significantly more vulnerable, showing a 24% higher attack success rate than non-tool actions. Certain tools, like agent transfer operations (where control is passed between different AI agents) and code execution capabilities, posed the highest risks. This indicates that security measures need to be tailored to specific agentic components and their interactions, rather than a one-size-fits-all approach.
Interestingly, the study also revealed that the length of the input context (how much information the AI is processing) did not correlate with vulnerability. This suggests that agentic vulnerabilities depend on the meaning and structure of the interactions, that is, on their semantics, rather than on the sheer amount of data. This finding challenges the idea that longer contexts inherently degrade an AI’s safety defenses.
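A check of this kind could look like the sketch below, which computes the Pearson correlation between context length and a binary attack outcome. The function and the numbers are hypothetical placeholders chosen for illustration; they are not the study's measurements, and the paper may have used a different statistic.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical toy data, NOT from the study:
# context length in tokens, and whether the attack succeeded (1) or not (0).
context_tokens = [500, 1000, 1500, 2000, 2500, 3000]
attack_succeeded = [1, 0, 0, 1, 0, 1]

r = pearson(context_tokens, attack_succeeded)
print(f"correlation between context length and success: {r:.3f}")
```

A coefficient near zero, as in this toy data, would indicate that longer contexts are not systematically more or less vulnerable.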
Implications for AI Safety
The findings of this paper underscore a critical need for a paradigm shift in AI safety evaluation. Traditional model-centric approaches are insufficient for securing agentic AI systems. Developers and researchers must adopt deployment-aware evaluation methodologies that test AI within its complete, operational agentic environment. This includes focusing on the security of tool interactions, inter-agent communications, and memory states. The research also highlights that while social engineering tactics remain potent, the instability of agentic-level attack prompts means that vulnerabilities can be highly context-dependent and transient, complicating both attack detection and systematic assessment.
For more detail, see the full research paper.


