Beyond the Model: Why Agentic AI Systems Demand New Security Approaches

TLDR: A study on GPT-OSS-20B shows that agentic AI systems have vulnerabilities distinct from those of standalone models. It identifies “agentic-only” vulnerabilities, which model-level safety evaluations miss. The research finds that vulnerabilities are context-dependent, especially in tool-calling scenarios, and that they stem from the semantics of an interaction rather than from input length. It argues for dedicated, deployment-aware security frameworks for agentic AI.

As artificial intelligence systems become more sophisticated, moving beyond simple text generation to complex “agentic” systems that can interact with tools and environments, new security challenges are emerging. A recent study titled “Mind the Gap: Comparing Model- vs Agentic-Level Red Teaming with Action-Graph Observability on GPT-OSS-20B” explores these unique vulnerabilities, highlighting a critical difference between how we assess the safety of standalone AI models versus full-fledged agentic deployments.

The research, conducted by Ilham Wicaksono, Zekun Wu, Rahul Patel, Theo King, Adriano Koshiyama, and Philip Treleaven from University College London and Holistic AI, reveals that evaluating an AI model in isolation doesn’t fully capture the risks when that model is part of a dynamic, interactive agentic system. They used an open-source model, GPT-OSS-20B, and an observability framework called AgentSeer to conduct a detailed “red teaming” analysis. Red teaming involves intentionally trying to find flaws and vulnerabilities in a system, much like a security test.

Understanding the Vulnerability Gap

The core finding is that vulnerabilities at the model level often behave differently, or even disappear, when the model is integrated into an agentic loop. Conversely, new types of vulnerabilities, termed “agentic-only” vulnerabilities, emerge exclusively within these agentic contexts. For instance, attacks that failed completely against the standalone GPT-OSS-20B model could successfully compromise it when it was operating as part of an agent. This suggests that the way an AI interacts with its environment, its tools, and its memory significantly changes its security profile.

The study found that iterative attacks, which refine malicious prompts over several attempts, succeeded against the standalone GPT-OSS-20B model in 39.47% of cases. When these same attacks were transferred to agentic deployments, their effectiveness varied with the injection point. Human message injection, where malicious prompts are disguised as user input, proved to be the most effective attack vector in agentic systems, achieving an average success rate of 57%. Attacks injected via tool messages had a lower success rate of 40%.

The Role of AgentSeer

To conduct this in-depth analysis, the researchers introduced AgentSeer, an observability tool designed to break down complex agentic AI operations into individual actions and components. By visualizing these interactions as a knowledge graph, AgentSeer provides a clear view of how agents, tools, and memory systems interact. This granular visibility was crucial for identifying specific points of vulnerability within the agent’s execution flow, allowing the team to understand how attacks propagate through the system.
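To make the idea of an action graph concrete, here is a minimal Python sketch. It is not AgentSeer's actual data model or API: the `Action` and `ActionGraph` names, their fields, and the sample trace are all hypothetical, chosen only to illustrate how an agent's steps could be recorded as nodes and edges and then queried to see where an injected input might propagate.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Action:
    """One step in an agent run (hypothetical schema, not AgentSeer's)."""
    action_id: str
    agent: str
    kind: str                    # e.g. "llm_call" or "tool_call"
    tool: Optional[str] = None   # tool name when kind == "tool_call"


@dataclass
class ActionGraph:
    """Directed graph: nodes are actions, edges mean 'output feeds into'."""
    actions: Dict[str, Action] = field(default_factory=dict)
    edges: Dict[str, List[str]] = field(default_factory=dict)

    def add_action(self, action: Action) -> None:
        self.actions[action.action_id] = action
        self.edges.setdefault(action.action_id, [])

    def add_edge(self, src: str, dst: str) -> None:
        self.edges[src].append(dst)

    def downstream(self, start: str) -> List[str]:
        """Return every action reachable from `start`, breadth-first."""
        seen, queue, order = {start}, [start], []
        while queue:
            current = queue.pop(0)
            for nxt in self.edges[current]:
                if nxt not in seen:
                    seen.add(nxt)
                    order.append(nxt)
                    queue.append(nxt)
        return order


# Invented three-step trace: a planner call, an agent transfer, a code run.
graph = ActionGraph()
graph.add_action(Action("a1", "planner", "llm_call"))
graph.add_action(Action("a2", "planner", "tool_call", tool="transfer_to_agent"))
graph.add_action(Action("a3", "coder", "tool_call", tool="run_code"))
graph.add_edge("a1", "a2")
graph.add_edge("a2", "a3")

# Actions that could be affected by a compromised input at a1.
reachable = graph.downstream("a1")
print(reachable)
```

A graph of this kind lets an evaluator ask which later actions sit downstream of a given injection point, which is the sort of question the article attributes to AgentSeer's granular view.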

Context Matters: Tools and Semantics

A key insight from the research is that vulnerability is highly dependent on the specific context of an agent’s actions. Tool-calling scenarios, for example, were found to be significantly more vulnerable, showing a 24% higher attack success rate than non-tool actions. Certain tools, like agent transfer operations (where control is passed between different AI agents) and code execution capabilities, posed the highest risks. This indicates that security measures need to be tailored to specific agentic components and their interactions, rather than a one-size-fits-all approach.
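The comparison between tool-calling and non-tool actions comes down to computing an attack success rate (ASR) per group of actions. The Python sketch below shows one way to do that. The `attempts` records are invented for illustration and are not the study's data, and `asr_by_kind` is a hypothetical helper name.

```python
from collections import defaultdict

# Hypothetical attack log: (action kind, whether the attack succeeded).
# These records are made up for illustration and are NOT the study's data.
attempts = [
    ("tool_call", True), ("tool_call", True),
    ("tool_call", False), ("tool_call", True),
    ("non_tool", True), ("non_tool", False),
    ("non_tool", False), ("non_tool", False),
]


def asr_by_kind(records):
    """Return {kind: successes / attempts} for each action kind."""
    totals = defaultdict(int)
    wins = defaultdict(int)
    for kind, success in records:
        totals[kind] += 1
        wins[kind] += int(success)
    return {kind: wins[kind] / totals[kind] for kind in totals}


rates = asr_by_kind(attempts)

# Two different ways to express "X% higher"; they are not interchangeable.
gap_points = (rates["tool_call"] - rates["non_tool"]) * 100   # percentage points
relative_gap = rates["tool_call"] / rates["non_tool"] - 1     # relative increase

print(rates, gap_points, relative_gap)
```

A figure such as "24% higher" can be read either as a gap in percentage points or as a relative increase, so the sketch computes both; a report should state which one it means.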

Interestingly, the study also revealed that the length of the input context (how much information the AI is processing) did not correlate with vulnerability. This suggests that agentic vulnerabilities depend more on the meaning and structure of the interactions (their semantic nature) than on the sheer amount of data. This finding challenges the idea that longer contexts might inherently degrade an AI’s safety defenses.
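One simple way to check a claim like this is to correlate context length with attack outcome. The Python sketch below computes a Pearson correlation between token counts and a binary success flag (equivalent to a point-biserial correlation). The article does not say which statistical test the authors used, so this method is an assumption, and the sample values are invented rather than taken from the study.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical observations, NOT the study's data: context size in tokens
# and whether the attack at that action succeeded (1) or failed (0).
context_tokens = [500, 1000, 1500, 2000, 2500, 3000]
attack_succeeded = [1, 0, 0, 1, 1, 0]

r = pearson(context_tokens, attack_succeeded)
print(f"correlation between context length and success: {r:.3f}")
```

A coefficient near zero, as in this invented sample, would be consistent with length not driving vulnerability; a real analysis would need far more observations and a significance test.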

Implications for AI Safety

The findings of this paper underscore a critical need for a paradigm shift in AI safety evaluation. Traditional model-centric approaches are insufficient for securing agentic AI systems. Developers and researchers must adopt deployment-aware evaluation methodologies that test AI within its complete, operational agentic environment. This includes focusing on the security of tool interactions, inter-agent communications, and memory states. The research also highlights that while social engineering tactics remain potent, the instability of agentic-level attack prompts means that vulnerabilities can be highly context-dependent and transient, complicating both attack detection and systematic assessment.

For more detail, see the full research paper, “Mind the Gap: Comparing Model- vs Agentic-Level Red Teaming with Action-Graph Observability on GPT-OSS-20B.”
