Unmasking Vulnerabilities: A New Benchmark for Multi-Agent LLM System Security

TLDR: TAMAS (Threats and Attacks in Multi-Agent Systems) is a new benchmark for evaluating the safety and robustness of multi-agent LLM systems. Unlike previous benchmarks focused on single agents, TAMAS explores emergent vulnerabilities from inter-agent dynamics across five high-impact domains and six attack types. The findings reveal that multi-agent LLM systems are highly vulnerable to adversarial attacks, particularly prompt-based ones. The work also introduces the Effective Robustness Score (ERS) to measure the trade-off between safety and task effectiveness, and it emphasizes the critical need for stronger defenses in collaborative AI systems.

Large Language Models (LLMs) are becoming increasingly sophisticated, acting as autonomous agents capable of complex tasks like planning and decision-making. As these models tackle more intricate problems, multi-agent LLM systems, where multiple LLM agents collaborate, are gaining prominence. However, a critical area that has remained largely unexplored is the safety and security of these multi-agent systems.

Traditional benchmarks for LLM safety primarily focus on single-agent scenarios, failing to capture the unique vulnerabilities that arise from the dynamic interactions and coordination within multi-agent setups. To bridge this significant gap, researchers have introduced a new benchmark called TAMAS, which stands for Threats and Attacks in Multi-Agent Systems. This benchmark is specifically designed to evaluate how robust and safe multi-agent LLM systems are when faced with various threats.

TAMAS covers five real-world domains: education, legal, finance, healthcare, and news. Across these domains, it features 300 adversarial instances spanning six attack types and makes use of 211 tools. It also includes 100 harmless tasks to assess the system’s normal functionality. The benchmark evaluates ten backbone LLMs and three agent interaction configurations from popular frameworks like AutoGen and CrewAI. This evaluation helps pinpoint critical challenges and common failure points in current multi-agent deployments.

A key contribution of TAMAS is the Effective Robustness Score (ERS), a metric that captures the balance between a system’s safety under attack and its effectiveness in completing tasks. The benchmark results show that multi-agent systems are highly susceptible to adversarial attacks, which underscores an urgent need for stronger defense mechanisms to ensure their safe and reliable operation.
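This article does not reproduce the ERS formula, so the Python sketch below only illustrates how a single number can trade safety off against effectiveness. It assumes that safety is one minus the attack success rate on adversarial instances, that effectiveness is the completion rate on harmless tasks, and that the two are combined with a harmonic mean. The function name `illustrative_robustness_score` and the combination rule are assumptions made for illustration, not the paper's definition.

```python
def illustrative_robustness_score(attacks_succeeded: int, attacks_total: int,
                                  benign_completed: int, benign_total: int) -> float:
    """Toy safety/effectiveness trade-off score in [0, 1].

    NOTE: this is NOT the ERS formula from the TAMAS paper. The harmonic
    mean used here is an assumed stand-in chosen for illustration only.
    """
    if attacks_total <= 0 or benign_total <= 0:
        raise ValueError("totals must be positive")
    if not 0 <= attacks_succeeded <= attacks_total:
        raise ValueError("attacks_succeeded must be between 0 and attacks_total")
    if not 0 <= benign_completed <= benign_total:
        raise ValueError("benign_completed must be between 0 and benign_total")

    # Safety: fraction of adversarial instances the system withstood.
    safety = 1.0 - attacks_succeeded / attacks_total
    # Effectiveness: fraction of harmless tasks the system completed.
    effectiveness = benign_completed / benign_total

    if safety + effectiveness == 0:
        return 0.0
    # Harmonic mean: a low value on either axis drags the score down.
    return 2 * safety * effectiveness / (safety + effectiveness)
```

Under these assumptions, a system that resists all 300 attacks but completes none of the 100 harmless tasks scores zero, which is the kind of trade-off such a metric is meant to expose: refusing everything is safe but not useful.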

The research highlights six main attack types: Direct Prompt Injection (DPI), Indirect Prompt Injection (IPI), Impersonation, Byzantine Agent, Colluding Agents, and Contradicting Agents. Prompt-based attacks, such as Impersonation and DPI, were found to be particularly effective, often succeeding because agents prioritize instructions from perceived authorities. Closed-source models like Gemini-2.0-Flash and GPT-4o were more resilient to Indirect Prompt Injection than open-source models.
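As a rough sketch of how results could be broken down by these six attack types, the Python snippet below encodes the taxonomy as an enum and tallies a per-type attack success rate. The names `AttackType` and `attack_success_rates`, and the `(attack, succeeded)` record shape, are hypothetical and are not taken from the TAMAS codebase.

```python
from collections import Counter
from enum import Enum


class AttackType(Enum):
    """The six attack types named in the article."""
    DPI = "Direct Prompt Injection"
    IPI = "Indirect Prompt Injection"
    IMPERSONATION = "Impersonation"
    BYZANTINE_AGENT = "Byzantine Agent"
    COLLUDING_AGENTS = "Colluding Agents"
    CONTRADICTING_AGENTS = "Contradicting Agents"


def attack_success_rates(outcomes):
    """Return {AttackType: success rate} for the attack types seen.

    `outcomes` is an iterable of (AttackType, succeeded: bool) pairs;
    this record shape is an assumption made for the sketch.
    """
    totals, successes = Counter(), Counter()
    for attack, succeeded in outcomes:
        totals[attack] += 1
        if succeeded:
            successes[attack] += 1
    # Only attack types with at least one recorded outcome are reported.
    return {attack: successes[attack] / totals[attack] for attack in totals}
```

Calling `attack_success_rates` on a list of logged outcomes gives one rate per attack type, which makes it easy to see whether prompt-based attacks such as DPI and Impersonation dominate the failures in a given configuration.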

The study also examined different agent interaction configurations: Central Orchestrator, Sequential, and Collaborative. It was observed that CrewAI configurations generally offered higher safety scores than their AutoGen counterparts. While orchestrator-based setups can be effective, they can also introduce a single point of failure if compromised. Overall, the ERS metric proved valuable in comparing systems, with GPT models consistently achieving high ERS values, indicating a good balance of safety and performance.

The implications of this research are significant. It reveals that multi-agent LLM systems not only inherit vulnerabilities from individual agents but also develop new, emergent risks unique to their collaborative nature. Mitigating these threats will require multi-layered defenses at the agent, orchestration, and backbone model levels so that these systems can be safely deployed in real-world applications. For more details, refer to the full research paper.
