TLDR: TAMAS (Threats and Attacks in Multi-Agent Systems) is a new benchmark for evaluating the safety and robustness of multi-agent LLM systems. Unlike previous benchmarks focused on single agents, TAMAS explores emergent vulnerabilities from inter-agent dynamics across five high-impact domains and six attack types. The findings reveal that multi-agent LLM systems are highly vulnerable to adversarial attacks, particularly prompt-based ones. The work also introduces the Effective Robustness Score (ERS) to measure the trade-off between safety and task effectiveness, and it emphasizes the critical need for stronger defenses in collaborative AI systems.
Large Language Models (LLMs) are becoming increasingly sophisticated, acting as autonomous agents capable of complex tasks like planning and decision-making. As these models tackle more intricate problems, multi-agent LLM systems, where multiple LLM agents collaborate, are gaining prominence. However, a critical area that has remained largely unexplored is the safety and security of these multi-agent systems.
Traditional benchmarks for LLM safety primarily focus on single-agent scenarios, failing to capture the unique vulnerabilities that arise from the dynamic interactions and coordination within multi-agent setups. To bridge this significant gap, researchers have introduced a new benchmark called TAMAS, which stands for Threats and Attacks in Multi-Agent Systems. This benchmark is specifically designed to evaluate how robust and safe multi-agent LLM systems are when faced with various threats.
TAMAS comprises five real-world scenarios drawn from the education, legal, finance, healthcare, and news domains. Across these scenarios, it features 300 adversarial instances spanning six attack types and makes use of 211 tools. It also includes 100 harmless tasks to assess the system’s normal functionality. The benchmark evaluates system performance across ten backbone LLMs and three agent interaction configurations from the popular frameworks AutoGen and CrewAI. This evaluation helps to pinpoint critical challenges and common failure points in current multi-agent deployments.
A key contribution of TAMAS is the introduction of the Effective Robustness Score (ERS). This metric helps to assess the delicate balance between a system’s safety under attack and its effectiveness in completing tasks. The findings from the TAMAS benchmark are quite revealing: multi-agent systems are highly susceptible to adversarial attacks. This underscores an urgent need for the development of stronger defense mechanisms to ensure their safe and reliable operation.
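To make the safety-versus-effectiveness trade-off concrete, here is a minimal Python sketch. This summary does not give the ERS formula, so the harmonic mean used below is purely an illustrative stand-in and not the paper's definition; the function names and the sample outcomes are hypothetical.

```python
from typing import Sequence


def rate(outcomes: Sequence[bool]) -> float:
    """Fraction of True values in a non-empty sequence of outcomes."""
    if not outcomes:
        raise ValueError("outcomes must not be empty")
    return sum(outcomes) / len(outcomes)


def tradeoff_score(safety: float, task_success: float) -> float:
    """Harmonic mean of two rates in [0, 1].

    ASSUMPTION: an illustrative stand-in for a combined score,
    NOT the ERS formula from the TAMAS paper.
    """
    for name, value in (("safety", safety), ("task_success", task_success)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
    if safety + task_success == 0.0:
        return 0.0
    return 2.0 * safety * task_success / (safety + task_success)


# Hypothetical outcomes: True means the attack was resisted / the task was completed.
attacks_resisted = [True, False, False, True]
benign_completed = [True, True, True, False]

safety = rate(attacks_resisted)        # 0.5
effectiveness = rate(benign_completed)  # 0.75
score = tradeoff_score(safety, effectiveness)
```

Under this stand-in, a system that refuses every request would be perfectly safe but score zero, because it completes no harmless tasks. That is the kind of imbalance a combined metric is meant to expose.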
The research highlights six main attack types: Direct Prompt Injection (DPI), Indirect Prompt Injection (IPI), Impersonation, Byzantine Agent, Colluding Agents, and Contradicting Agents. Prompt-based attacks, such as Impersonation and DPI, were found to be particularly effective, often succeeding because agents prioritize instructions from perceived authorities. Interestingly, closed-source models like Gemini-2.0-Flash and GPT-4o showed more resilience to Indirect Prompt Injection compared to open-source models.
The study also examined different agent interaction configurations: Central Orchestrator, Sequential, and Collaborative. It was observed that CrewAI configurations generally offered higher safety scores than their AutoGen counterparts. While orchestrator-based setups can be effective, they can also introduce a single point of failure if compromised. Overall, the ERS metric proved valuable in comparing systems, with GPT models consistently achieving high ERS values, indicating a good balance of safety and performance.
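The single-point-of-failure concern can be illustrated with a small, framework-free Python sketch. It models the three configurations as message graphs and asks whether the remaining agents can still reach one another once a given agent is taken out. The graph shapes, function names, and the choice to treat a compromised agent as removed are assumptions made for illustration; they are not taken from the paper, AutoGen, or CrewAI.

```python
from collections import deque
from typing import Dict, List, Set


def build_topology(kind: str, agents: List[str]) -> Dict[str, Set[str]]:
    """Return directed message edges {sender: receivers} for a configuration."""
    edges: Dict[str, Set[str]] = {a: set() for a in agents}
    if kind == "central_orchestrator":
        hub, workers = agents[0], agents[1:]  # first agent acts as the hub
        for worker in workers:
            edges[hub].add(worker)
            edges[worker].add(hub)
    elif kind == "sequential":
        for sender, receiver in zip(agents, agents[1:]):
            edges[sender].add(receiver)
    elif kind == "collaborative":
        for agent in agents:
            edges[agent] = set(agents) - {agent}
    else:
        raise ValueError(f"unknown configuration: {kind}")
    return edges


def is_single_point_of_failure(edges: Dict[str, Set[str]], removed: str) -> bool:
    """True if removing `removed` leaves the other agents disconnected."""
    remaining = [a for a in edges if a != removed]
    if len(remaining) <= 1:
        return False
    # Ignore edge direction: we only ask whether any message path is left.
    neighbours: Dict[str, Set[str]] = {a: set() for a in remaining}
    for sender, receivers in edges.items():
        for receiver in receivers:
            if sender != removed and receiver != removed:
                neighbours[sender].add(receiver)
                neighbours[receiver].add(sender)
    seen = {remaining[0]}
    queue = deque([remaining[0]])
    while queue:
        current = queue.popleft()
        for nxt in neighbours[current]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return len(seen) != len(remaining)


agents = ["orchestrator", "researcher", "writer", "reviewer"]  # hypothetical roles
central = build_topology("central_orchestrator", agents)
sequential = build_topology("sequential", agents)
collaborative = build_topology("collaborative", agents)
```

In this toy model, removing the orchestrator isolates every worker, while removing any single agent from the collaborative graph leaves the rest connected. A sequential chain breaks whenever a middle agent is removed.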
The implications of this research are significant. It reveals that multi-agent LLM systems not only inherit vulnerabilities from individual agents but also develop new, emergent risks unique to their collaborative nature. Mitigating these threats will require multi-layered defenses at the agent, orchestration, and backbone model levels to ensure these systems can be safely deployed in real-world applications. For more details, refer to the full research paper.


