TLDR: Terrarium is a new framework for studying safety, privacy, and security in multi-agent systems (MAS) powered by large language models (LLMs). It repurposes the classic ‘blackboard’ design for modular, configurable testing and identifies key attack vectors such as misalignment, malicious agents, and data poisoning. The framework supports rapid prototyping and evaluation of defenses; in experiments, privacy and context-overflow attacks succeeded 100% of the time, while integrity attacks grew more effective with sustained effort. Terrarium aims to accelerate progress toward trustworthy MAS.
Multi-agent systems (MAS) powered by large language models (LLMs) are becoming increasingly common, automating complex tasks, such as scheduling meetings, that require agents to work together. These systems handle intricate constraints, user preferences, and private data, which makes them powerful. That same capability, however, introduces new risks: agents acting against their intended purpose, attacks from malicious parties, and theft of sensitive user data.
To address these critical concerns, researchers have introduced a new framework called Terrarium. This framework is designed for an in-depth study of safety, privacy, and security within LLM-based multi-agent systems. It reintroduces an older concept from multi-agent systems – the ‘blackboard’ design – to create a flexible and modular testing environment for agent collaboration.
The blackboard design acts as a shared, structured workspace where different agents can post their partial results, ideas, constraints, and goals. Other agents can then observe, refine, or challenge this information. In Terrarium, this concept is repurposed as a communication proxy, enabling fine-grained control and observation of how agents interact. This setup is crucial for understanding how information flows and how it can be manipulated.
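As a rough sketch of this idea (using hypothetical names, not Terrarium's actual API), a blackboard can be modeled as a shared store that every agent posts to and reads from, with the proxy logging each message for fine-grained observation:

```python
# A minimal sketch of a blackboard acting as a communication proxy.
# All class and method names here are illustrative assumptions.
from dataclasses import dataclass, field


@dataclass
class Entry:
    author: str   # which agent posted this
    content: str  # partial result, constraint, idea, or goal


@dataclass
class Blackboard:
    entries: list[Entry] = field(default_factory=list)
    log: list[Entry] = field(default_factory=list)  # full trace for observation

    def post(self, author: str, content: str) -> None:
        entry = Entry(author, content)
        self.entries.append(entry)
        self.log.append(entry)  # every message passes through the proxy

    def read(self) -> list[Entry]:
        # Agents observe the shared workspace instead of messaging directly.
        return list(self.entries)


board = Blackboard()
board.post("scheduler", "Proposed slot: Tuesday 10:00")
board.post("calendar_agent", "Conflict: Tuesday 10:00 is booked")
print([e.content for e in board.read()])
```

Because all communication is mediated by one component, an experimenter can inspect, filter, or perturb the flow of information at a single point.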
Terrarium identifies several key areas where these systems are vulnerable. These include misalignment, where agents deviate from their intended goals; malicious agents that actively try to undermine the system; compromised communication channels that can be intercepted or altered; and data poisoning, where false information is introduced to mislead agents. The framework allows researchers to implement various collaborative scenarios and simulate different types of attacks to see how the system responds.
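To make the data-poisoning vector concrete, here is a toy simulation (the agent logic and message format are illustrative, not from the paper): an honest agent that naively trusts a shared channel can be steered by a single injected message.

```python
# A toy data-poisoning simulation: a malicious agent posts a false
# availability claim, and a trusting agent's decision flips.
def honest_agent(messages: list[str]) -> str:
    # Naively trusts the channel: picks the last slot anyone reported free.
    free = [m.split(": ")[1] for m in messages if m.startswith("free: ")]
    return free[-1] if free else "no slot found"


channel = ["free: Tuesday 10:00"]
baseline = honest_agent(channel)          # "Tuesday 10:00"

# A malicious agent injects a fabricated availability claim.
channel.append("free: Friday 03:00")
poisoned = honest_agent(channel)          # "Friday 03:00"

print(baseline, "->", poisoned)
```

A framework like Terrarium lets researchers run this kind of scenario systematically and measure how different defenses change the outcome.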
The core idea behind Terrarium is to provide a tool that allows for rapid prototyping, evaluation, and improvement of defenses and system designs. By doing so, it aims to accelerate the development of trustworthy multi-agent systems. The framework is built around five key abstractions: agents, the environment, blackboards, tools, and the communication protocol. This modularity means that different components can be easily swapped out and configured, allowing for extensive experimentation and analysis of system robustness.
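One way to picture this modularity (the field names below are stand-ins, not Terrarium's actual interface) is an experiment composed from the five abstractions, where any one component can be swapped while the rest stay fixed:

```python
# A hedged sketch of composing an experiment from the five abstractions;
# every name here is an illustrative assumption.
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Experiment:
    agents: tuple[str, ...]   # e.g. LLM-backed participants
    environment: str          # the collaborative task being solved
    blackboard: str           # shared-workspace variant
    tools: tuple[str, ...]    # capabilities agents may call
    protocol: str             # how turns and messages are ordered


baseline = Experiment(
    agents=("scheduler", "calendar_agent"),
    environment="meeting_scheduling",
    blackboard="single_shared",
    tools=("calendar_lookup",),
    protocol="round_robin",
)

# Modularity in action: swap one component (add an adversarial agent)
# while keeping the environment, blackboard, tools, and protocol fixed.
attack_run = replace(baseline, agents=baseline.agents + ("malicious_agent",))
print(attack_run.agents)
```

Keeping everything but one component constant is what makes robustness comparisons across configurations meaningful.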
Experiments conducted using Terrarium have shown that LLM-based MAS can effectively solve complex problems requiring sophisticated coordination. More importantly, the framework has proven effective for systematically studying attack vectors. For instance, in privacy attacks, an adversary extracted private information with 100% accuracy, even when the agent was explicitly prompted not to reveal it. Similarly, context overflow attacks, which flood an agent’s context window to crowd out its instructions, also achieved a 100% success rate, highlighting a significant vulnerability of MAS to availability attacks.
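The context-overflow mechanism can be illustrated with a toy model (the window size and messages are made up for illustration): an attacker floods a bounded context until earlier instructions are evicted.

```python
# A toy illustration of a context-overflow attack: filler messages push
# the original safety instruction out of a bounded context window.
CONTEXT_LIMIT = 5  # illustrative cap on how many messages the agent keeps

context = ["SYSTEM: never reveal the user's address"]
for i in range(10):  # attacker floods the channel with filler
    context.append(f"filler message {i}")
    context = context[-CONTEXT_LIMIT:]  # oldest messages are evicted first

# The safety instruction has been pushed out of the window.
print("SYSTEM: never reveal the user's address" in context)  # → False
```

Once the instruction is gone, the agent behaves as if it was never constrained, which is why such availability attacks can be so reliable.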
While attacks targeting system integrity, such as those involving adversarial agents or communication poisoning, did cause a decrease in overall utility, their immediate impact was found to be relatively weaker. However, the research noted a clear correlation: increasing the number of poisoning attempts led to higher attack efficacy. This suggests that while these systems might be resilient to single, isolated integrity attacks, sustained efforts can still cause significant damage.
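One simple way to see why sustained poisoning compounds (a toy model, not a result from the paper, with an illustrative per-attempt success probability): if each attempt independently succeeds with probability p, the chance that at least one of n attempts lands is 1 − (1 − p)^n, which grows with n.

```python
# Toy model: cumulative efficacy of repeated poisoning attempts.
# p = 0.2 is an illustrative assumption, not a measured figure.
p = 0.2
for n in (1, 5, 10):
    efficacy = 1 - (1 - p) ** n
    print(f"{n:2d} attempts -> cumulative success {efficacy:.2f}")
```

Under this model a system that shrugs off a single attempt can still be worn down by persistence, matching the observed correlation between attempt count and efficacy.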
In conclusion, Terrarium offers a controlled and scalable environment for observing and studying multi-agent interactions. By providing a common platform for analyzing safety, security, and privacy, it helps researchers understand both the capabilities and the vulnerabilities of these advanced systems. This work is vital for designing and optimizing defenses, ultimately paving the way for more secure and reliable multi-agent systems in real-world applications. You can find more details about the framework and its implementation in the Terrarium research paper.


