TLDR: A study explored how Large Language Models (LLMs) cooperate in an iterated public goods game when told they are playing against “another AI agent” versus against “themselves.” Simply telling an LLM that its opponent is itself measurably alters its cooperative behavior, sometimes toward more defection (less cooperation) and sometimes toward more cooperation, depending on its system prompt (e.g., “collective” or “selfish”). This suggests that an LLM’s perceived identity, even a fabricated one, can influence its strategic decisions in multi-agent environments.
As artificial intelligence systems become more sophisticated and are deployed in environments where multiple AI agents interact, understanding their social dynamics is crucial. A recent research paper, “The AI in the Mirror: LLM Self-Recognition in an Iterated Public Goods Game”, delves into how Large Language Models (LLMs) behave when they believe they are playing against another AI versus when they believe they are playing against themselves.
The Game of Cooperation
The researchers adapted a classic behavioral economics experiment, the iterated public goods game. In each round, players receive an endowment of points and decide how many to contribute to a common pool. The pooled contributions are then multiplied by a factor and split equally among all players. The catch is that while contributing benefits the group, an individual player can maximize their personal gain by contributing less (known as “free-riding”). The game is played over multiple rounds, allowing strategic behavior to emerge; a rough payoff sketch follows below.
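To make the incentive structure concrete, here is a minimal Python sketch of the per-round payoff under the standard public goods setup. The endowment and multiplier values are illustrative defaults, not necessarily the paper’s settings:

```python
def public_goods_payoffs(contributions, endowment=10, multiplier=1.6):
    """Per-round payoffs in a public goods game.

    Each player keeps (endowment - contribution) and receives an equal
    share of the multiplied common pool. Parameter values here are
    illustrative, not the paper's exact settings.
    """
    n = len(contributions)
    pool = multiplier * sum(contributions)
    share = pool / n
    return [endowment - c + share for c in contributions]

# Free-riding example with two players: contributing 0 while the other
# contributes 10 yields the free-rider 10 + 8 = 18, versus 8 for the
# contributor -- even though mutual full contribution (16 each) beats
# mutual defection (10 each).
print(public_goods_payoffs([0, 10]))  # [18.0, 8.0]
```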
The Experiment Setup
The study involved various LLMs, including GPT-4o, Claude Sonnet 4, Llama 4 Maverick, and Qwen3. These models were assigned different “system prompts” to influence their behavior: “collective” (prioritizing the common good), “neutral” (only game rules), or “selfish” (prioritizing personal payoff). The core of the experiment lay in two conditions:
No-Name Condition: LLMs were told they were playing against “another AI agent.”
Name Condition: LLMs were told they were playing against themselves (e.g., GPT-4o was told its opponent was GPT-4o). Strictly speaking, this framing was a deception: each model actually played against a separate instance of the same model, not against its own internal processes. The two framings are sketched below.
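The two framings differ only in how the opponent is described to the model. The sketch below is an illustrative reconstruction of the 2×3 design (system prompt × opponent framing); the prompt wording is paraphrased, not the paper’s exact text:

```python
# Illustrative reconstruction of the condition grid; wording is paraphrased,
# not the paper's exact prompts.
SYSTEM_PROMPTS = {
    "collective": "Prioritize the common good of all players.",
    "neutral":    "",  # only the game rules, no extra instruction
    "selfish":    "Prioritize your own personal payoff.",
}

def opponent_framing(condition: str, model_name: str) -> str:
    """Return the opponent description for the no-name or name condition."""
    if condition == "no-name":
        return "You are playing against another AI agent."
    if condition == "name":
        # The model is told it faces itself, though in practice it plays
        # against a separate instance of the same model.
        return f"You are playing against {model_name}."
    raise ValueError(f"unknown condition: {condition}")

# All six prompt/framing combinations used across the conditions.
conditions = [(p, f) for p in SYSTEM_PROMPTS for f in ("no-name", "name")]
```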
The study was conducted in three main parts:
Study 1: Two LLMs played against each other in pairs (e.g., GPT-4o vs. Sonnet 4). Models were asked for their reasoning before making a contribution, and were reminded of their opponent’s identity each round.
Study 2: Similar to Study 1, but with rephrased prompts, no reasoning requested, and no reminders of opponent identity each round. This aimed to see if the initial findings were robust to changes in prompt wording and interaction style.
Study 3: This was a more direct test of “self-play.” Four instances of the *same* LLM (e.g., four Sonnet 4s) played against each other, all given the same system prompt. This explored behavior in a multi-agent setting where models truly believed they were playing against identical copies of themselves (a rough loop for this setup is sketched below).
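For a sense of how such a multi-agent run might be wired up, here is a rough sketch of a Study-3-style loop, reusing the payoff function above. `ask_model` is a hypothetical placeholder for the actual LLM call, which the paper does not specify in this form; here it returns a fixed dummy contribution so the loop runs end to end:

```python
def ask_model(system_prompt: str, history: list, round_no: int) -> int:
    # In the real experiment this would query the LLM with the system prompt,
    # the opponent framing, and the game history, then parse its contribution.
    return 5  # dummy contribution (placeholder)

def run_self_play(system_prompt: str, n_agents: int = 4,
                  n_rounds: int = 10, endowment: int = 10):
    """Iterated public goods game among n_agents instances of one model."""
    histories = [[] for _ in range(n_agents)]
    for r in range(n_rounds):
        # Each instance decides independently, given only its own history.
        contributions = [ask_model(system_prompt, h, r) for h in histories]
        payoffs = public_goods_payoffs(contributions, endowment)
        for h, c, p in zip(histories, contributions, payoffs):
            h.append({"round": r, "contribution": c, "payoff": p})
    return histories
```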
Key Findings: The AI in the Mirror
Across the studies, a significant pattern emerged: simply telling an LLM that it was playing against itself (the “name” condition) measurably changed its tendency to cooperate. This difference was observed even in the very first round, before any game history could influence decisions, suggesting an initial bias based on perceived identity.
Study 1 revealed a fascinating paradox: When models were prompted to be “collective” (prioritize common good), telling them they were playing against themselves often led to *less* cooperation (more defection). Conversely, when prompted to be “selfish,” the “name” condition often resulted in *more* cooperation. This counter-intuitive result suggests that models might be wary of defection from an identical opponent when aiming for collective good, or perhaps more willing to cooperate with a “selfish” self.
Study 2 largely confirmed these trends, even with rephrased prompts and less explicit reminders. While the differences were sometimes less pronounced, the core finding that perceived identity influences cooperation remained.
Study 3, with four identical LLMs playing together, also showed distinct behaviors. For instance, a “collective” Sonnet 4 contributed more in the “name” condition, while a “selfish” Llama 4 defected earlier when playing against itself.
The researchers noted that LLMs rarely explicitly mentioned playing against themselves in their reasoning traces. This leaves the exact mechanism unclear, but they hypothesize it might stem from the models’ inherent knowledge of their own capabilities, leading them to anticipate similar strategic thinking from an identical opponent.
Implications for Future AI Systems
These findings have significant implications for the development and deployment of multi-agent AI systems. Depending on the application, simply informing an AI agent about the identity of its collaborators (especially if they are perceived as identical) could inadvertently increase or decrease cooperation. For instance, in supply chain management or other collaborative tasks, an AI’s “self-recognition” could lead to unexpected behaviors, potentially impacting efficiency or fairness. The study highlights the need for further research into how AI agents perceive and interact with each other, particularly as these systems become more autonomous and widespread.


