TLDR: Foundation-Model Self-Play (FMSP) is a new AI research direction that combines multi-agent self-play with large foundation models to overcome limitations of traditional self-play, such as lack of diversity and getting stuck in local optima. Variants like Vanilla FMSP (vFMSP), Novelty-Search Self-Play (NSSP), and Quality-Diversity Self-Play (QDSP) were introduced. QDSP, the most promising, creates diverse, high-quality strategies without human-defined dimensions. Tested in ‘Car Tag’ (pursuer-evader game) and ‘Gandalf’ (LLM red-teaming), FMSPs discovered advanced strategies, outperforming human designs in Car Tag and successfully jailbreaking and patching LLM defenses in Gandalf. This work opens new avenues for creative and open-ended AI strategy discovery.
A new research direction called Foundation-Model Self-Play (FMSP) is emerging, aiming to enhance how artificial intelligence (AI) systems learn and innovate. Traditional self-play (SP) methods, where AI agents learn by competing against each other, have been successful in creating expert players in games like Go and StarCraft II. However, these methods often struggle to produce diverse strategies and can get stuck in repetitive, less optimal behaviors.
FMSP addresses these limitations by integrating the advanced capabilities of large foundation models (FMs), which are powerful AI models trained on vast amounts of internet data, including code. These FMs can generate code and possess extensive knowledge, allowing FMSP to overcome the challenges of traditional self-play.
The researchers propose three main variants of FMSP:
Vanilla Foundation-Model Self-Play (vFMSP)
This is the simplest approach, where the system continuously refines an agent’s strategy through competitive self-play. It’s like a single-minded pursuit of winning, constantly improving one strategy against an opponent.
Novelty-Search Self-Play (NSSP)
In contrast to vFMSP, NSSP focuses on generating a wide variety of solutions, even if they don’t immediately lead to the best performance. It prioritizes exploration and diversity, leveraging the FM’s ability to understand and generate novel ideas.
Also Read:
- Unpacking How AI Designs Algorithms: A Deep Dive into LLM-Generated Optimizers
- New AI Agent Masters Generals.io, Reaching Top 0.003% on Human Leaderboard
Quality-Diversity Self-Play (QDSP)
Considered the most promising variant, QDSP combines the strengths of both vFMSP and NSSP. It aims to create a diverse collection of high-quality strategies. This approach is unique because it doesn’t require humans to pre-define what ‘diverse’ means; the foundation model itself helps identify and explore new dimensions of strategy.
The FMSP methods were tested in two distinct environments. The first was ‘Car Tag,’ a continuous-control game where one agent pursues another. In this setting, FMSPs explored a wide range of strategies, including those based on reinforcement learning, tree search, and heuristic methods. QDSP and vFMSP, in particular, developed strategies that outperformed strong human-designed approaches.
The second test was ‘Gandalf,’ an AI safety simulation where an attacker AI tries to ‘jailbreak’ a large language model (LLM) by bypassing its defenses to extract a secret password. FMSPs successfully red-teamed the LLM, breaking through multiple levels of defense. Furthermore, the system could then automatically generate patches to fix the vulnerabilities it discovered, demonstrating a closed-loop system for identifying and mitigating AI safety risks.
This research highlights FMSP as a significant step forward in open-ended strategy discovery and multi-agent innovation. By leveraging foundation models, FMSP can make large, intelligent leaps in strategy space, helping AI systems escape local optima and discover more creative and effective solutions. For more in-depth details, you can read the full research paper here.


