TLDR: A new framework called “Fly, Fail, Fix” uses a reinforcement learning (RL) agent to playtest games and a large multimodal model (LMM) to analyze the gameplay and iteratively adjust game settings to achieve specific design goals. Tested successfully on Flappy Bird, the system shows LMMs can effectively refine game mechanics using either text or visual feedback from RL agents, paving the way for scalable AI-assisted game design.
Game design is a complex dance between creating rules and content, and then seeing how players actually interact with them. It’s tough for modern generative AI systems, which often only look at code or assets, to truly grasp how a game feels when played. This is where a new framework called “Fly, Fail, Fix” comes in, aiming to bridge that gap.
This innovative system combines two powerful AI technologies: a reinforcement learning (RL) agent and a large multimodal model (LMM). Think of the RL agent as an automated playtester, playing the game repeatedly. As it plays, it gathers crucial information, either in the form of numerical scores and timings, or as short video summaries of its gameplay.
The LMM, acting as the game designer, then takes this feedback. It’s given a specific goal for the game, like achieving a certain player score. It analyzes the play data from the RL agent and then makes adjustments to the game’s settings. This iterative loop—play, analyze, revise—helps steer the game’s future behavior closer to the desired goal.
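To make the play, analyze, revise loop concrete, here is a minimal Python sketch. It is not the authors' implementation: `GameSettings`, `playtest`, `propose_revision`, and `repair` are hypothetical names, the score formula is a made-up toy, and a simple rule stands in for the LMM designer. Only the shape of the loop is taken from the description above.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class GameSettings:
    """Hypothetical tunable parameter; a real game exposes its own settings."""
    pipe_gap: int  # vertical gap between pipes, in pixels (assumed)


def playtest(settings: GameSettings) -> float:
    """Stand-in for the RL agent: returns a made-up average score.

    Toy assumption: a wider gap yields a higher score, clamped to [0, 30].
    """
    return float(min(30, max(0, (settings.pipe_gap - 60) // 5)))


def propose_revision(settings: GameSettings, score: float, target: float) -> GameSettings:
    """Stand-in for the LMM designer: a fixed rule, not a model call."""
    if score < target:
        return replace(settings, pipe_gap=settings.pipe_gap + 10)  # make it easier
    return replace(settings, pipe_gap=settings.pipe_gap - 5)       # make it harder


def repair(settings: GameSettings, target: float, max_iters: int = 20):
    """Play, analyze, revise until the score hits the target or the budget runs out."""
    history = []
    for _ in range(max_iters):
        score = playtest(settings)                            # play
        history.append((settings.pipe_gap, score))
        if score == target:                                   # analyze
            break
        settings = propose_revision(settings, score, target)  # revise
    return settings, history


if __name__ == "__main__":
    final, history = repair(GameSettings(pipe_gap=70), target=10)
    for gap, score in history:
        print(f"pipe_gap={gap} score={score}")
```

In a real system, `playtest` would run a trained agent for several episodes and `propose_revision` would send the resulting feedback to a multimodal model; the loop around them would look much the same.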
To test this approach, the researchers applied it to the classic game Flappy Bird. Their goal was to fix broken level generators so that the RL agent could achieve a target score of 10. They explored different ways of providing feedback to the LMM: some trials used only text summaries of gameplay metrics, others used only visual summaries from video recordings, and some used both.
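The three feedback conditions can be pictured as different ways of assembling the designer's input. The sketch below is an assumption about how that might look, not the paper's actual format: `FeedbackMode`, `build_feedback`, the metric names, and the frame file names are all hypothetical.

```python
from enum import Enum


class FeedbackMode(Enum):
    TEXT = "text"      # gameplay metrics only
    VISUAL = "visual"  # frames from recorded gameplay only
    BOTH = "both"      # metrics and frames together


def build_feedback(mode, metrics, frame_paths):
    """Assemble the designer's input as a list of typed parts (format assumed)."""
    parts = []
    if mode in (FeedbackMode.TEXT, FeedbackMode.BOTH):
        # Sort keys so the text summary is stable across runs.
        summary = ", ".join(f"{k}={v}" for k, v in sorted(metrics.items()))
        parts.append({"type": "text", "content": summary})
    if mode in (FeedbackMode.VISUAL, FeedbackMode.BOTH):
        parts.extend({"type": "image", "path": p} for p in frame_paths)
    return parts


if __name__ == "__main__":
    metrics = {"score": 3, "frames_survived": 120}  # illustrative values only
    frames = ["frame_000.png", "frame_060.png"]     # hypothetical file names
    for mode in FeedbackMode:
        print(mode.value, build_feedback(mode, metrics, frames))
```

Keeping the mode as a single switch makes it easy to run the same repair loop under each condition and compare the outcomes.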
The results were promising. The LMMs were able to interpret the behavioral traces provided by the RL agents and iteratively refine the game mechanics. Whether they received text-based metrics, gameplay visuals, or both, the LMMs were equally successful at tuning the game’s difficulty to reach the target score. This points to the potential of current LMMs to reason about visual representations of gameplay, at least in cases where the score is easy to read from the visible progress.
This research suggests that RL agents can serve as valuable playtesters, providing the feedback LMMs need to refine game designs automatically. This opens up possibilities for AI-assisted game design that is more efficient and scalable. For more details, see the paper, “Fly, Fail, Fix: Iterative Game Repair with Reinforcement Learning and Large Multimodal Models.”
Looking ahead, the researchers envision further developments, such as making the RL agents more robust to changes in game physics, using a diverse group of RL agents to better mimic human players, and even allowing the LMM designer to modify the game’s code itself, leading to entirely new mechanics.


