TLDR: Researchers from Google DeepMind developed Code World Models (CWMs), an approach in which a Large Language Model (LLM) translates natural language game rules and gameplay examples into executable Python code. This code acts as a game simulator, enabling traditional planning algorithms like Monte Carlo Tree Search to play games with strategic depth while avoiding illegal moves. The LLM also generates heuristic value functions and, for imperfect-information games, inference functions, allowing CWM agents to excel in both perfect and imperfect information games and to outperform direct LLM policies in most cases, even on novel games.
Large Language Models (LLMs) have shown remarkable abilities in various reasoning tasks, and their application in playing classical board and card games has been a growing area of interest. However, the common approach of directly prompting LLMs to generate moves often leads to issues like illegal actions and a lack of deep strategic foresight. This method relies heavily on the model’s implicit pattern-matching, which can be fragile.
A new research paper from Google DeepMind introduces an innovative solution called Code World Models (CWMs). Instead of asking the LLM to play the game directly, the LLM is tasked with a more fundamental role: translating natural language game rules and observed gameplay into formal, executable Python code. This generated code then serves as a robust simulation engine for powerful planning algorithms, such as Monte Carlo Tree Search (MCTS).
How Code World Models Work
At its core, a CWM is a set of Python functions that define the game’s mechanics. These functions include logic for how the game state changes after an action (state transition), how to list all possible legal moves at any given moment, and how to check if the game has ended. This executable model provides a verifiable and precise specification of the game’s rules.
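To make this concrete, here is a minimal sketch of what such a generated world model could look like, using Tic-tac-toe as the example. The function names (`initial_state`, `legal_actions`, `apply_action`, `is_terminal`) are illustrative assumptions, not the paper's actual generated API:

```python
# Hypothetical CWM for Tic-tac-toe: a state is (board, player_to_move),
# where the board is a tuple of 9 cells holding 1, -1, or 0 (empty).

def initial_state():
    """Empty 3x3 board; player 1 moves first."""
    return (0,) * 9, 1

def legal_actions(state):
    """Every empty cell index is a legal move."""
    board, _ = state
    return [i for i, cell in enumerate(board) if cell == 0]

def apply_action(state, action):
    """State transition: place the mover's mark, then switch players."""
    board, player = state
    new_board = list(board)
    new_board[action] = player
    return tuple(new_board), -player

# All eight winning lines: rows, columns, diagonals.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]

def winner(state):
    """Return 1 or -1 if that player has completed a line, else 0."""
    board, _ = state
    for a, b, c in LINES:
        if board[a] != 0 and board[a] == board[b] == board[c]:
            return board[a]
    return 0

def is_terminal(state):
    """The game ends on a win or when the board is full."""
    return winner(state) != 0 or not legal_actions(state)
```

Because the model is ordinary executable code, a planner can call `legal_actions` and `apply_action` to roll the game forward without ever proposing an invalid move.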
Beyond the core rules, the LLM is also prompted to generate two components that strengthen gameplay: heuristic value functions, which help MCTS search more efficiently by estimating how favorable a game state is, and inference functions, which estimate hidden state in games where players lack complete information (imperfect information games, like Poker).
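A heuristic value function for the Tic-tac-toe example above might compare how many winning lines each side can still complete. This is a hand-written illustration of the idea, not the paper's generated heuristic; the state encoding (board tuple of 1/-1/0 plus player to move) is an assumption:

```python
# All eight winning lines of Tic-tac-toe.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]

def value(state, player):
    """Heuristic value of `state` for `player`, scaled to [-1, 1]:
    compare the number of lines each side can still complete."""
    board, _ = state  # board: tuple of 9 cells in {1, -1, 0}
    mine = theirs = 0
    for line in LINES:
        cells = {board[i] for i in line}
        if -player not in cells:  # no opposing mark blocks this line
            mine += 1
        if player not in cells:
            theirs += 1
    total = mine + theirs
    return 0.0 if total == 0 else (mine - theirs) / total
```

With an estimate like this, MCTS can truncate rollouts early and still rank positions, rather than simulating every game to the end.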
Key Advantages of This Approach
The CWM method offers three significant benefits over using LLMs as direct game policies:
- Verifiability: The generated code acts as a formal rulebook. This allows planning algorithms to always enumerate valid actions, ensuring no illegal moves are made, provided the synthesized model is correct.
- Strategic Depth: By combining the LLM’s understanding of game semantics with the deep search capabilities of classical planners like MCTS, the system can achieve much greater strategic depth in its gameplay.
- Generalization: The LLM focuses on the meta-task of translating data into code. This makes the system more adaptable and capable of learning and playing new, unfamiliar games more easily, even those not part of its initial training data.
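The verifiability point can be seen in a planner that only ever selects from the model's `legal_actions`. The sketch below uses flat Monte Carlo search (a simplified stand-in for full MCTS) over a toy world model for Nim (take 1-3 objects from a pile; taking the last object wins). All names are illustrative, not the paper's API:

```python
import random

# Toy world model: a state is (pile_size, player_to_move).
def legal_actions(state):
    pile, _ = state
    return [n for n in (1, 2, 3) if n <= pile]

def apply_action(state, action):
    pile, player = state
    return pile - action, -player

def is_terminal(state):
    return state[0] == 0

def rollout(state, player):
    """Play random legal moves to the end; +1 if `player` wins, else -1."""
    while not is_terminal(state):
        state = apply_action(state, random.choice(legal_actions(state)))
    # At a terminal state the player to move lost: the opponent just
    # took the last object.
    return -1.0 if state[1] == player else 1.0

def plan(state, simulations=200):
    """Flat Monte Carlo: score each *legal* action by random rollouts.
    Illegal moves are impossible by construction."""
    player = state[1]
    best_action, best_score = None, float("-inf")
    for action in legal_actions(state):
        nxt = apply_action(state, action)
        score = sum(rollout(nxt, player) for _ in range(simulations)) / simulations
        if score > best_score:
            best_action, best_score = action, score
    return best_action
```

For example, with a pile of 3 the planner learns to take all 3 objects and win immediately. The same loop works unchanged for any game exposing the same three functions, which is exactly what makes the synthesized code a reusable simulation engine.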
Evaluation and Performance
The researchers evaluated their CWM-based agent on 10 different games, including 4 novel games specifically created for this study to test generalization. The set included both fully observed (perfect information) games like Tic-tac-toe and Connect Four, and partially observed (imperfect information) games such as Leduc Poker and Gin Rummy.
The results were compelling: the CWM agent either outperformed or matched the performance of Gemini 2.5 Pro (when used as a direct policy) in 9 out of the 10 games. This demonstrates the effectiveness of offloading the rule-learning to the LLM and leveraging traditional planning for decision-making.
The paper also explores two scenarios for learning in imperfect information games: “open deck” where hidden states are available during training, and the more challenging “closed deck” where only observations and actions are accessible. Even in the closed deck setting, the CWM approach showed strong performance, a novel contribution to the field.
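One way to picture an inference function in a card game is as a sampler of hidden states consistent with what the agent has observed, so the planner can search over concrete "determinized" worlds. The sketch below is a deliberately simplified, uniform sampler for a Leduc-style six-card deck; the function name, observation format, and uniform assumption are all illustrative (in the closed deck setting, the generated inference function would have to be learned from observations and actions alone):

```python
import random

DECK = ["J1", "J2", "Q1", "Q2", "K1", "K2"]  # Leduc poker uses six cards

def sample_hidden_state(observation, rng=random):
    """Return one plausible full state: fill in the opponent's hidden card
    with a card the agent cannot currently see (assumed uniform)."""
    visible = {observation["my_card"], *observation["board_cards"]}
    candidates = [c for c in DECK if c not in visible]
    return {**observation, "opponent_card": rng.choice(candidates)}
```

A planner can call such a sampler repeatedly, run its search in each sampled world, and aggregate the results, which is how hidden information is typically handled on top of a perfect-information search like MCTS.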
Challenges and Future Directions
While highly successful, the method did encounter difficulties with games of high logical and procedural complexity, such as Gin Rummy. This highlights an area for future improvement, particularly in mastering games with intricate multi-step subroutines.
This work significantly advances the field of general game-playing AI by demonstrating a powerful way to combine the strengths of large language models with classical AI planning techniques. For more technical details, you can read the full research paper: CODE WORLD MODELS FOR GENERAL GAMEPLAYING.