TLDR: Researchers from Google DeepMind developed Code World Models (CWMs), an approach in which a Large Language Model (LLM) translates natural language game rules and gameplay examples into executable Python code. This code acts as a game simulator, enabling traditional planning algorithms like Monte Carlo Tree Search to play games with strategic depth while avoiding illegal moves. The LLM also generates heuristic value functions and, for imperfect-information games, inference functions, allowing CWM agents to excel in both perfect and imperfect information games and to outperform direct LLM policies in most cases, even on novel games.
Large Language Models (LLMs) have shown remarkable abilities in various reasoning tasks, and their application in playing classical board and card games has been a growing area of interest. However, the common approach of directly prompting LLMs to generate moves often leads to issues like illegal actions and a lack of deep strategic foresight. This method relies heavily on the model’s implicit pattern-matching, which can be fragile.
A new research paper from Google DeepMind introduces an innovative solution called Code World Models (CWMs). Instead of asking the LLM to play the game directly, the LLM is tasked with a more fundamental role: translating natural language game rules and observed gameplay into formal, executable Python code. This generated code then serves as a robust simulation engine for powerful planning algorithms, such as Monte Carlo Tree Search (MCTS).
How Code World Models Work
At its core, a CWM is a set of Python functions that define the game’s mechanics. These functions include logic for how the game state changes after an action (state transition), how to list all possible legal moves at any given moment, and how to check if the game has ended. This executable model provides a verifiable and precise specification of the game’s rules.
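To make this concrete, here is a minimal sketch of what such a generated world model could look like, using Tic-tac-toe as the example. The function names (`initial_state`, `legal_actions`, `apply_action`, `is_terminal`) are illustrative assumptions, not the paper's actual generated API:

```python
# Hypothetical CWM for Tic-tac-toe: a state is (board, player_to_move),
# where the board is a tuple of 9 cells holding 1, -1, or 0 (empty).

def initial_state():
    """Empty 3x3 board; player 1 moves first."""
    return (0,) * 9, 1

def legal_actions(state):
    """Every empty cell index is a legal move."""
    board, _ = state
    return [i for i, cell in enumerate(board) if cell == 0]

def apply_action(state, action):
    """State transition: place the mover's mark, then switch players."""
    board, player = state
    new_board = list(board)
    new_board[action] = player
    return tuple(new_board), -player

# All eight winning lines: rows, columns, diagonals.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]

def winner(state):
    """Return 1 or -1 if that player has completed a line, else 0."""
    board, _ = state
    for a, b, c in LINES:
        if board[a] != 0 and board[a] == board[b] == board[c]:
            return board[a]
    return 0

def is_terminal(state):
    """The game ends on a win or when the board is full."""
    return winner(state) != 0 or not legal_actions(state)
```

Because the model is ordinary executable code, a planner can call `legal_actions` and `apply_action` to roll the game forward without ever proposing an invalid move.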
Beyond the core rules, the LLM is also prompted to generate two components that strengthen gameplay: heuristic value functions, which help MCTS search more efficiently by estimating how favorable a game state is, and inference functions, which estimate hidden state in games where players lack complete information (imperfect information games, like Poker).
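A heuristic value function for the Tic-tac-toe example above might compare how many winning lines each side can still complete. This is a hand-written illustration of the idea, not the paper's generated heuristic; the state encoding (board tuple of 1/-1/0 plus player to move) is an assumption:

```python
# All eight winning lines of Tic-tac-toe.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]

def value(state, player):
    """Heuristic value of `state` for `player`, scaled to [-1, 1]:
    compare the number of lines each side can still complete."""
    board, _ = state  # board: tuple of 9 cells in {1, -1, 0}
    mine = theirs = 0
    for line in LINES:
        cells = {board[i] for i in line}
        if -player not in cells:  # no opposing mark blocks this line
            mine += 1
        if player not in cells:
            theirs += 1
    total = mine + theirs
    return 0.0 if total == 0 else (mine - theirs) / total
```

With an estimate like this, MCTS can truncate rollouts early and still rank positions, rather than simulating every game to the end.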
Key Advantages of This Approach
The CWM method offers three significant benefits over using LLMs as direct game policies:
- Verifiability: The generated code acts as a formal rulebook. This allows planning algorithms to always enumerate valid actions, ensuring no illegal moves are made, provided the synthesized model is correct.
- Strategic Depth: By combining the LLM’s understanding of game semantics with the deep search capabilities of classical planners like MCTS, the system can achieve much greater strategic depth in its gameplay.
- Generalization: The LLM focuses on the meta-task of translating data into code. This makes the system more adaptable and capable of learning and playing new, unfamiliar games more easily, even those not part of its initial training data.
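The verifiability point can be seen in a planner that only ever selects from the model's `legal_actions`. The sketch below uses flat Monte Carlo search (a simplified stand-in for full MCTS) over a toy world model for Nim (take 1-3 objects from a pile; taking the last object wins). All names are illustrative, not the paper's API:

```python
import random

# Toy world model: a state is (pile_size, player_to_move).
def legal_actions(state):
    pile, _ = state
    return [n for n in (1, 2, 3) if n <= pile]

def apply_action(state, action):
    pile, player = state
    return pile - action, -player

def is_terminal(state):
    return state[0] == 0

def rollout(state, player):
    """Play random legal moves to the end; +1 if `player` wins, else -1."""
    while not is_terminal(state):
        state = apply_action(state, random.choice(legal_actions(state)))
    # At a terminal state the player to move lost: the opponent just
    # took the last object.
    return -1.0 if state[1] == player else 1.0

def plan(state, simulations=200):
    """Flat Monte Carlo: score each *legal* action by random rollouts.
    Illegal moves are impossible by construction."""
    player = state[1]
    best_action, best_score = None, float("-inf")
    for action in legal_actions(state):
        nxt = apply_action(state, action)
        score = sum(rollout(nxt, player) for _ in range(simulations)) / simulations
        if score > best_score:
            best_action, best_score = action, score
    return best_action
```

For example, with a pile of 3 the planner learns to take all 3 objects and win immediately. The same loop works unchanged for any game exposing the same three functions, which is exactly what makes the synthesized code a reusable simulation engine.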
Evaluation and Performance
The researchers evaluated their CWM-based agent on 10 different games, including 4 novel games specifically created for this study to test generalization. The set included both fully observed (perfect information) games like Tic-tac-toe and Connect Four, and partially observed (imperfect information) games such as Leduc Poker and Gin Rummy.
The results were compelling: the CWM agent either outperformed or matched the performance of Gemini 2.5 Pro (when used as a direct policy) in 9 out of the 10 games. This demonstrates the effectiveness of offloading the rule-learning to the LLM and leveraging traditional planning for decision-making.
The paper also explores two scenarios for learning in imperfect information games: “open deck” where hidden states are available during training, and the more challenging “closed deck” where only observations and actions are accessible. Even in the closed deck setting, the CWM approach showed strong performance, a novel contribution to the field.
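One way to picture an inference function in a card game is as a sampler of hidden states consistent with what the agent has observed, so the planner can search over concrete "determinized" worlds. The sketch below is a deliberately simplified, uniform sampler for a Leduc-style six-card deck; the function name, observation format, and uniform assumption are all illustrative (in the closed deck setting, the generated inference function would have to be learned from observations and actions alone):

```python
import random

DECK = ["J1", "J2", "Q1", "Q2", "K1", "K2"]  # Leduc poker uses six cards

def sample_hidden_state(observation, rng=random):
    """Return one plausible full state: fill in the opponent's hidden card
    with a card the agent cannot currently see (assumed uniform)."""
    visible = {observation["my_card"], *observation["board_cards"]}
    candidates = [c for c in DECK if c not in visible]
    return {**observation, "opponent_card": rng.choice(candidates)}
```

A planner can call such a sampler repeatedly, run its search in each sampled world, and aggregate the results, which is how hidden information is typically handled on top of a perfect-information search like MCTS.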
Challenges and Future Directions
While highly successful, the method did encounter difficulties with games of high logical and procedural complexity, such as Gin Rummy. This highlights an area for future improvement, particularly in mastering games with intricate multi-step subroutines.
This work significantly advances the field of general game-playing AI by demonstrating a powerful way to combine the strengths of large language models with classical AI planning techniques. For more technical details, you can read the full research paper: CODE WORLD MODELS FOR GENERAL GAMEPLAYING.