Think-In Games: Empowering Language Models to Master Interactive Strategy

TLDR: The ‘Think-In Games’ (TiG) framework enables large language models (LLMs) to acquire procedural knowledge in interactive game environments, bridging the gap between knowing ‘about’ and knowing ‘how to do’. By reformulating reinforcement learning as a language modeling task, TiG allows LLMs to generate language-guided policies refined by environmental feedback. This approach achieves competitive performance with less data than traditional RL, provides transparent, step-by-step natural language explanations for decisions, and has been successfully demonstrated in MOBA games like Honor of Kings.

Large language models (LLMs) have shown incredible prowess in complex tasks like writing poetry, solving math problems, and generating code. However, they often stumble when faced with simple interactive challenges that even young children master through play. This highlights a significant gap: LLMs excel at declarative knowledge (knowing *about* something) but struggle with procedural knowledge (knowing *how to do* something).

Traditional reinforcement learning (RL) agents, on the other hand, are adept at acquiring procedural knowledge through interaction with environments. Yet, they often operate as ‘black boxes,’ making their decisions difficult to understand, and typically demand vast amounts of training data. LLMs possess extensive world knowledge and reasoning capabilities, but converting this static knowledge into dynamic decision-making in interactive settings has been a persistent challenge.

Introducing Think-In Games (TiG)

To bridge this crucial gap, researchers from Tencent have proposed a novel framework called Think-In Games (TiG). TiG empowers LLMs to develop a deep procedural understanding by directly interacting with game environments, all while retaining their inherent abilities to reason and explain their actions. Essentially, TiG reframes RL-based decision-making as a language modeling task. LLMs generate policies guided by natural language, which are then iteratively refined through online reinforcement learning based on real-time feedback from the game environment.

How TiG Works

The framework focuses on high-level strategic reasoning, particularly within Multiplayer Online Battle Arena (MOBA) games like Honor of Kings. These games provide a rich and challenging environment for testing complex decision-making, team coordination, and long-term planning.

TiG formalizes the game environment by representing game states as structured JSON objects, making it easier for LLMs to process and understand. It defines a finite set of ‘macro-level actions’ – high-level team objectives such as “Push Top Lane,” “Secure Dragon,” or “Defend Base.” This abstraction allows the model to focus on strategic thinking rather than low-level mechanics.

The core of TiG is a policy model, an LLM trained to map game states to these macro-level actions and, crucially, to provide a natural language reasoning chain explaining *why* it chose a particular action. This emphasis on explanation significantly improves transparency and interpretability.
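The state and action formalization described above, and the policy's mapping between them, can be sketched in a few lines of Python. This is only an illustrative sketch: the JSON field names, the `MacroAction` enum, and the helpers `render_prompt` and `parse_action` are hypothetical, since the article does not give TiG's actual schema or prompt format. Only the three action names come from the examples in the text.

```python
import json
from enum import Enum
from typing import Optional


class MacroAction(Enum):
    # The three examples named in the article; the real action set is larger.
    PUSH_TOP_LANE = "Push Top Lane"
    SECURE_DRAGON = "Secure Dragon"
    DEFEND_BASE = "Defend Base"


# Hypothetical game state; every field name here is an illustrative assumption.
state = {
    "time_s": 420,
    "hero": {"name": "A Gu Duo", "hp_pct": 0.85, "level": 8},
    "allies": [{"name": "Jiang Ziya", "hp_pct": 0.7}],
    "towers": {"enemy_mid_t1": {"hp_pct": 0.2}},
}


def render_prompt(state: dict) -> str:
    """Serialize the state as JSON and list the allowed macro actions."""
    actions = ", ".join(a.value for a in MacroAction)
    return (
        "Game state:\n"
        + json.dumps(state, indent=2)
        + f"\nChoose one action from: {actions}"
    )


def parse_action(text: str) -> Optional[MacroAction]:
    """Return the first macro action (in enum order) named in the model output."""
    lowered = text.lower()
    for action in MacroAction:
        if action.value.lower() in lowered:
            return action
    return None
```

In this sketch the model's free-form reasoning stays in the output text, and only the final action name is matched against the finite action set, which keeps the decision checkable while the explanation remains readable.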

Training and Results

TiG employs a multi-stage training process that combines supervised fine-tuning (SFT) and reinforcement learning (RL) using an algorithm called Group Relative Policy Optimization (GRPO). The training data is collected from anonymized real game matches, with a sophisticated relabeling algorithm ensuring dense and consistent action labels. A simple rule-based reward system is used: a reward of 1 for a correct action prediction and 0 otherwise, encouraging the model to align with expert player behavior.
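The binary reward and the group-relative idea behind GRPO can be illustrated with a small Python sketch. It makes several assumptions: the function names are hypothetical, the case-insensitive string match stands in for whatever matching rule TiG actually uses, and the population standard deviation is one possible normalization choice. The full GRPO objective also involves a clipped policy ratio and a KL penalty, which are omitted here.

```python
from statistics import mean, pstdev


def rule_reward(predicted: str, expert: str) -> float:
    """Binary reward: 1.0 if the predicted action matches the expert label."""
    # Case-insensitive comparison is an assumption made for this sketch.
    return 1.0 if predicted.strip().lower() == expert.strip().lower() else 0.0


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Normalize each reward against the mean and spread of its own group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; eps avoids division by zero
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four sampled answers for one game state whose expert label is "Secure Dragon".
samples = ["Secure Dragon", "Push Top Lane", "secure dragon", "Defend Base"]
rewards = [rule_reward(s, "Secure Dragon") for s in samples]
advantages = group_relative_advantages(rewards)
```

Because advantages are computed relative to the other samples in the same group, correct answers are pushed up and incorrect ones pushed down without a separate learned value model. A group in which every sample earns the same reward yields zero advantages and therefore no learning signal.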

The experimental results are compelling. The combination of SFT and GRPO leads to substantial improvements in model performance across different model sizes. Notably, Qwen-3-14B trained with TiG achieved an accuracy of 90.91%, outperforming the far larger DeepSeek-R1 (86.67%), which has more than an order of magnitude more parameters. This demonstrates TiG’s efficiency and scalability. Furthermore, the training method preserves, and in some cases slightly improves, the general language understanding and reasoning abilities of the LLMs.

Understanding the Decisions

One of TiG’s most significant contributions is its ability to provide step-by-step natural language explanations for its decisions. A case study involving the hero A Gu Duo in Honor of Kings vividly illustrates this. The model performs a holistic situation analysis, prioritizes objectives (e.g., destroying a weakened tower), formulates a concrete strategy (e.g., “join Jiang Ziya at the enemy mid-lane tier-one tower and focus fire to bring it down”), and integrates hero-specific playstyle knowledge. This intricate reasoning is then distilled into clear, actionable guidance for the player, such as “Jointly push down the enemy mid-lane tier-one tower with Jiang Ziya; be mindful of a potential enemy ambush.”

Looking Ahead

While TiG shows immense promise in bridging the gap between declarative and procedural knowledge, the researchers acknowledge certain limitations. Its effectiveness is tied to the quality of the underlying LLM, and its generalizability beyond game environments to domains like robotics still needs thorough investigation. Although it improves sample efficiency compared to traditional RL, it still requires substantial environmental interaction.

Despite these limitations, Think-In Games represents a significant step forward in developing AI agents that can not only act effectively in dynamic environments but also explain their reasoning, paving the way for more transparent and interpretable AI systems. The full research paper is titled “Think in Games: Learning to Reason in Games via Reinforcement Learning with Large Language Models.”
