TLDR: The ‘Think-In Games’ (TiG) framework enables large language models (LLMs) to acquire procedural knowledge in interactive game environments, bridging the gap between knowing ‘about’ and knowing ‘how to do’. By reformulating reinforcement learning as a language modeling task, TiG allows LLMs to generate language-guided policies refined by environmental feedback. This approach achieves competitive performance with less data than traditional RL, provides transparent, step-by-step natural language explanations for decisions, and has been successfully demonstrated in MOBA games like Honor of Kings.
Large language models (LLMs) have shown incredible prowess in complex tasks like writing poetry, solving math problems, and generating code. However, they often stumble when faced with simple interactive challenges that even young children master through play. This highlights a significant gap: LLMs excel at declarative knowledge (knowing *about* something) but struggle with procedural knowledge (knowing *how to do* something).
Traditional reinforcement learning (RL) agents, on the other hand, are adept at acquiring procedural knowledge through interaction with environments. Yet, they often operate as ‘black boxes,’ making their decisions difficult to understand, and typically demand vast amounts of training data. LLMs possess extensive world knowledge and reasoning capabilities, but converting this static knowledge into dynamic decision-making in interactive settings has been a persistent challenge.
Introducing Think-In Games (TiG)
To bridge this crucial gap, researchers from Tencent have proposed a novel framework called Think-In Games (TiG). TiG empowers LLMs to develop a deep procedural understanding by directly interacting with game environments, all while retaining their inherent abilities to reason and explain their actions. Essentially, TiG reframes RL-based decision-making as a language modeling task. LLMs generate policies guided by natural language, which are then iteratively refined through online reinforcement learning based on real-time feedback from the game environment.
How TiG Works
The framework focuses on high-level strategic reasoning, particularly within Multiplayer Online Battle Arena (MOBA) games like Honor of Kings. These games provide a rich and challenging environment for testing complex decision-making, team coordination, and long-term planning.
TiG formalizes the game environment by representing game states as structured JSON objects, making it easier for LLMs to process and understand. It defines a finite set of ‘macro-level actions’ – high-level team objectives such as “Push Top Lane,” “Secure Dragon,” or “Defend Base.” This abstraction allows the model to focus on strategic thinking rather than low-level mechanics.
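To make this representation concrete, here is a minimal Python sketch of a JSON game state and a finite macro-action set. The field names (`game_time_s`, `player_hero`, `enemy_towers`, and so on) and the helpers `serialize_state` and `parse_action` are illustrative assumptions, not the schema used in the paper. Only the three action labels come from the text above.

```python
import json
from enum import Enum


class MacroAction(Enum):
    """Finite set of macro-level actions (only the three named in the article)."""
    PUSH_TOP_LANE = "Push Top Lane"
    SECURE_DRAGON = "Secure Dragon"
    DEFEND_BASE = "Defend Base"


# Hypothetical game state: every field name here is an assumption for illustration.
state = {
    "game_time_s": 412,
    "player_hero": {"name": "A Gu Duo", "hp_pct": 0.85, "lane": "mid"},
    "allies": [{"name": "Jiang Ziya", "hp_pct": 0.7, "lane": "mid"}],
    "enemy_towers": [{"lane": "mid", "tier": 1, "hp_pct": 0.2}],
}


def serialize_state(game_state: dict) -> str:
    """Render the state as a JSON string that can be placed in an LLM prompt."""
    return json.dumps(game_state, ensure_ascii=False, sort_keys=True)


def parse_action(label: str) -> MacroAction:
    """Map a text label to a MacroAction; raises ValueError for unknown labels."""
    return MacroAction(label.strip())
```

Restricting the output to an enumerated set keeps the decision space small and makes it simple to check a prediction against a labeled action.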
The core of TiG is a policy model, an LLM trained to map game states to these macro-level actions and, crucially, to provide a natural language reasoning chain explaining *why* it chose a particular action. This emphasis on explanation significantly improves transparency and interpretability.
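The state-to-action mapping with an attached explanation can be sketched as a prompt builder plus a response parser. The `Reasoning:` / `Action:` reply format, the prompt wording, and the names `build_prompt`, `parse_response`, and `Decision` are assumptions made for this sketch. The article does not specify the actual prompt or output format.

```python
import json
import re
from typing import NamedTuple, Optional

# Only the three actions named in the article; the real action set is larger.
MACRO_ACTIONS = ["Push Top Lane", "Secure Dragon", "Defend Base"]


class Decision(NamedTuple):
    reasoning: str
    action: Optional[str]  # None if no valid macro action could be parsed


def build_prompt(state: dict) -> str:
    """Build a text prompt from a JSON state (hypothetical wording)."""
    actions = "\n".join(f"- {a}" for a in MACRO_ACTIONS)
    return (
        "Game state (JSON):\n"
        f"{json.dumps(state, sort_keys=True)}\n\n"
        "Choose exactly one macro-level action from:\n"
        f"{actions}\n\n"
        "Reply as:\n"
        "Reasoning: <step-by-step explanation>\n"
        "Action: <one action>"
    )


_RESPONSE_RE = re.compile(
    r"Reasoning:\s*(?P<reasoning>.*?)\s*Action:\s*(?P<action>[^\n]+)",
    re.DOTALL,
)


def parse_response(text: str) -> Decision:
    """Split a model reply into its reasoning chain and chosen action."""
    match = _RESPONSE_RE.search(text)
    if match is None:
        return Decision(reasoning=text.strip(), action=None)
    action = match.group("action").strip()
    if action not in MACRO_ACTIONS:
        action = None  # reject anything outside the finite action set
    return Decision(reasoning=match.group("reasoning").strip(), action=action)
```

For example, `parse_response("Reasoning: The dragon is about to spawn.\nAction: Secure Dragon")` yields a `Decision` whose `action` is `"Secure Dragon"`, while a reply naming an action outside the set yields `action=None`.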
Training and Results
TiG employs a multi-stage training process that combines supervised fine-tuning (SFT) with reinforcement learning (RL); the RL stage uses an algorithm called Group Relative Policy Optimization (GRPO). The training data is collected from anonymized real game matches, with a relabeling algorithm that ensures dense and consistent action labels. A simple rule-based reward is used: 1 for a correct action prediction and 0 otherwise, which encourages the model to align with expert player behavior.
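The reward rule and the group-relative idea behind GRPO can be sketched in a few lines of Python. The binary reward follows the description above. The advantage computation uses the commonly cited GRPO form, in which each sampled response's reward is normalized by the mean and standard deviation of its group. Whether TiG uses exactly this variant, and the use of the population standard deviation and the `eps` constant, are assumptions here.

```python
from statistics import mean, pstdev
from typing import List


def binary_reward(predicted: str, label: str) -> float:
    """Rule-based reward: 1 for a correct action prediction, 0 otherwise."""
    return 1.0 if predicted == label else 0.0


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Normalize each reward against its own group of sampled responses.

    Assumption: population standard deviation, with eps to avoid division
    by zero when every response in the group receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# One game state, four sampled responses from the policy (hypothetical values).
sampled_actions = ["Secure Dragon", "Push Top Lane", "Secure Dragon", "Defend Base"]
expert_label = "Secure Dragon"

rewards = [binary_reward(a, expert_label) for a in sampled_actions]
advantages = group_relative_advantages(rewards)
```

In this example the two correct responses receive positive advantages and the two incorrect ones receive negative advantages of equal magnitude. When every response in a group gets the same reward, all advantages are zero and that group contributes no learning signal.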
Combining SFT with GRPO substantially improves performance across model sizes. Notably, a smaller model trained with TiG, Qwen-3-14B, reached an accuracy of 90.91%, outperforming the much larger DeepSeek-R1 (86.67%), which has an order of magnitude more parameters. This demonstrates TiG’s efficiency and scalability. Furthermore, the training method preserves, and in some cases slightly improves, the general language understanding and reasoning abilities of the LLMs.
Understanding the Decisions
One of TiG’s most significant contributions is its ability to provide step-by-step natural language explanations for its decisions. A case study involving the hero A Gu Duo in Honor of Kings vividly illustrates this. The model performs a holistic situation analysis, prioritizes objectives (e.g., destroying a weakened tower), formulates a concrete strategy (e.g., “join Jiang Ziya at the enemy mid-lane tier-one tower and focus fire to bring it down”), and integrates hero-specific playstyle knowledge. This intricate reasoning is then distilled into clear, actionable guidance for the player, such as “Jointly push down the enemy mid-lane tier-one tower with Jiang Ziya; be mindful of a potential enemy ambush.”
Looking Ahead
While TiG shows immense promise in bridging the gap between declarative and procedural knowledge, the researchers acknowledge certain limitations. Its effectiveness is tied to the quality of the underlying LLM, and its generalizability beyond game environments to domains like robotics still needs thorough investigation. Although it improves sample efficiency compared to traditional RL, it still requires substantial environmental interaction.
Despite these limitations, Think-In Games represents a significant step forward in developing AI agents that can not only act effectively in dynamic environments but also explain their reasoning, paving the way for more transparent and interpretable AI systems. The full research paper is titled “Think in Games: Learning to Reason in Games via Reinforcement Learning with Large Language Models.”


