Enhancing Game NPCs with a Hybrid of Reinforcement Learning and Behavior Trees

TLDR: A research paper from AMD explores combining reinforcement learning (RL) with behavior trees (BTs) to create more sophisticated and reliable non-player characters (NPCs) in video games. Built with AMD Schola and inspired by the enemy AI in “The Last of Us,” the study shows how this hybrid approach supports dynamic, adaptive skills (such as Flee, Combat, and Hide) while preserving developer control and consistency. In the reported results, the BT+RL method outperforms a curriculum-trained RL model and comes close to a pure BT in win rate and damage dealt, offering a practical route to advanced game AI.

The world of video game artificial intelligence (AI) is constantly evolving, with developers striving to create non-player characters (NPCs) that are both intelligent and engaging. While reinforcement learning (RL) has shown remarkable advancements in research, its practical adoption in commercial video games has been slow. This is often due to challenges like inconsistent behavior, complex training, and the need for significant computational resources.

A recent study by researchers at Advanced Micro Devices (AMD) explores a promising solution: combining reinforcement learning with traditional behavior trees (BTs). This hybrid approach aims to leverage the strengths of both methodologies, offering a path to more sophisticated and reliable NPC behaviors in games. The paper, titled “Combining Reinforcement Learning and Behavior Trees for NPCs in Videogames with AMD Schola,” details their findings.

Behavior trees provide a structured, hierarchical way to manage NPC actions, leading to predictable outcomes. However, designing complex, multi-task BTs can be cumbersome and may result in repetitive gameplay. On the other hand, RL offers dynamic and adaptive decision-making, allowing agents to learn through trial and error. The challenge with RL often lies in training generally capable models, managing reward shaping, and dealing with high computational demands.

The AMD team, including Tian Liu, Alex Cann, Ian Colbert, and Mehdi Saeedi, tackled these issues by integrating RL models into BTs. They used AMD Schola, an open-source plugin for training RL agents within Unreal Engine, to demonstrate the viability of this approach. Their work was inspired by the sophisticated Human Enemy AI in the commercial video game “The Last of Us,” aiming to replicate specific skills such as Flee, Search, Combat, Hide, and Move.

How the Hybrid System Works

In their setup, a behavior tree acts as the overarching strategic decision-maker, while specific, complex skills within the tree are handled by RL-based models. For instance, when an NPC needs to engage in combat, an RL model takes over to manage aiming, shooting, and movement. This modular design means that developers can still control the overall flow of the NPC’s actions through the BT, but the nuanced execution of individual skills benefits from the adaptability of RL.
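The division of labor described above can be illustrated with a minimal Python sketch. It is not the paper's implementation and does not use the AMD Schola API. The node classes, the guard thresholds, and the placeholder policies are all assumptions made for illustration. In a real setup each policy would be a trained RL model rather than a function returning a fixed string.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

Observation = Dict[str, float]
Policy = Callable[[Observation], str]  # stand-in for a trained RL model


@dataclass
class SkillNode:
    """Leaf node: runs one skill policy when its guard condition holds."""
    name: str
    guard: Callable[[Observation], bool]
    policy: Policy

    def tick(self, obs: Observation) -> Optional[Tuple[str, str]]:
        if not self.guard(obs):
            return None  # failure: the selector moves on to the next child
        return self.name, self.policy(obs)


@dataclass
class Selector:
    """Composite node: returns the result of the first child that succeeds."""
    children: List[SkillNode]

    def tick(self, obs: Observation) -> Optional[Tuple[str, str]]:
        for child in self.children:
            result = child.tick(obs)
            if result is not None:
                return result
        return None


# Hypothetical placeholder policies (fixed outputs instead of learned behavior).
def flee_policy(obs: Observation) -> str:
    return "run_from_player"


def combat_policy(obs: Observation) -> str:
    return "aim_and_shoot"


def search_policy(obs: Observation) -> str:
    return "explore"


# The tree encodes the high-level priorities; thresholds are illustrative only.
tree = Selector([
    SkillNode("Flee", lambda o: o["health"] < 0.3, flee_policy),
    SkillNode("Combat", lambda o: o["player_visible"] > 0.5, combat_policy),
    SkillNode("Search", lambda o: True, search_policy),
])

print(tree.tick({"health": 0.9, "player_visible": 1.0}))
```

Because the priorities live in the tree and not in the policies, a designer could reorder the children or change a guard without retraining any skill.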

The researchers trained individual RL models for each skill, using standard observations and actions relevant to that specific task. For example, the Combat model learned to navigate, aim, and shoot, while the Hide model learned to stay out of the player’s line of sight. This modular training reduces the complexity often associated with training a single, large RL model for all tasks.

Empirical Evaluation and Results

To evaluate their BT+RL hybrid, the team compared it against a pure BT baseline and a curriculum learning RL model in a competitive third-person shooter environment built in Unreal Engine. NPCs competed to reduce each other’s health to zero, with various obstacles and ammunition reload points on the map.

The results showed that the hybrid approach performed significantly better than the curriculum RL model in terms of win rate and average damage dealt, while only performing slightly worse than the pure BT model. This suggests that the hybrid method successfully captures enhanced abilities from RL without sacrificing too much of the consistency offered by BTs. While the pure BT model generally completed episodes faster, the hybrid and curriculum models showed a wider distribution of episode lengths, indicating more varied and potentially engaging gameplay trajectories.
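For readers who want to reproduce this kind of comparison, here is a small sketch of how win rate, average damage, and the spread of episode lengths could be computed from per-episode logs. The Episode record and its field names are hypothetical, and the numbers in the example are invented for illustration. They are not results from the paper.

```python
from dataclasses import dataclass
from statistics import mean, pstdev
from typing import Dict, List


@dataclass
class Episode:
    won: bool            # did the evaluated agent win the match?
    damage_dealt: float  # total damage dealt to the opponent
    length_steps: int    # episode length in simulation steps


def summarize(episodes: List[Episode]) -> Dict[str, float]:
    """Aggregate per-episode logs into the metrics discussed in the text."""
    lengths = [e.length_steps for e in episodes]
    return {
        "win_rate": mean([1.0 if e.won else 0.0 for e in episodes]),
        "avg_damage": mean([e.damage_dealt for e in episodes]),
        "length_mean": mean(lengths),
        "length_std": pstdev(lengths),  # a wider spread suggests more varied matches
    }


# Invented example data, for illustration only.
example = [
    Episode(won=True, damage_dealt=100.0, length_steps=200),
    Episode(won=False, damage_dealt=50.0, length_steps=400),
]
summary = summarize(example)
print(summary)
```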

Regarding runtime performance, the pure BT approach had the highest average frames per second (FPS), followed by the curriculum RL and hybrid models. This is expected, as the hybrid model involves computations for both the BT and the RL model. However, the researchers noted that optimizations like batching, which are enabled by the use of small, reusable RL models in the BT+RL approach, could further enhance performance.
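One way to picture the batching idea is to group all NPCs that are currently running the same skill and call that skill's model once per frame with the whole group. The sketch below is an assumption about how such grouping might look, not the authors' code. The function names are hypothetical, and the counting policy is a trivial stand-in for a trained model that also records how many times it was invoked.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Sequence, Tuple

Observation = Sequence[float]
BatchPolicy = Callable[[List[Observation]], List[int]]
Request = Tuple[str, str, Observation]  # (npc_id, active_skill, observation)


def make_counting_policy(calls: Dict[str, int], name: str) -> BatchPolicy:
    """Stand-in model: picks the index of the largest feature in each observation."""
    def policy(batch: List[Observation]) -> List[int]:
        calls[name] = calls.get(name, 0) + 1  # count model invocations
        return [max(range(len(obs)), key=lambda i: obs[i]) for obs in batch]
    return policy


def batched_step(requests: List[Request],
                 policies: Dict[str, BatchPolicy]) -> Dict[str, int]:
    """Run each skill model once per frame on all NPCs that currently use it."""
    groups: Dict[str, List[Tuple[str, Observation]]] = defaultdict(list)
    for npc_id, skill, obs in requests:
        groups[skill].append((npc_id, obs))

    actions: Dict[str, int] = {}
    for skill, members in groups.items():
        outputs = policies[skill]([obs for _, obs in members])  # one call per skill
        for (npc_id, _), action in zip(members, outputs):
            actions[npc_id] = action
    return actions


calls: Dict[str, int] = {}
policies = {
    "Combat": make_counting_policy(calls, "Combat"),
    "Hide": make_counting_policy(calls, "Hide"),
}
requests = [
    ("npc_a", "Combat", [0.1, 0.9]),
    ("npc_b", "Hide", [0.7, 0.2]),
    ("npc_c", "Combat", [0.8, 0.3]),
]
actions = batched_step(requests, policies)
print(actions, calls)
```

In this example three NPCs are served by two model calls instead of three. With a real neural network the grouped observations would typically be stacked into a single tensor, so that one forward pass covers the whole group.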

Looking Ahead

This research highlights the potential of combining RL and BTs to create reliable and cost-effective deep learning-based agents for commercial video games. The BT+RL approach allows for the development of NPCs with interesting and diverse behaviors without extensive reward shaping or imitation learning. The modular and composable nature of the trained models means individual skills can be reused in different BTs, offering developers greater flexibility and control. This reusability also opens doors for performance optimizations and allows developers to manually adjust agent behavior or tune RL-driven actions for consistency.

The AMD team has open-sourced their environments, models, and implementations in AMD Schola to encourage further community investigation and development in this promising area of game AI.
