TLDR: Real-DRL is a new framework for safety-critical autonomous systems that enables deep reinforcement learning (DRL) agents to learn safe, high-performance action policies directly in real physical environments. It addresses challenges such as ‘unknown unknowns’ and the ‘Sim2Real gap’ through three interactive components: a DRL-Student that learns both from its own experience (self-learning) and from demonstrations (teaching-to-learn), a PHY-Teacher that provides physics-model-based safety assurance and teaching, and a Trigger that manages their interaction. The framework delivers assured safety, automatic hierarchy learning (safety first, then performance), and safety-informed batch sampling to handle rare safety-critical scenarios.
Deep reinforcement learning (DRL) has shown incredible potential in autonomous systems, from self-driving cars to advanced robotics. However, a major hurdle remains: guaranteeing safety in real-world applications. Traditional DRL often struggles with unpredictable situations, known as ‘unknown unknowns,’ and the ‘Sim2Real gap,’ which is the performance drop when a system trained in a simulator is deployed in the real world. These challenges can lead to critical safety incidents.
A new framework called Real-DRL aims to tackle these issues head-on. Introduced in the paper Real-DRL: Teach and Learn in Reality, this system is designed for safety-critical autonomous systems, allowing a DRL agent to learn and develop safe, high-performance action policies directly in real physical environments, all while prioritizing safety above all else.
How Real-DRL Works: Three Interactive Components
The Real-DRL framework is built around three key interactive components:
- DRL-Student: The core DRL agent that learns. It employs a dual learning approach: it learns from its own experiences (self-learning) and from a ‘teacher’ (teaching-to-learn). Crucially, it uses ‘safety-informed batch sampling’ so that it learns effectively from rare but critical safety-related situations, known as ‘corner cases,’ whose scarcity would otherwise leave its training experience imbalanced.
- PHY-Teacher: A physics-model-based design focused purely on safety. Its main roles are to guide the DRL-Student toward safe actions and to act as a safety backup for the real physical system. The PHY-Teacher is innovative in its ability to adapt in real time to unknown unknowns and the Sim2Real gap, keeping the system safe even in unforeseen circumstances (a simple physics-based stand-in is sketched after this list).
- Trigger: The manager, monitoring the real-time safety status of the physical system. It decides when the DRL-Student is in control and when the PHY-Teacher must step in, either to ensure safety or to teach the student safe operation. If the system approaches a safety boundary, the Trigger activates the PHY-Teacher (the handover logic is sketched just below).
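To make the division of labor concrete, here is a minimal Python sketch of the Trigger's handover logic on a cart-pole-like system. It is an illustration under stated assumptions, not the paper's implementation: the bound-based safety margin, the threshold value, and the `drl_student.act` / `phy_teacher.safe_action` interfaces are all hypothetical.

```python
# Hypothetical safety envelope for a cart-pole-like system: the state
# x = [position, velocity, pole angle, angular velocity] is "safe" while
# position and angle stay inside fixed bounds. The real PHY-Teacher derives
# its safety set from a physics model; these bounds are illustrative.
POS_LIMIT = 0.9       # max |position| considered safe (m)
ANGLE_LIMIT = 0.35    # max |pole angle| considered safe (rad)

def safety_margin(state):
    """Return a value <= 1; values <= 0 mean the safety boundary is violated."""
    pos, angle = state[0], state[2]
    return 1.0 - max(abs(pos) / POS_LIMIT, abs(angle) / ANGLE_LIMIT)

TRIGGER_THRESHOLD = 0.2  # hand over *before* the margin reaches zero

def control_step(state, drl_student, phy_teacher, replay_buffer):
    """One control step mediated by the Trigger."""
    margin = safety_margin(state)
    teacher_active = margin <= TRIGGER_THRESHOLD
    if teacher_active:
        # Near the boundary: the PHY-Teacher takes over, both to keep the
        # system safe and to generate safe demonstrations that the
        # DRL-Student later learns from (teaching-to-learn).
        action = phy_teacher.safe_action(state)
    else:
        # Well inside the safe region: the DRL-Student explores and acts.
        action = drl_student.act(state)
    # Every transition is logged; teacher-generated ones are flagged so the
    # student can learn from them and the sampler can treat them as critical.
    replay_buffer.add((state, action), is_critical=teacher_active)
    return action
```

The design point the sketch preserves is that the Trigger fires before the margin reaches zero, so the PHY-Teacher takes over while recovery is still possible, and teacher-generated transitions are recorded for the student to learn from.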
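The `phy_teacher.safe_action` used above can be pictured as a classical model-based controller. Below is an LQR stand-in on the standard linearized cart-pole model; it is a sketch of the general idea of deriving a stabilizing action from a physics model, not the paper's PHY-Teacher, which additionally adapts its model in real time.

```python
import numpy as np
from scipy.linalg import solve_continuous_are

# Standard linearized cart-pole dynamics around the upright equilibrium,
# x = [position, velocity, pole angle, angular velocity]. The physical
# constants are illustrative, not the paper's robot model.
M, m, l, g = 1.0, 0.1, 0.5, 9.81  # cart mass, pole mass, pole length, gravity
A = np.array([[0, 1, 0, 0],
              [0, 0, -m * g / M, 0],
              [0, 0, 0, 1],
              [0, 0, (M + m) * g / (M * l), 0]])
B = np.array([[0.0], [1 / M], [0.0], [-1 / (M * l)]])

class PhyTeacher:
    """Physics-model-based fallback controller (an LQR stand-in)."""

    def __init__(self, A, B, Q=np.eye(4), R=np.eye(1)):
        P = solve_continuous_are(A, B, Q, R)  # solve the Riccati equation
        self.K = np.linalg.inv(R) @ B.T @ P   # LQR gain matrix

    def safe_action(self, state):
        # Drive the state back toward the safe equilibrium x = 0.
        return (-(self.K @ np.asarray(state, dtype=float))).item()

teacher = PhyTeacher(A, B)
print(teacher.safe_action([0.1, 0.0, 0.05, 0.0]))  # force pushing back to safety
```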
Key Features and Benefits
Powered by these interactive components, Real-DRL offers several notable features:
- Assured Safety: It directly addresses the challenges of unknown unknowns and the Sim2Real gap, providing a strong safety guarantee.
- Automatic Hierarchy Learning: The system naturally learns in a hierarchical manner, prioritizing safety first and only then pursuing high performance.
- Safety-Informed Batch Sampling: This mechanism helps the DRL-Student learn effectively from critical but rare safety scenarios, preventing imbalance in its learning experience (a sketch of the idea follows this list).
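The sampling idea can be sketched in a few lines of Python. The two-pool buffer and the `critical_fraction` parameter below are illustrative assumptions rather than the paper's exact rule; the point they capture is that each minibatch reserves a share for safety-critical transitions so corner cases are not drowned out by ordinary experience.

```python
import random

class SafetyInformedBuffer:
    """Replay buffer that over-samples rare safety-critical transitions.

    Transitions flagged as critical (e.g., recorded while the PHY-Teacher
    was active, or with a low safety margin) go into a separate pool, and
    each minibatch draws a fixed fraction from that pool.
    """

    def __init__(self, critical_fraction=0.25):
        self.normal, self.critical = [], []
        self.critical_fraction = critical_fraction

    def add(self, transition, is_critical):
        (self.critical if is_critical else self.normal).append(transition)

    def sample(self, batch_size):
        # Reserve a share of the batch for critical transitions, capped by
        # how many are actually available; fill the rest with ordinary ones.
        n_crit = min(int(batch_size * self.critical_fraction), len(self.critical))
        batch = random.sample(self.critical, n_crit)
        batch += random.sample(self.normal, min(batch_size - n_crit, len(self.normal)))
        random.shuffle(batch)
        return batch
```

With a plain uniform buffer, a corner case seen once in thousands of steps would almost never appear in a minibatch; reserving a fixed share makes the student revisit it on every update.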
Real-World Validation
The effectiveness and unique features of Real-DRL have been demonstrated through extensive experiments. These include tests on a real quadruped robot in an indoor environment, a quadruped robot in a simulated wild environment (NVIDIA Isaac Gym), and a cart-pole system for detailed studies. The experiments showed Real-DRL’s ability to maintain safety even when faced with various unknown disturbances like sudden payloads, kicks, and denial-of-service faults, outperforming existing safe DRL and fault-tolerant DRL frameworks.
In essence, Real-DRL provides a robust and intelligent solution for deploying DRL agents in real-world safety-critical applications, ensuring that autonomous systems can learn and operate effectively without compromising safety.