TLDR: Researchers have developed a reinforcement learning-based control method for legged robots that significantly improves energy efficiency across multiple gravity environments, from lunar to super-Earth. By using gravity-scaled and power-optimized reward functions, the approach enables robots to perform locomotion and base pose control with reduced power consumption. Real-world tests in simulated lunar gravity showed a 36% power saving compared to baseline controllers, offering a scalable solution for future planetary exploration missions.
Legged robots are emerging as crucial tools for exploring challenging extraterrestrial environments, such as the Moon, Mars, and asteroids. Their ability to navigate unstructured terrain offers a significant advantage over traditional wheeled rovers. However, the harsh conditions of space, particularly limited power and thermal budgets, demand highly energy-efficient control systems that can adapt to varying gravity levels.
A recent research paper, titled “Energy-Efficient Learning-Based Control of a Legged Robot in Multiple Gravity Environments,” by Philip Arm, Oliver Fischer, Joseph Church, Adrian Fuhrer, Hendrik Kolvenbach, and Marco Hutter, introduces a reinforcement learning-based control approach designed to address these challenges. The method uses gravity-scaled and power-optimized reward functions so that legged robots can walk and hold a commanded body pose efficiently across a wide range of gravitational accelerations, from the Moon’s low gravity (1.62 m/s²) to that of a hypothetical super-Earth (19.62 m/s², twice Earth’s).
The Need for Legged Mobility in Space
Historically, planetary exploration has relied almost exclusively on wheeled rovers. While reliable on relatively flat terrain, these systems struggle on steep slopes, granular soil, and highly unstructured environments. Legged robots, on the other hand, have demonstrated impressive mobility on Earth, traversing complex natural terrain and overcoming significant obstacles. This makes them strong candidates for exploring difficult-to-access areas like lunar pits, crater walls, and lava tubes, which are critical for scientific discovery and for preparing future human missions.
A Smart Approach to Control
The core of this research is a control approach based on reinforcement learning. The robot learns to move efficiently by receiving ‘rewards’ for desired behaviors and ‘penalties’ for undesirable ones. A key innovation is the introduction of gravity-scaled reward functions: the weight given to each control objective (such as minimizing joint torque or power consumption) is adjusted to the gravitational environment. Because the penalized quantities grow at different rates as gravity increases, each term needs its own scaling; a reward penalizing squared torque, for instance, is scaled differently than one penalizing joint power. This keeps the control policy effective and balanced across varying gravities.
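To make the idea of gravity scaling concrete, here is a minimal Python sketch. It is not the paper's reward function: the function name, the weights, and the scaling exponents are all assumptions chosen for illustration. The sketch assumes that the torques needed to support the robot grow roughly linearly with gravity, so the squared-torque term is normalized by the gravity ratio squared and the power term by the gravity ratio.

```python
G_EARTH = 9.81  # m/s^2, reference gravity (the article's 19.62 m/s^2 is twice this)


def gravity_scaled_penalties(joint_torques, joint_velocities, gravity,
                             w_torque=1e-4, w_power=1e-3):
    """Return (torque_penalty, power_penalty) with gravity-normalized weights.

    Assumption (not taken from the paper): support torques scale about
    linearly with gravity, so squared torque is divided by ratio**2 and
    joint power by ratio. The weights w_torque and w_power are hypothetical.
    """
    ratio = gravity / G_EARTH
    # Sum of squared joint torques, in (N m)^2
    torque_cost = sum(t * t for t in joint_torques)
    # Sum of absolute mechanical joint power, in W
    power_cost = sum(abs(t * w) for t, w in zip(joint_torques, joint_velocities))
    return (-w_torque * torque_cost / ratio ** 2,
            -w_power * power_cost / ratio)


# Same motion on Earth and on the Moon, with torques shrunk by the gravity ratio
moon_ratio = 1.62 / G_EARTH
earth = gravity_scaled_penalties([10.0, -4.0], [1.0, 2.0], G_EARTH)
moon = gravity_scaled_penalties([10.0 * moon_ratio, -4.0 * moon_ratio],
                                [1.0, 2.0], 1.62)
```

Under these assumptions, the two calls return matching penalties, which is the point of the scaling: a behavior that is equally "good" in both environments is not punished more heavily just because gravity is stronger.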
Furthermore, the control policies are specifically optimized for power consumption. The total power loss is modeled to include recuperation loss (energy not effectively converted back to electrical energy during braking) and winding loss (due to resistive heating in the motor windings). By penalizing these losses during the learning process, the robot is trained to adopt gaits and movements that inherently consume less power.
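The two loss terms described above can be sketched for a single joint as follows. This is an illustrative model rather than the one used in the paper: the torque constant, winding resistance, and recuperation efficiency are hypothetical placeholder values, and gearbox effects are ignored.

```python
def joint_power_loss(torque, velocity, torque_constant=0.1,
                     winding_resistance=0.2, recuperation_efficiency=0.6):
    """Estimate the electrical loss (W) of one joint.

    All three motor parameters are hypothetical placeholders, not values
    from the paper. Torque is in N m, velocity in rad/s.
    """
    # Motor current from the torque relation tau = k_t * i
    current = torque / torque_constant
    # Winding loss: resistive heating, I^2 * R
    winding_loss = winding_resistance * current ** 2
    # Mechanical power is negative while the joint is braking
    mechanical_power = torque * velocity
    braking_power = max(0.0, -mechanical_power)
    # Recuperation loss: the share of braking energy not recovered electrically
    recuperation_loss = (1.0 - recuperation_efficiency) * braking_power
    return winding_loss + recuperation_loss


driving = joint_power_loss(1.0, 2.0)    # torque and velocity share a sign
braking = joint_power_loss(1.0, -2.0)   # torque opposes the motion
```

In this sketch a braking joint always loses more than a driving joint at the same torque and speed, so a policy penalized on the summed loss has an incentive to avoid both high torques and unnecessary braking.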
Real-World Validation and Impressive Results
The controllers were implemented on an improved version of the Magnecko quadrupedal robot, a 15.65 kg robot with a leg length of 0.5 meters. Two main tasks were developed: a locomotion controller for tracking planar velocity and a base pose controller for maintaining desired height, pitch, and yaw. These were tested in simulations across various gravity levels and, crucially, validated on the real robot.
To simulate lunar gravity for real-world experiments, the team designed a passive constant-force spring offload system. This system effectively reduces the robot’s apparent weight, allowing it to experience conditions similar to lunar gravity while still on Earth. The legs of the robot were also internally compensated for Earth’s gravity, ensuring the policies only needed to regulate around this compensation.
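As a rough sense of scale, the force such an offload system has to supply can be estimated from the robot's mass and the two gravity values quoted in the article. The sketch below treats the robot as a single point mass and ignores the internal leg compensation, so it is an approximation and not the team's design calculation; the function name is hypothetical.

```python
def offload_force(mass_kg, target_gravity, earth_gravity=9.81):
    """Upward force (N) that makes a mass on Earth weigh what it would
    under target_gravity. Point-mass approximation, illustrative only."""
    return mass_kg * (earth_gravity - target_gravity)


# 15.65 kg robot, lunar gravity of 1.62 m/s^2
lunar_offload = offload_force(15.65, 1.62)
```

With the article's figures this comes to roughly 128 N, leaving an apparent weight of about 25 N.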
The results were significant. The power-optimized locomotion controller achieved a power consumption of 23.4 W in Earth gravity at 0.4 m/s, representing a 23% improvement over a baseline policy. In the simulated lunar gravity environment, the power-optimized policy consumed only 12.2 W, which is 36% less than a baseline controller not optimized for power efficiency. These findings demonstrate that gravity-scaled, power-optimized reward functions lead to more efficient and qualitatively superior control policies across multiple gravity levels and tasks.
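The article does not state the baseline power figures, but they can be back-calculated from the reported numbers if the percentages are read as reductions relative to the baseline (an assumption). The sketch below also computes a dimensionless cost of transport for the Earth-gravity run, a common efficiency metric that the article itself does not report. The function names are hypothetical.

```python
def implied_baseline(optimized_watts, fractional_saving):
    """Baseline power (W), assuming optimized = baseline * (1 - saving)."""
    return optimized_watts / (1.0 - fractional_saving)


def cost_of_transport(power_watts, mass_kg, gravity, speed):
    """Dimensionless cost of transport: P / (m * g * v)."""
    return power_watts / (mass_kg * gravity * speed)


earth_baseline = implied_baseline(23.4, 0.23)   # about 30.4 W
lunar_baseline = implied_baseline(12.2, 0.36)   # about 19.1 W
earth_cot = cost_of_transport(23.4, 15.65, 9.81, 0.4)  # about 0.38
```

The lunar walking speed is not given in the article, so no cost of transport is computed for that case.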
Looking Ahead
While the control policies showed remarkable energy savings, the researchers noted that the robot’s standby power consumption (from onboard computers, routers, and motor controllers) was still six times higher than the power consumed by the efficient lunar locomotion policy. This highlights the need for optimizing all onboard systems for power efficiency in future lunar missions. The passive offload system proved effective for real-world validation, though future work will focus on refining it to reduce horizontal disturbances and allow for fine-tuning of the offload force.
This research provides a scalable and energy-efficient approach to controlling legged robots, paving the way for more ambitious and sustainable robotic exploration of our solar system. For more details, see the full paper cited above.


