TLDR: This research introduces a hierarchical framework for bimanual robot manipulation, addressing the challenge of coordinating two arms for long-horizon, contact-rich tasks. It uses reinforcement learning to train a library of low-level single-arm and bimanual skills. A Transformer-based high-level policy then acts as a planner and scheduler, simultaneously predicting discrete skill sequences and their continuous parameters for both arms. This approach enables more efficient, coordinated behaviors by supporting both parallel and sequential skill execution, outperforming traditional end-to-end and sequential planning methods in success rates and task completion efficiency.
Robots that can use two arms to perform complex, contact-rich tasks the way humans do have long been a major ambition in robotics. However, programming two robotic arms to work together harmoniously, often requiring a mix of simultaneous and sequential actions, presents a substantial challenge. Traditional approaches often treat robot actions as a purely sequential process, which is inefficient for bimanual tasks where both arms could be working in parallel.
A new research paper, titled Learning to Plan & Schedule with Reinforcement-Learned Bimanual Robot Skills, introduces a novel hierarchical framework that tackles this problem by framing it as an integrated skill planning and scheduling challenge. This approach moves beyond simple sequential decision-making to enable robots to invoke skills simultaneously, leading to more coordinated and efficient behaviors.
A Two-Tiered Approach to Robot Coordination
The core of this new method lies in its hierarchical structure. It breaks down complex tasks into two main levels:
1. Low-Level Primitive Skills: The foundation is a library of fundamental skills for both single-arm and bimanual operations, trained using Reinforcement Learning (RL) in a simulated environment. Examples include single-arm pushing, single-arm rotating, bimanual rotating, bimanual pushing, and bimanual pick-and-place. These skills are designed to handle contact-rich interactions and are parameterized, meaning they can adapt to specific task goals such as a target object pose.
2. High-Level Planner & Scheduler: On top of these primitive skills sits a sophisticated high-level policy, built using a Transformer architecture. This policy acts as a ‘brain’ that learns from a dataset of successful skill compositions. Its job is to simultaneously predict both the discrete schedule of which skills to use and their continuous parameters (e.g., exactly where to push or rotate an object). This allows it to decide when arms should act independently, when they should collaborate, and in what sequence, optimizing for both parallel and sequential execution.
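To make the skill-library idea concrete, here is a minimal sketch of what such a parameterized library might look like. The skill names mirror those listed above, but the exact parameterization (here, a 3-dimensional target pose) and the `valid_invocation` helper are illustrative assumptions, not the paper's actual interface.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Skill:
    """A parameterized primitive skill (dims are illustrative)."""
    name: str
    arms: int       # 1 = single-arm, 2 = bimanual
    param_dim: int  # e.g. a target object pose (x, y, yaw)

# Hypothetical library mirroring the skills named in the paper.
SKILL_LIBRARY = {
    s.name: s
    for s in [
        Skill("push_single", 1, 3),
        Skill("rotate_single", 1, 3),
        Skill("push_bimanual", 2, 3),
        Skill("rotate_bimanual", 2, 3),
        Skill("pick_place_bimanual", 2, 3),
    ]
}

def valid_invocation(name, params):
    """An invocation is valid if its continuous parameters match
    the skill's declared parameterization."""
    skill = SKILL_LIBRARY[name]
    return len(params) == skill.param_dim

print(valid_invocation("push_single", (0.4, -0.1, 1.57)))  # True
```

Because each skill declares how many arms it occupies, a scheduler can check at plan time whether two skills may run in parallel or must share both arms.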
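The dual prediction made by the high-level policy, a discrete skill choice plus that skill's continuous parameters, per arm, can be sketched as two output heads on a shared state embedding. This is a simplified stand-in (random linear heads instead of a trained Transformer), and the dimensions are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
NUM_SKILLS, PARAM_DIM, HIDDEN = 5, 3, 32  # assumed sizes

# Hypothetical output heads: one state embedding in, per-arm
# discrete skill logits and continuous parameters out (2 arms).
W_skill = rng.normal(size=(HIDDEN, 2 * NUM_SKILLS)) * 0.1
W_param = rng.normal(size=(HIDDEN, 2 * PARAM_DIM)) * 0.1

def policy_heads(state_embedding):
    """Map a state embedding to a joint action: a discrete skill
    index per arm plus that skill's continuous parameters."""
    logits = state_embedding @ W_skill
    params = state_embedding @ W_param
    skills = logits.reshape(2, NUM_SKILLS).argmax(axis=1)
    return skills, params.reshape(2, PARAM_DIM)

state = rng.normal(size=HIDDEN)
skills, params = policy_heads(state)
print(skills.shape, params.shape)  # (2,) (2, 3)
```

The key point is the joint output: predicting the schedule (which skill, which arm) and the parameters (where to push or rotate) from the same embedding lets the policy coordinate both decisions rather than making them in isolation.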
Overcoming Limitations of Previous Methods
Many existing robotic manipulation strategies struggle with bimanual tasks because they are designed for single-robot scenarios or enforce strictly sequential actions. This new framework addresses these limitations by explicitly supporting simultaneous skill invocation, which is crucial for tasks where two arms can work together or independently to speed up completion.
For instance, in a task involving placing two tabletop objects into a bin, the scheduling policy might first command both arms to simultaneously push their respective objects towards a central position. Following this, it could sequentially invoke bimanual rotating and pick-and-place skills for each object, demonstrating a dynamic combination of parallel and sequential actions.
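The efficiency gain from mixing parallel and sequential execution can be seen with a toy version of this schedule. The skill names and parameters below are illustrative, and identical entries in one step stand for a single shared bimanual invocation.

```python
# Hypothetical schedule for the two-object bin task described above.
# Each step maps arm -> (skill, params); entries within a step run
# in parallel, and identical entries are one shared bimanual call.
schedule = [
    {"left": ("push_single", (0.3, 0.0)),
     "right": ("push_single", (-0.3, 0.0))},
    {"left": ("rotate_bimanual", (1.57,)),
     "right": ("rotate_bimanual", (1.57,))},
    {"left": ("pick_place_bimanual", (0.5, 0.2)),
     "right": ("pick_place_bimanual", (0.5, 0.2))},
]

def step_counts(schedule):
    """Return (scheduled_steps, total_invocations): a sequential-only
    planner would need one step per invocation."""
    invocations = sum(len(set(step.values())) for step in schedule)
    return len(schedule), invocations

steps, calls = step_counts(schedule)
print(steps, calls)  # 3 4
```

Here four skill invocations fit into three scheduled steps because the two initial pushes run in parallel; a strictly sequential planner would spend one step on each invocation, which is exactly the episode-duration gap the paper's scheduler exploits.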
Demonstrated Success and Efficiency
The researchers evaluated their approach in simulated experiments, specifically on a long-horizon, contact-rich task: placing one or two bulky objects into a bin. This task is challenging because the objects are often too large for a single arm and may start out of reach, requiring complex coordination.
The results were compelling. The hierarchical framework achieved significantly higher success rates compared to end-to-end Reinforcement Learning approaches. It also produced more efficient and coordinated behaviors than traditional sequential-only planners, evidenced by a notable reduction in episode duration. This highlights the method’s ability to effectively plan and schedule for both arms simultaneously, leading to faster task completion.
In conclusion, this work presents a significant step forward in enabling robots to perform complex bimanual manipulation tasks with human-like dexterity and efficiency. By integrating skill planning and scheduling within a hierarchical framework, robots can now better coordinate their actions, leveraging both parallel and sequential execution to achieve challenging goals.