TL;DR: The GBPP research introduces a two-stage learning method for robots to predict optimal base positions for grasping objects. It combines inexpensive heuristic auto-labeling for broad coverage with targeted high-fidelity simulation for refinement, leading to efficient, accurate, and generalizable base placement that outperforms traditional methods in both simulation and real-world tasks.
Mobile robots face a significant challenge: positioning themselves correctly to successfully grasp objects, especially in complex and cluttered environments. Traditional methods often fall short, either by being too slow or by failing to consider the robot’s arm reach and potential collisions during navigation. This often leads to the robot driving to a spot where it simply cannot perform the intended grasp, requiring costly re-planning.
The research paper, titled “GBPP: Grasp-Aware Base Placement Prediction for Robots via Two-Stage Learning,” by Jizhuo Chen, Diwen Liu, Jiaming Wang, and Harold Soh, introduces an innovative solution to this problem. They propose a two-stage learning approach called Grasp-Aware Base Placement Prediction (GBPP) that helps robots determine the best base position for grasping.
The Core Problem with Current Robot Systems
Many existing robotic systems use a modular approach: perception identifies an object, navigation moves the robot close, and then a grasp planner tries to pick up the object. The issue is that navigation often ignores the arm’s reach or potential grasp constraints, leading to frequent dead-ends. While more advanced methods like Task-and-Motion Planning (TAMP) try to optimize everything together, they are typically too slow for real-time deployment and require highly detailed environmental models.
GBPP: A Two-Stage Learning Solution
GBPP tackles these trade-offs by casting base placement as a binary classification problem: given a potential robot position, can it successfully grasp the target? Training such a model directly with large-scale simulations would be incredibly expensive and time-consuming. Instead, GBPP uses a clever hybrid strategy:
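The classification framing can be sketched with a toy scorer. Note that the features, weights, and logistic form below are illustrative assumptions for intuition only; the paper's actual model is learned from scene geometry, not hand-set weights:

```python
import math

def score_candidate(features, weights, bias):
    """Logistic score in [0, 1]: the probability that a grasp from this
    candidate base pose succeeds. All parameters here are illustrative."""
    z = sum(w * f for w, f in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical features for one candidate base pose:
# [distance to target (m), angular offset (rad), local clutter density]
feats = [0.6, 0.2, 0.1]
weights = [-2.0, -1.0, -3.0]  # closer, better aligned, less clutter -> higher score
bias = 2.0

prob = score_candidate(feats, weights, bias)
feasible = prob > 0.5  # binary decision: can the robot grasp from here?
```

Framing the problem this way lets the robot score any candidate pose with a single forward pass, rather than solving a full motion plan per candidate.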
Stage 1: Heuristic Auto-Labeling
The first stage uses a simple, lightweight “distance-visibility” heuristic. This rule quickly evaluates candidate base positions based on how close they are to the target object and how well the robot can see it. This allows for the automatic labeling of vast amounts of data at a negligible cost, providing the model with a broad initial understanding of feasible base placements.
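A minimal sketch of such a labeling rule is below. The distance band, the circular-obstacle model, and the sightline sampling are assumptions for illustration; the paper's exact heuristic may differ:

```python
import math

def heuristic_label(base_xy, target_xy, obstacles, d_min=0.4, d_max=0.9):
    """Cheap 'distance-visibility' pseudo-label (assumed form): positive if
    the base lies within a reachable distance band AND the straight line to
    the target is unobstructed. `obstacles` holds (cx, cy, radius) circles."""
    dx, dy = target_xy[0] - base_xy[0], target_xy[1] - base_xy[1]
    dist = math.hypot(dx, dy)
    if not (d_min <= dist <= d_max):
        return 0
    # Visibility check: sample points along the sightline and reject the
    # candidate if any sample falls inside an obstacle.
    for t in [i / 10 for i in range(1, 10)]:
        px, py = base_xy[0] + t * dx, base_xy[1] + t * dy
        for (ox, oy, r) in obstacles:
            if math.hypot(px - ox, py - oy) < r:
                return 0
    return 1

clear = heuristic_label((0.0, 0.0), (0.6, 0.0), [])
blocked = heuristic_label((0.0, 0.0), (0.6, 0.0), [(0.3, 0.0, 0.1)])
too_far = heuristic_label((0.0, 0.0), (2.0, 0.0), [])
```

Because each label costs only a few distance checks, millions of candidates can be labeled without ever running a physics simulator.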
Stage 2: Simulation-Based Refinement
After the initial training with heuristic labels, a smaller, carefully selected set of high-fidelity simulation data is used to refine the model. This stage calibrates the model’s predictions to match actual grasp outcomes, accounting for subtle constraints like joint limits and complex collisions that the simpler heuristic might miss.
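The pretrain-then-refine pattern can be demonstrated with a deliberately tiny stand-in model: plain logistic regression on a single distance feature, trained first on many cheap heuristic labels and then fine-tuned on a small set of stricter "simulation" labels. The cutoffs, learning rates, and 1-D feature are all assumptions; the paper trains a neural network on scene geometry:

```python
import math, random

def sigmoid(z):
    z = max(-60.0, min(60.0, z))  # clamp to avoid overflow
    return 1.0 / (1.0 + math.exp(-z))

def train(data, w, b, lr=0.5, epochs=200):
    """Plain logistic-regression SGD; a stand-in for the paper's network."""
    for _ in range(epochs):
        for feats, y in data:
            p = sigmoid(sum(wi * f for wi, f in zip(w, feats)) + b)
            g = p - y  # gradient of the log loss w.r.t. the logit
            w = [wi - lr * g * f for wi, f in zip(w, feats)]
            b -= lr * g
    return w, b

random.seed(0)
# Stage 1: many cheap heuristic labels (toy rule: feasible if d < 0.8 m).
stage1 = [([d], 1 if d < 0.8 else 0)
          for d in [random.random() * 1.5 for _ in range(200)]]
w, b = train(stage1, [0.0], 0.0)

# Stage 2: a few high-fidelity simulation labels with a stricter cutoff
# (e.g. joint limits shrink the truly feasible band to d < 0.6 m).
stage2 = [([d], 1 if d < 0.6 else 0)
          for d in [random.random() * 1.5 for _ in range(20)]]
w, b = train(stage2, w, b, lr=0.2, epochs=100)

def predict(d):
    return sigmoid(w[0] * d + b)
```

The refinement stage nudges the decision boundary learned from the heuristic toward the stricter simulated ground truth, which is the calibration role the paper assigns to Stage 2.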
Benefits and Performance
This two-stage approach offers significant advantages. It allows the model to quickly score hundreds of candidate base poses in approximately 0.3 seconds, enabling dense, real-time evaluation. The research highlights that the heuristics provide scale and coverage, while the simulation ensures fidelity to true grasp outcomes. Together, they enable practical and data-efficient base placement for mobile manipulators.
In both simulation and real-world evaluations, GBPP consistently outperformed geometric baselines. In real-world tests on a Stretch 3 mobile manipulator across three cluttered scenes (office desk, shelf corner, living-room table), GBPP achieved a 73% overall success rate, compared with 53-73% per scene for an open-loop exploration strategy and 27-40% for a simple proximity baseline. Even when GBPP made an incorrect prediction, the chosen pose was spatially very close to a feasible alternative, allowing quick recovery through local re-planning.
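That recovery behavior can be captured in a few lines: rank candidates by predicted score and, if a grasp attempt fails, retry from the next-best pose. This is a simplified stand-in for local re-planning, and `try_grasp` is a hypothetical callback, not an API from the paper:

```python
def grasp_with_fallback(candidates, try_grasp, max_attempts=3):
    """Rank (pose, score) candidates by predicted score and, if a grasp
    attempt fails, fall back to the next-best pose. `try_grasp(pose)`
    returns True on a successful grasp."""
    ranked = sorted(candidates, key=lambda c: c[1], reverse=True)
    for pose, _ in ranked[:max_attempts]:
        if try_grasp(pose):
            return pose
    return None  # all attempts exhausted

# Toy world: only poses with x > 0 actually allow a grasp, so the
# top-scored candidate fails and the second one succeeds.
cands = [((-0.2, 0.5), 0.9), ((0.4, 0.5), 0.8), ((0.6, 0.3), 0.6)]
chosen = grasp_with_fallback(cands, lambda p: p[0] > 0)
```

Because GBPP's near-misses sit close to feasible poses, a shallow fallback list like this is usually enough to recover without global re-planning.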
Real-World Deployment and Limitations
The system was successfully deployed on a Hello Robot Stretch 3, demonstrating its ability to generalize from simulation to novel robot platforms and diverse real-world environments. The robot could observe a scene, evaluate candidate positions, predict the best base pose, navigate to it, and successfully grasp the target object.
However, the researchers also identified a key limitation: the system’s reliance on the quality of input data from consumer-grade RGB-D cameras. These sensors can produce incomplete or noisy point clouds due to missing depth returns, reflective surfaces, or occlusions. Such imperfections can distort the input geometry and degrade prediction accuracy, potentially causing the model to miss viable base positions. Addressing these perceptual issues through techniques like data augmentation, denoising, or multi-view fusion is a promising area for future work.
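As a flavor of what such denoising could involve, here is a minimal statistical outlier filter for a point cloud: points whose mean distance to their nearest neighbors is far above the global average are dropped. This is a generic technique (the idea behind filters like Open3D's statistical outlier removal), not something the paper implements; the thresholds are assumptions:

```python
import math

def remove_outliers(points, k=3, ratio=1.5):
    """Drop points whose mean distance to their k nearest neighbours
    exceeds `ratio` times the cloud-wide average of that statistic.
    O(n^2) brute force, fine for a small illustrative cloud."""
    def mean_knn_dist(p):
        ds = sorted(math.dist(p, q) for q in points if q is not p)
        return sum(ds[:k]) / k
    scores = [mean_knn_dist(p) for p in points]
    avg = sum(scores) / len(scores)
    return [p for p, s in zip(points, scores) if s <= ratio * avg]

cloud = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (0.0, 0.1, 0.0),
         (0.1, 0.1, 0.0), (5.0, 5.0, 5.0)]  # last point: a depth-noise spike
cleaned = remove_outliers(cloud)
```

Filtering spurious depth returns like this before feeding geometry to the placement model is one of the straightforward mitigations the authors point to.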
In conclusion, GBPP offers a practical and efficient framework for learning geometry-aware base poses for grasping tasks. By combining inexpensive heuristic bootstrapping with targeted simulation refinement, it provides a scalable and sample-efficient path toward robust mobile manipulation.


