
VAMOS: A Hierarchical Model for Adaptive and Steerable Robot Navigation

TLDR: VAMOS is a new hierarchical robot navigation model that separates high-level planning (what to do) from low-level physical capabilities (how to do it). It uses a generalist vision-language model for planning 2D paths and a specialist affordance model to ensure those paths are physically feasible for a specific robot. This design allows robots to navigate diverse environments, adapt to different robot types (like wheeled or legged), and be easily guided by natural language, significantly improving navigation success rates over existing methods.

Robot navigation is a complex challenge, especially when robots need to operate in diverse environments and adapt to their unique physical capabilities. Imagine a wheeled robot trying to climb stairs – it simply can’t. Current robot navigation systems often struggle with this, either being too specialized for one type of robot or failing to generalize across different terrains and tasks. This limitation prevents us from fully leveraging large datasets that contain information from various robots with different locomotion abilities.

A new research paper introduces VAMOS, a hierarchical vision-language-action (VLA) model designed to overcome these hurdles. VAMOS tackles the problem by separating the ‘what to do’ (semantic planning) from the ‘how to do it’ (embodiment grounding). This means a single, high-level planner can learn general navigation strategies, while a specialized component ensures the plan is physically possible for the specific robot.

How VAMOS Works

VAMOS operates with two main components:

1. The High-Level VLM Planner: This is a powerful vision-language model (VLM) that acts as a generalist. It’s trained on a vast amount of real-world data from different robots and environments. Given an image of the surroundings and a text-encoded goal (like ‘navigate to x=…, y=…’), it proposes several possible 2D paths in pixel space. These paths represent potential trajectories the robot could follow.
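To make the interface concrete, here is a minimal sketch of how such a text-encoded goal might be assembled and what the planner's output could look like. All names, the prompt format, and the coordinate values are illustrative assumptions, not details from the paper:

```python
# Hypothetical sketch of the planner's text interface (function name and
# prompt format are assumptions for illustration, not the paper's actual API).

def make_planner_prompt(x: float, y: float, preference: str = "") -> str:
    """Build the text-encoded goal fed to the VLM alongside the image.
    An optional natural-language preference can be appended to steer planning."""
    prompt = f"navigate to x={x:.2f}, y={y:.2f}"
    if preference:
        prompt += f"; {preference}"
    return prompt

# The planner would return several candidate 2D paths in pixel space,
# e.g. each path as a list of (u, v) pixel waypoints (values invented here).
candidate_paths = [
    [(120, 460), (180, 400), (240, 330)],  # e.g. a route toward a ramp
    [(120, 460), (90, 390), (70, 300)],    # e.g. a route toward stairs
]
```

Because the goal and any preference live in plain text, steering the planner (e.g. `make_planner_prompt(3.0, 1.5, "stay on the right")`) requires no retraining, only a different prompt.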

2. The Low-Level Affordance Model: This is a lightweight, specialist model tailored to a specific robot’s physical capabilities. It’s trained efficiently and safely in simulation by observing how the robot performs on various terrains. Its job is to evaluate the feasibility of the 2D paths proposed by the VLM planner. It assigns an ‘affordance score’ to each path, indicating how traversable it is for that particular robot. For example, a wheeled robot’s affordance model would give a low score to a path involving stairs, while a legged robot’s model might give it a high score.
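A toy sketch of what an embodiment-specific scorer might look like, assuming a path can be summarized by the terrain types it crosses. The interface, terrain labels, and score values are invented for illustration; the paper's model is learned in simulation, not a lookup table:

```python
# Minimal sketch of an embodiment-specific affordance scorer (an assumption,
# not the paper's learned model). Each path gets a traversability score
# in [0, 1] for a given robot embodiment.

def affordance_score(path_terrains: list, embodiment: str) -> float:
    """Score a path by the least traversable terrain it crosses."""
    # Hypothetical per-embodiment terrain traversability values.
    traversability = {
        "wheeled": {"flat": 1.0, "ramp": 0.9, "stairs": 0.0},
        "legged":  {"flat": 1.0, "ramp": 0.8, "stairs": 0.9},
    }
    table = traversability[embodiment]
    return min(table.get(t, 0.5) for t in path_terrains)

# A path over stairs: infeasible for a wheeled robot, fine for a legged one.
print(affordance_score(["flat", "stairs"], "wheeled"))  # 0.0
print(affordance_score(["flat", "stairs"], "legged"))   # 0.9
```

Swapping the robot means swapping only this lightweight component; the high-level planner is untouched.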

The key innovation lies in the interface between these two models: the 2D path. This structured yet flexible representation allows the generalist planner to learn from heterogeneous data without being constrained by specific robot actions, while the specialist affordance model can then modulate these plans based on the robot's unique physical limits. The system then executes the candidate path with the highest affordance score.
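The selection step described above can be sketched as a simple propose-score-pick loop. This is a toy illustration under the same assumptions as before (candidate paths and score values are invented), not the paper's implementation:

```python
# Sketch of the hierarchy's selection step: the generalist planner proposes
# candidate paths, the specialist affordance model scores them, and the
# highest-scoring path is chosen for execution.

def select_path(candidate_paths, score_fn):
    """Pick the candidate 2D path with the highest affordance score."""
    scores = [score_fn(p) for p in candidate_paths]
    best = max(range(len(candidate_paths)), key=lambda i: scores[i])
    return candidate_paths[best], scores[best]

# Toy example with precomputed scores for a wheeled robot: the stair route
# is rejected in favor of the ramp route.
paths = ["via_ramp", "via_stairs"]
toy_scores = {"via_ramp": 0.9, "via_stairs": 0.0}
chosen, score = select_path(paths, toy_scores.get)
print(chosen)  # via_ramp
```

For a legged robot, only `score_fn` would change, which is how the same planner serves multiple embodiments.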

Key Advantages and Results

VAMOS has demonstrated significant improvements in robot navigation:

  • Superior Navigation Performance: In extensive real-world experiments across diverse indoor and complex outdoor environments (hallways, atriums, labs, campuses, forests, ramps), VAMOS achieved a 90% average success rate, outperforming state-of-the-art model-based and end-to-end learning methods. It particularly excelled in challenging scenarios with large geometric obstacles requiring long-term planning.
  • Cross-Embodiment Navigation: The hierarchical design allows the same high-level VLM planner to be used across different robot types, such as the legged Boston Dynamics Spot and the wheeled UW Hound robot. By simply swapping the lightweight, embodiment-specific affordance model, VAMOS enables both robots to navigate effectively, taking into account their unique capabilities (e.g., Hound taking ramps, Spot taking stairs).
  • Natural Language Steerability: VAMOS can be easily guided using natural language preferences. Users can append instructions like ‘stay on the right’ or ‘take the stairs’ to the goal command, and the model will adapt its predicted trajectories accordingly.
  • Enhanced Reliability: The affordance model is crucial for robust navigation, significantly improving single-robot reliability by rejecting physically infeasible plans proposed by the VLM, leading to up to 3 times higher success rates in some cases.
  • Benefits of Data Pooling: Training the high-level VLM planner on a diverse mix of datasets from various robots and environments proved more beneficial than training on single, robot-specific datasets, leading to better overall performance.

This research marks a significant step towards creating general-purpose robot navigation agents that can reason both geometrically and semantically about how to act in the world, making them more adaptable and reliable for open-world deployment. For more details, you can read the full research paper here.

Karthik Mehta
Karthik Mehta is a data journalist known for his data-rich, insightful coverage of AI news and developments. Armed with a degree in Data Science from IIT Bombay and years of newsroom experience, Karthik merges storytelling with metrics to surface deeper narratives in AI-related events. His writing cuts through hype, revealing the real-world impact of Generative AI on industries, policy, and society. You can reach him at: [email protected]
