
VAMOS: A Hierarchical Model for Adaptive and Steerable Robot Navigation

TLDR: VAMOS is a new hierarchical robot navigation model that separates high-level planning (what to do) from low-level physical capabilities (how to do it). It uses a generalist vision-language model for planning 2D paths and a specialist affordance model to ensure those paths are physically feasible for a specific robot. This design allows robots to navigate diverse environments, adapt to different robot types (like wheeled or legged), and be easily guided by natural language, significantly improving navigation success rates over existing methods.

Robot navigation is a complex challenge, especially when robots need to operate in diverse environments and adapt to their unique physical capabilities. Imagine a wheeled robot trying to climb stairs – it simply can’t. Current robot navigation systems often struggle with this, either being too specialized for one type of robot or failing to generalize across different terrains and tasks. This limitation prevents us from fully leveraging large datasets that contain information from various robots with different locomotion abilities.

A new research paper introduces VAMOS, a hierarchical vision-language-action (VLA) model designed to overcome these hurdles. VAMOS tackles the problem by separating the ‘what to do’ (semantic planning) from the ‘how to do it’ (embodiment grounding). This means a single, high-level planner can learn general navigation strategies, while a specialized component ensures the plan is physically possible for the specific robot.

How VAMOS Works

VAMOS operates with two main components:

1. The High-Level VLM Planner: This is a powerful vision-language model (VLM) that acts as a generalist. It’s trained on a vast amount of real-world data from different robots and environments. Given an image of the surroundings and a text-encoded goal (like ‘navigate to x=…, y=…’), it proposes several possible 2D paths in pixel space. These paths represent potential trajectories the robot could follow.
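To make the interface concrete, here is a minimal sketch of how such a text-encoded goal might be assembled and what the planner's output could look like. All names, the prompt format, and the coordinate values are illustrative assumptions, not details from the paper:

```python
# Hypothetical sketch of the planner's text interface (function name and
# prompt format are assumptions for illustration, not the paper's actual API).

def make_planner_prompt(x: float, y: float, preference: str = "") -> str:
    """Build the text-encoded goal fed to the VLM alongside the image.
    An optional natural-language preference can be appended to steer planning."""
    prompt = f"navigate to x={x:.2f}, y={y:.2f}"
    if preference:
        prompt += f"; {preference}"
    return prompt

# The planner would return several candidate 2D paths in pixel space,
# e.g. each path as a list of (u, v) pixel waypoints (values invented here).
candidate_paths = [
    [(120, 460), (180, 400), (240, 330)],  # e.g. a route toward a ramp
    [(120, 460), (90, 390), (70, 300)],    # e.g. a route toward stairs
]
```

Because the goal and any preference live in plain text, steering the planner (e.g. `make_planner_prompt(3.0, 1.5, "stay on the right")`) requires no retraining, only a different prompt.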

2. The Low-Level Affordance Model: This is a lightweight, specialist model tailored to a specific robot’s physical capabilities. It’s trained efficiently and safely in simulation by observing how the robot performs on various terrains. Its job is to evaluate the feasibility of the 2D paths proposed by the VLM planner. It assigns an ‘affordance score’ to each path, indicating how traversable it is for that particular robot. For example, a wheeled robot’s affordance model would give a low score to a path involving stairs, while a legged robot’s model might give it a high score.
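A toy sketch of what an embodiment-specific scorer might look like, assuming a path can be summarized by the terrain types it crosses. The interface, terrain labels, and score values are invented for illustration; the paper's model is learned in simulation, not a lookup table:

```python
# Minimal sketch of an embodiment-specific affordance scorer (an assumption,
# not the paper's learned model). Each path gets a traversability score
# in [0, 1] for a given robot embodiment.

def affordance_score(path_terrains: list, embodiment: str) -> float:
    """Score a path by the least traversable terrain it crosses."""
    # Hypothetical per-embodiment terrain traversability values.
    traversability = {
        "wheeled": {"flat": 1.0, "ramp": 0.9, "stairs": 0.0},
        "legged":  {"flat": 1.0, "ramp": 0.8, "stairs": 0.9},
    }
    table = traversability[embodiment]
    return min(table.get(t, 0.5) for t in path_terrains)

# A path over stairs: infeasible for a wheeled robot, fine for a legged one.
print(affordance_score(["flat", "stairs"], "wheeled"))  # 0.0
print(affordance_score(["flat", "stairs"], "legged"))   # 0.9
```

Swapping the robot means swapping only this lightweight component; the high-level planner is untouched.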

The key innovation lies in the interface between these two models: the 2D path. This structured yet flexible representation allows the generalist planner to learn from heterogeneous data without being constrained by specific robot actions, while the specialist affordance model can then modulate these plans based on the robot's unique physical limits. The system then executes the candidate path with the highest affordance score.
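The selection step described above can be sketched as a simple propose-score-pick loop. This is a toy illustration under the same assumptions as before (candidate paths and score values are invented), not the paper's implementation:

```python
# Sketch of the hierarchy's selection step: the generalist planner proposes
# candidate paths, the specialist affordance model scores them, and the
# highest-scoring path is chosen for execution.

def select_path(candidate_paths, score_fn):
    """Pick the candidate 2D path with the highest affordance score."""
    scores = [score_fn(p) for p in candidate_paths]
    best = max(range(len(candidate_paths)), key=lambda i: scores[i])
    return candidate_paths[best], scores[best]

# Toy example with precomputed scores for a wheeled robot: the stair route
# is rejected in favor of the ramp route.
paths = ["via_ramp", "via_stairs"]
toy_scores = {"via_ramp": 0.9, "via_stairs": 0.0}
chosen, score = select_path(paths, toy_scores.get)
print(chosen)  # via_ramp
```

For a legged robot, only `score_fn` would change, which is how the same planner serves multiple embodiments.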

Key Advantages and Results

VAMOS has demonstrated significant improvements in robot navigation:

  • Superior Navigation Performance: In extensive real-world experiments across diverse indoor and complex outdoor environments (hallways, atriums, labs, campuses, forests, ramps), VAMOS achieved a 90% average success rate, outperforming state-of-the-art model-based and end-to-end learning methods. It particularly excelled in challenging scenarios with large geometric obstacles requiring long-term planning.
  • Cross-Embodiment Navigation: The hierarchical design allows the same high-level VLM planner to be used across different robot types, such as the legged Boston Dynamics Spot and the wheeled UW Hound robot. By simply swapping the lightweight, embodiment-specific affordance model, VAMOS enables both robots to navigate effectively, taking into account their unique capabilities (e.g., Hound taking ramps, Spot taking stairs).
  • Natural Language Steerability: VAMOS can be easily guided using natural language preferences. Users can append instructions like ‘stay on the right’ or ‘take the stairs’ to the goal command, and the model will adapt its predicted trajectories accordingly.
  • Enhanced Reliability: The affordance model is crucial for robust navigation, significantly improving single-robot reliability by rejecting physically infeasible plans proposed by the VLM, leading to up to 3 times higher success rates in some cases.
  • Benefits of Data Pooling: Training the high-level VLM planner on a diverse mix of datasets from various robots and environments proved more beneficial than training on single, robot-specific datasets, leading to better overall performance.

This research marks a significant step towards creating general-purpose robot navigation agents that can reason both geometrically and semantically about how to act in the world, making them more adaptable and reliable for open-world deployment. For more details, you can read the full research paper here.

Karthik Mehta
Karthik Mehta is a data journalist known for his data-rich, insightful coverage of AI news and developments. Armed with a degree in Data Science from IIT Bombay and years of newsroom experience, Karthik merges storytelling with metrics to surface deeper narratives in AI-related events. His writing cuts through hype, revealing the real-world impact of Generative AI on industries, policy, and society. You can reach him at: [email protected]
