Zero-Shot Navigation: How ZeST Helps Robots Map Unknown Environments

TLDR: ZeST is a novel robotics navigation system that uses Large Language Models (LLMs) to predict terrain traversability in real-time for unknown environments, eliminating the need for dangerous data collection. It segments images, queries an LLM for traversability scores, models uncertainty using a Normal-Inverse-Gamma distribution, and uses risk-aware path planning (RRT*) and control (MPPI) to guide robots safely. Experiments show ZeST outperforms other state-of-the-art methods in both indoor and outdoor settings, demonstrating a robust and safer approach to autonomous navigation.

Robotics and autonomous navigation face a persistent challenge: a robot must accurately assess the terrain in front of it before driving over it. Traditionally, teaching a robot which terrain is traversable has meant driving it through potentially hazardous environments, risking equipment damage and compromising safety. The process is also labor-intensive, often requiring extensive manual labeling and expert annotation, which makes it costly and time-consuming.

A new approach called ZeST (Zero-Shot Traversability) changes this paradigm. ZeST leverages the visual reasoning capabilities of multimodal Large Language Models (LLMs) to create real-time traversability maps without exposing robots to danger. The method is zero-shot, meaning the robot can navigate unknown environments without prior training data for that specific terrain. Because no hazardous data collection or manual labeling is needed, it also speeds up the development of navigation systems and offers a cost-effective, scalable alternative.

How ZeST Works

ZeST operates as a modular navigation system designed for unstructured and unknown environments. Its core objective is to allow autonomous robots to navigate safely and efficiently without needing prior knowledge or extensive data collection. Here’s a breakdown of its key components:

Mask Generation: Before querying an LLM, ZeST pre-processes input images. It uses off-the-shelf models like Segment Anything Model (SAM) or Simple Linear Iterative Clustering (SLIC) to automatically segment images into distinct regions based on visual similarity. These regions are then assigned unique identifiers, creating a numbered version of the image for the LLM.
Querying the Large Language Model: ZeST then queries a multimodal LLM (like GPT-4o) to predict traversability for each segmented region. The prompts provided to the LLM include contextual information about the robot’s characteristics (e.g., size, mobility) and examples of terrain types with their corresponding traversability values. The LLM processes this input and outputs a list of traversability values for each region.
Learning a Traversability Distribution: Recognizing that LLM predictions can vary, ZeST models traversability as a latent probabilistic distribution rather than a single value. It uses a Normal-Inverse-Gamma (NIG) distribution to capture both aleatoric uncertainty (inherent measurement noise) and epistemic uncertainty (uncertainty due to limited data), providing a more robust representation of terrain.
Risk Assessment: To ensure safe navigation, ZeST quantifies risk using the Conditional Value at Risk (CVaR) metric. CVaR is the expected value of traversability given that it falls in the worst-case tail of its distribution, that is, below a chosen quantile. It therefore summarizes how bad the terrain in a given area could plausibly be, not just how good it is on average, and this risk information feeds directly into navigation decisions.
Traversability-based Path Planning: ZeST employs a sampling-based RRT* (Rapidly-exploring Random Tree Star) algorithm for path planning. Unlike traditional methods that only check for collisions, ZeST’s RRT* incorporates the CVaR cost and epistemic uncertainty into its cost function. This guides the planner to favor routes that are not only short but also safer and easier for the robot to navigate, especially in areas with high uncertainty.
Traversability-based Model Predictive Controller: For real-time control, ZeST uses a Model Predictive Path Integral (MPPI) controller. This controller samples random actions and minimizes a cost function that balances accurate path tracking with maximizing traversability (safety). Importantly, it includes a speed-conditioned epistemic uncertainty cost, prompting the robot to slow down in unknown areas to gather more information and reduce uncertainty.
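The uncertainty model, risk metric, and planner cost described above can be illustrated with a small sketch. It is not the ZeST implementation: it assumes the standard conjugate Normal-Inverse-Gamma update for a single map cell, approximates the cell's predictive distribution by a moment-matched Gaussian when computing CVaR, and uses hypothetical names (NIG, lower_tail_cvar, edge_cost), priors, and weights chosen only for illustration.

```python
from dataclasses import dataclass
from statistics import NormalDist


@dataclass
class NIG:
    """Normal-Inverse-Gamma belief over one cell's traversability (assumed priors)."""
    mu: float = 0.5     # prior mean traversability
    lam: float = 1.0    # pseudo-count backing the mean
    alpha: float = 2.0  # shape; must stay > 1 for finite variances
    beta: float = 0.1   # scale

    def update(self, x: float) -> "NIG":
        """Standard conjugate update with one LLM traversability score x."""
        lam_n = self.lam + 1.0
        return NIG(
            mu=(self.lam * self.mu + x) / lam_n,
            lam=lam_n,
            alpha=self.alpha + 0.5,
            beta=self.beta + self.lam * (x - self.mu) ** 2 / (2.0 * lam_n),
        )

    @property
    def aleatoric(self) -> float:
        """Expected observation noise, E[sigma^2]."""
        return self.beta / (self.alpha - 1.0)

    @property
    def epistemic(self) -> float:
        """Variance of the mean, Var[mu]; shrinks as observations accumulate."""
        return self.beta / (self.lam * (self.alpha - 1.0))


def lower_tail_cvar(cell: NIG, level: float = 0.1) -> float:
    """Mean of the worst `level` fraction of outcomes under a Gaussian approximation."""
    std = (cell.aleatoric + cell.epistemic) ** 0.5
    z = NormalDist().inv_cdf(level)
    return cell.mu - std * NormalDist().pdf(z) / level


def edge_cost(length: float, cells: list, w_risk: float = 1.0, w_unc: float = 1.0) -> float:
    """Hypothetical planner edge cost: distance plus risk and uncertainty penalties."""
    penalty = sum(w_risk * (1.0 - lower_tail_cvar(c)) + w_unc * c.epistemic for c in cells)
    return length + penalty


# Example: one cell repeatedly scored as easy terrain, another as hard terrain.
grass, mud = NIG(), NIG()
for _ in range(5):
    grass, mud = grass.update(0.8), mud.update(0.2)
print(edge_cost(1.0, [grass]), edge_cost(1.0, [mud]))
```

Under these assumptions, repeated observations shrink the epistemic term, and an edge through the low-scoring cell costs more than an equally long edge through the high-scoring one, which is the behavior the planner relies on to prefer safer routes.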

Real-World Performance

ZeST was implemented on a TerraSentia robot, equipped with a LiDAR and a Jetson AGX for onboard computation, along with a GSM router for online GPT-4o API calls. The system was rigorously tested in both controlled indoor and unstructured outdoor environments, and its performance was compared against state-of-the-art methods like NoMaD and CoNVOI.

The results were compelling: ZeST achieved a 100% success rate in indoor cluttered environments (10 out of 10 runs) and outdoor forest-like environments (5 out of 5 runs). In contrast, NoMaD and CoNVOI struggled, demonstrating that ZeST’s zero-shot approach and robust uncertainty modeling provide superior generalization capabilities in novel settings.

Querying a large LLM introduces latency (typically 1-2.5 seconds, with occasional spikes up to 5 seconds). ZeST addresses this by generating a 10-meter OctoMap and slowing the robot down in unknown areas. This gives the robot time to update its map and learn the true traversability distribution of each location, enhancing safety. For mask generation, ZeST opts for SLIC over SAM because it is significantly faster (0.1 seconds vs. 1 second per image) while yielding similar LLM responses.
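The slow-down behavior can be sketched as a one-dimensional, MPPI-style speed update. This is a simplified illustration rather than the controller from the paper: the cost terms, weights, temperature, and the function names speed_cost and mppi_speed are all assumptions, and a real MPPI controller samples full control sequences and rolls them through a dynamics model instead of sampling a single speed.

```python
import math
import random


def speed_cost(v: float, v_ref: float, epistemic: float,
               w_track: float = 1.0, w_unc: float = 20.0) -> float:
    """Tracking error plus a speed-conditioned epistemic penalty (assumed form)."""
    return w_track * (v - v_ref) ** 2 + w_unc * v * epistemic


def mppi_speed(v_ref: float, epistemic: float, n_samples: int = 256,
               sigma: float = 0.3, temperature: float = 0.05,
               v_max: float = 1.0, seed: int = 0) -> float:
    """Sample candidate speeds and return their cost-weighted (softmin) average."""
    rng = random.Random(seed)
    # Perturb the reference speed with Gaussian noise and clip to the valid range.
    samples = [min(v_max, max(0.0, rng.gauss(v_ref, sigma))) for _ in range(n_samples)]
    costs = [speed_cost(v, v_ref, epistemic) for v in samples]
    c_min = min(costs)  # subtract the minimum for numerical stability
    weights = [math.exp(-(c - c_min) / temperature) for c in costs]
    return sum(w * v for w, v in zip(weights, samples)) / sum(weights)


# Well-mapped cell (no epistemic uncertainty) versus a poorly observed one.
print(mppi_speed(v_ref=1.0, epistemic=0.0))
print(mppi_speed(v_ref=1.0, epistemic=0.05))
```

Because the uncertainty penalty is multiplied by speed, low-cost samples shift toward slower speeds wherever the map is poorly observed. That buys time for additional LLM responses to arrive and refine the map before the robot commits to the terrain.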

Conclusion

ZeST represents a significant step forward in autonomous navigation. By integrating multimodal LLMs with probabilistic mapping, it enables robots to create global traversability maps in a zero-shot manner, eliminating the need for dangerous physical interaction during training. The system’s ability to quantify and manage uncertainty, combined with its efficient path planning and control, results in safer and more efficient navigation. This research paves the way for more robust and autonomous robotic systems capable of understanding complex environments. You can read the full research paper here.
