TLDR: Researchers from UCLA and the University of Washington have developed URBAN-SIM, a high-performance, scalable urban simulation platform designed to accelerate the development of autonomous micromobility. The platform addresses the limitations of existing simulation tools by offering rich, diverse urban scenes and high-efficiency training for AI agents, with the aim of improving the safety and efficiency of micromobility devices in complex city environments.
Micromobility solutions, such as delivery robots, mobility scooters, and electric wheelchairs, are rapidly gaining traction as flexible and eco-friendly alternatives for short-distance urban travel. Despite their growing popularity, most micromobility devices still rely heavily on human control, which limits operational efficiency and raises significant safety concerns, particularly in crowded and dynamic urban settings filled with pedestrians and cyclists. Traditional transportation methods often fall short in addressing “last-mile connectivity,” a gap that micromobility is perfectly positioned to fill. However, achieving true autonomy in this sector has been challenging, as current AI solutions typically focus on narrow tasks like obstacle avoidance, failing to account for the multifaceted complexities of real urban environments, including uneven terrain, stairs, and dense crowds.
Addressing these critical limitations, researchers from the University of California, Los Angeles (UCLA), and the University of Washington have introduced URBAN-SIM. This scalable, high-fidelity urban simulation platform is specifically engineered for autonomous micromobility. URBAN-SIM aims to overcome the shortcomings of existing robot learning and simulation platforms, which are often tailored for indoor environments or vehicle-centric road networks, lacking the contextual richness and complexity of urban sidewalks, plazas, and alleys.
URBAN-SIM is a high-performance robot learning platform built on NVIDIA's Omniverse and PhysX 5, providing realistic scene rendering and physics simulation. It can automatically construct an effectively unlimited number of diverse, realistic, interactive urban scenes for large-scale robot learning. A key feature is its throughput: up to 2,600 frames per second (FPS) on a single NVIDIA L40S GPU, which enables high-efficiency reinforcement learning (RL) training.
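Throughput at this scale matters because on-policy RL consumes millions of simulation steps, and batching many environment instances on a single GPU is the usual way to reach thousands of steps per second. The following is a minimal, hypothetical sketch of that batched-stepping pattern in PyTorch; the `BatchedUrbanEnv` class, its toy dynamics, and the batch size are illustrative assumptions, not URBAN-SIM's actual API.

```python
# Hypothetical sketch of batched GPU environment stepping for RL throughput.
# The environment, its dimensions, and its dynamics are placeholders for illustration.
import time
import torch


class BatchedUrbanEnv:
    """Toy stand-in for a GPU-parallel simulator: all envs advance in one tensor op."""

    def __init__(self, num_envs: int, obs_dim: int = 64, act_dim: int = 12, device: str = "cuda"):
        self.device = torch.device(device if torch.cuda.is_available() else "cpu")
        self.num_envs, self.obs_dim, self.act_dim = num_envs, obs_dim, act_dim
        self.state = torch.zeros(num_envs, obs_dim, device=self.device)

    def reset(self) -> torch.Tensor:
        self.state.normal_()  # random initial observations for every environment
        return self.state

    def step(self, actions: torch.Tensor):
        # One batched tensor update steps every environment simultaneously.
        self.state = 0.99 * self.state + 0.01 * torch.randn_like(self.state)
        reward = -actions.pow(2).mean(dim=-1)  # placeholder reward
        done = torch.zeros(self.num_envs, dtype=torch.bool, device=self.device)
        return self.state, reward, done


def measure_fps(num_envs: int = 256, steps: int = 1_000) -> float:
    env = BatchedUrbanEnv(num_envs)
    env.reset()
    start = time.perf_counter()
    for _ in range(steps):
        actions = torch.rand(num_envs, env.act_dim, device=env.device)  # random policy
        env.step(actions)
    if env.device.type == "cuda":
        torch.cuda.synchronize()  # ensure queued GPU work is finished before timing stops
    # Per-GPU FPS figures typically count every parallel environment step.
    return num_envs * steps / (time.perf_counter() - start)


if __name__ == "__main__":
    print(f"approx. environment steps per second: {measure_fps():,.0f}")
```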
The platform incorporates three modules that improve the diversity, realism, and efficiency of robot learning in simulation: a Hierarchical Urban Generation pipeline, an Interactive Dynamics Generation strategy, and an Asynchronous Scene Sampling scheme. The Hierarchical Urban Generation pipeline constructs unlimited static urban scenes, from street-block layout down to terrain generation. The Interactive Dynamics Generation strategy produces realistic agent-scene and agent-agent interactions on the fly, directly on the GPU. The Asynchronous Scene Sampling scheme enables high-efficiency training across varied scenes with rich contextual information, as sketched below.
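To make the Asynchronous Scene Sampling idea concrete, the sketch below gives each parallel environment its own independently sampled scene and draws a fresh one whenever that environment's episode ends, so a single training batch mixes experience from many different scenes. All class, field, and option names here are hypothetical stand-ins chosen for illustration, not the platform's implementation.

```python
# Hypothetical sketch of asynchronous scene sampling: each parallel environment holds
# its own scene and swaps it independently on episode reset, with no global sync.
import random
from dataclasses import dataclass


@dataclass
class SceneConfig:
    """Illustrative static-scene parameters (block layout, terrain, clutter)."""
    block_layout: str
    terrain: str
    obstacle_density: float


def sample_scene(rng: random.Random) -> SceneConfig:
    # Stand-in for a hierarchical generation pipeline: blocks -> terrain -> clutter.
    return SceneConfig(
        block_layout=rng.choice(["grid", "radial", "irregular"]),
        terrain=rng.choice(["flat", "slope", "stairs", "rough"]),
        obstacle_density=rng.uniform(0.0, 1.0),
    )


class AsyncSceneSampler:
    """Keeps one scene per parallel environment and resamples each independently."""

    def __init__(self, num_envs: int, seed: int = 0):
        self.rngs = [random.Random(seed + i) for i in range(num_envs)]
        self.scenes = [sample_scene(rng) for rng in self.rngs]

    def on_episode_end(self, env_id: int) -> SceneConfig:
        # Only the finished environment gets a new scene; the others keep stepping.
        self.scenes[env_id] = sample_scene(self.rngs[env_id])
        return self.scenes[env_id]


if __name__ == "__main__":
    sampler = AsyncSceneSampler(num_envs=4)
    print("initial scenes:", sampler.scenes)
    print("env 2 resampled:", sampler.on_episode_end(2))
```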
Alongside URBAN-SIM, the researchers propose URBAN-BENCH, a comprehensive suite of essential tasks and benchmarks for evaluating AI agents' progress toward autonomous micromobility. URBAN-BENCH comprises eight tasks built on three core skills: Urban Locomotion, Urban Navigation, and Urban Traverse. Experiments across diverse terrains and urban structures evaluate four robots with heterogeneous embodiments, including wheeled and legged robots, revealing their respective strengths and limitations.
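One way to picture how such a suite could be organized is as a registry that groups tasks under the three skills and evaluates each robot embodiment on every task. The sketch below is purely illustrative: the article does not list the eight task names or how they split across skills, so every identifier here is a placeholder rather than URBAN-BENCH's actual contents.

```python
# Hypothetical benchmark registry: tasks grouped by skill, evaluated per embodiment.
# All task and robot identifiers are placeholders; the real suite has eight tasks
# whose names and per-skill split are not given in this article.
from typing import Callable, Dict, List

Evaluator = Callable[[str, str], float]  # (robot, task) -> success rate in [0, 1]

SKILL_TO_TASKS: Dict[str, List[str]] = {
    "Urban Locomotion": ["locomotion_task_a", "locomotion_task_b"],
    "Urban Navigation": ["navigation_task_a", "navigation_task_b"],
    "Urban Traverse": ["traverse_task_a"],
}

ROBOTS = ["wheeled_robot_1", "wheeled_robot_2", "legged_robot_1", "legged_robot_2"]


def run_benchmark(evaluate: Evaluator) -> Dict[str, Dict[str, float]]:
    """Evaluate every robot on every registered task and return success rates."""
    results: Dict[str, Dict[str, float]] = {}
    for robot in ROBOTS:
        results[robot] = {}
        for skill, tasks in SKILL_TO_TASKS.items():
            for task in tasks:
                results[robot][f"{skill}/{task}"] = evaluate(robot, task)
    return results


if __name__ == "__main__":
    # Dummy evaluator so the sketch runs end to end.
    demo = run_benchmark(lambda robot, task: 0.5)
    print(demo["wheeled_robot_1"])
```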
The scalability of URBAN-SIM has been demonstrated experimentally. As the number of parallel environments increased from 1 to 256, throughput scaled from 100 to 2,620 FPS while GPU memory usage grew only slightly. Increasing the number of training scenes from 1 to 1,024 raised the success rate from 5.1% to 83.2%. This scalability means robots can be trained on an effectively unlimited variety of scenes with any number of GPUs, paving the way for more effective and safer autonomous micromobility in urban spaces.


