TLDR: The research introduces ‘Moving Out,’ a new benchmark for physically-grounded human-AI collaboration in a 2D physics simulation. It features diverse physical objects and collaboration scenarios, and two tasks to evaluate AI’s ability to adapt to varied human behaviors and unseen physical constraints. The paper proposes BASS (Behavior Augmentation, Simulation, and Selection), a novel method that significantly improves AI’s performance in human-AI collaboration by augmenting training data and simulating action outcomes. Experiments and user studies demonstrate BASS’s superior adaptability and understanding of physical interactions compared to existing models.
In the evolving landscape of artificial intelligence, a significant challenge lies in enabling AI systems to effectively collaborate with humans in the physical world. Unlike digital environments, the real world introduces complexities such as continuous movements, varied object properties, and unpredictable human behaviors. To address this, researchers from the University of Virginia have introduced a new benchmark called Moving Out, designed to foster more realistic and physically-grounded human-AI collaboration.
Introducing Moving Out: A New Benchmark for Physical Collaboration
Inspired by the popular video game, Moving Out is a 2D physics simulation environment where two agents must work together to transport various objects to designated goal regions. This environment is unique because it incorporates realistic physical attributes like object shapes (stars, polygons, circles), sizes (small, medium, large), and mass, all of which influence how agents interact with them. Walls introduce friction, and goal regions require precise spatial planning, making collaboration essential.
The benchmark features 12 distinct maps categorized into three collaboration modes:
- Coordination: Maps with narrow passages that force agents to pass items or move aside for each other.
- Awareness: Scenarios where agents must decide when and how to assist their partner, adapting to their behavior.
- Action Consistency: Tasks requiring synchronized and consistent actions, such as rotating large items through tight spaces.
Two Key Challenges for AI Collaboration
The researchers designed two primary tasks to evaluate AI models:
- Adapting to Diverse Human Behaviors: This task assesses an AI’s ability to learn from varied human-human interaction data and adapt to different human partners in a continuous environment. Over 1,000 human-human demonstrations were collected for this purpose.
- Generalizing to Unseen Physical Constraints: This task tests if an AI can understand and adapt to new physical properties of objects (like different masses or shapes) that it hasn’t encountered during training. Expert human demonstrations were used to create a dataset with randomized object attributes.
BASS: A Novel Approach to Enhance Collaboration
To tackle these challenges, the researchers proposed a new method called BASS (Behavior Augmentation, Simulation, and Selection). BASS works in two main ways:
- Collaboration Behavior Augmentation: During training, BASS enhances the diversity of the dataset. It does this by subtly perturbing the partner’s movements and by recombining segments of different human-human trajectories. This exposes the AI to a wider range of valid collaborative scenarios, making it more robust to variations in human behavior.
- Simulation and Action Selection: BASS trains a ‘dynamics model’ that can predict the outcome of an action in the physical environment. At runtime, the AI can generate several potential actions, simulate their future states using this model, and then select the action that leads to the most favorable outcome, such as moving objects closer to the goal. This allows the AI to make informed decisions even without a direct simulator, crucial for real-world applications.
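The two BASS components above can be illustrated with a small Python sketch. This is not the authors' implementation. The function names (perturb_partner, recombine, select_action), the trajectory layout, the Gaussian noise, the fixed cut point, and the toy dynamics function are all assumptions made for illustration. In the paper the dynamics model is learned, and the score used here (distance from the predicted object position to the goal) is only one possible reading of a 'favorable outcome'.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Vec = Tuple[float, float]
Step = Tuple[Vec, Vec]  # (agent position, partner position) at one timestep


def perturb_partner(traj: Sequence[Step], sigma: float, rng: random.Random) -> List[Step]:
    """Behavior augmentation (assumed form): jitter only the partner's positions."""
    return [
        (agent, (px + rng.gauss(0.0, sigma), py + rng.gauss(0.0, sigma)))
        for agent, (px, py) in traj
    ]


def recombine(traj_a: Sequence[Step], traj_b: Sequence[Step], cut: int) -> List[Step]:
    """Behavior augmentation (assumed form): first `cut` steps of A, then the rest of B."""
    return list(traj_a[:cut]) + list(traj_b[cut:])


def select_action(state: Vec, candidates: Sequence[Vec],
                  dynamics: Callable[[Vec, Vec], Vec], goal: Vec) -> Vec:
    """Simulate each candidate with the dynamics model and keep the one whose
    predicted object position is closest to the goal."""
    return min(candidates, key=lambda a: math.dist(dynamics(state, a), goal))


def toy_dynamics(state: Vec, action: Vec) -> Vec:
    """Hypothetical stand-in for the learned dynamics model: object moves by `action`."""
    return (state[0] + action[0], state[1] + action[1])


rng = random.Random(0)
demo_a = [((0.0, 0.0), (1.0, 1.0)), ((0.5, 0.0), (1.5, 1.0)), ((1.0, 0.0), (2.0, 1.0))]
demo_b = [((0.0, 1.0), (3.0, 3.0)), ((0.0, 1.5), (3.0, 2.5)), ((0.0, 2.0), (3.0, 2.0))]

# Two augmented trajectories derived from the recorded demonstrations.
augmented = [perturb_partner(demo_a, sigma=0.05, rng=rng), recombine(demo_a, demo_b, cut=1)]

# Runtime selection among three candidate actions for an object at the origin.
best = select_action(state=(0.0, 0.0), candidates=[(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0)],
                     dynamics=toy_dynamics, goal=(3.0, 0.0))
print(best)  # (1.0, 0.0): the only candidate that moves the object toward the goal
```

In this sketch the policy that proposes candidate actions is replaced by a hard-coded list. In practice the candidates would be sampled from the trained policy, and the learned dynamics model would take the place of toy_dynamics.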
Promising Results and Human Feedback
Experiments showed that BASS significantly outperformed the baseline models it was compared against, including MLP, GRU, Diffusion Policy, and MAPPO, in both AI-AI and human-AI collaboration settings. BASS achieved higher task completion rates and reduced waiting times, indicating that it adapts better to human partners and handles physical constraints more reliably.
A user study with human participants further validated BASS’s effectiveness. Participants rated BASS as significantly more ‘Helpful’ and as showing a better ‘Understanding of Physics’ than the other models. This suggests that BASS not only scores well on task metrics but also provides a more intuitive and supportive collaboration experience for humans.
While BASS marks a significant step forward, the researchers acknowledge limitations, such as the inference speed of generative models for real-time interaction and the need to cover even more diverse physical interactions in future work. A project page for Moving Out is available, and the full research paper is titled ‘Moving Out: Physically-grounded Human-AI Collaboration.’
Future Directions
The work on Moving Out and BASS opens doors for future research in human-AI collaboration. This includes improving the real-time performance of AI models, leveraging advanced AI capabilities like large language models for reasoning in physical tasks, and exploring even more complex multi-agent and human-robot interaction dynamics.


