TLDR: This research demonstrates the first successful deployment of Vision-Language-Action (VLA) models on a soft continuum robot. While VLA models typically control rigid robots, this study introduces a finetuning pipeline that allows state-of-the-art VLA models like OpenVLA-OFT and π0 to effectively control a soft robot named Embuddy. The findings show that out-of-the-box VLA policies fail on soft robots due to physical differences, but with targeted finetuning, the soft robot achieves performance comparable to rigid counterparts in manipulation and human-interactive tasks. This work highlights the potential for combining VLA intelligence with the inherent safety and flexibility of soft robots for human-centered environments.
Robotic systems are increasingly becoming a part of our daily lives, from manufacturing to assistance in homes. For these robots to truly integrate into human-centered environments, they need to be safe, adaptable, and capable of understanding complex instructions. This is where Vision-Language-Action (VLA) models come into play, offering a way for robots to perceive, understand language, and execute actions through a single intelligent system.
Historically, the deployment of VLA models has been largely confined to rigid, industrial-style robotic arms. While these arms offer predictable control, their inherent rigidity can pose safety concerns and limit their adaptability when interacting closely with humans or in unstructured environments. This research explores a crucial step forward: deploying VLA models on soft continuum manipulators, which are designed with compliant structures that deform upon interaction, offering intrinsic safety and resilience.
The core challenge lies in the significant difference, or "embodiment gap," between rigid and soft robots. Policies trained on rigid arms often fail when applied directly to soft robots due to their non-linear, underactuated dynamics and unique morphology. This paper introduces a structured finetuning and deployment pipeline to bridge this gap, enabling VLA models to effectively control soft robots.
The researchers utilized a custom-designed soft continuum robot named Embuddy for their experiments. Embuddy features three modular sections, each with a revolute joint followed by a tendon-driven soft continuum segment made from 3D-printed Thermoplastic Polyurethane (TPU). Its underactuated sections and lightweight design (totaling 5kg) contribute to inherently safe interactions, as the arm remains deformable to external forces.
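The summary above does not give Embuddy's kinematic model, but tendon-driven continuum segments like these are commonly approximated with the piecewise-constant-curvature model from continuum-robot research. As a purely illustrative sketch (not from the paper), the tip position of one such segment can be computed like this:

```python
import math

def segment_tip(length: float, theta: float, phi: float) -> tuple:
    """Tip position of one constant-curvature continuum segment.

    length: arc length of the segment (m)
    theta:  total bending angle of the segment (rad)
    phi:    direction of the bending plane (rad)

    Assumes the piecewise-constant-curvature approximation; a real
    tendon-driven TPU segment will deviate from this under load.
    """
    if abs(theta) < 1e-9:              # straight segment: limit as theta -> 0
        return (0.0, 0.0, length)
    r = length / theta                 # radius of the bending arc
    x = r * (1.0 - math.cos(theta))    # in-plane displacement
    return (x * math.cos(phi), x * math.sin(phi), r * math.sin(theta))

# Example: a 0.2 m segment bent 90 degrees in the x-z plane
tip = segment_tip(0.2, math.pi / 2, 0.0)
```

Chaining one such function per section (after each revolute joint) would give a simple forward-kinematics model of the full three-section arm.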
Two state-of-the-art VLA models, OpenVLA-OFT and π0, were evaluated. The study involved three representative manipulation tasks: “Put the orange in the plate” (simple pick-and-place), “Put the X in the plate” (pick-and-place with choices, where X could be orange or milk), and “Feed the person with marshmallow” (a close human-interactive task).
Initial experiments confirmed that out-of-the-box VLA policies, without any specific training for soft robots, consistently failed. This was primarily due to the models generating motions suitable for rigid manipulators, which were incompatible with Embuddy’s unique kinematics and maximum bending angle constraints. This highlights the critical need for adaptation when transferring VLA intelligence to new robotic embodiments.
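One concrete symptom of this mismatch is a rigid-arm policy commanding bends beyond what a tendon-driven segment can physically reach. A minimal safeguard, hypothetical rather than taken from the paper, is to clamp each commanded bending angle to its segment's limit before it reaches the tendon controller:

```python
# Hypothetical per-segment bending limits (rad); Embuddy's real
# constraints are not stated in the summary above.
MAX_BEND = [1.4, 1.4, 1.4]

def clamp_action(action: list) -> list:
    """Clamp per-segment bending commands into the reachable range."""
    return [max(-limit, min(limit, a)) for a, limit in zip(action, MAX_BEND)]

safe = clamp_action([2.0, -0.3, 1.6])   # -> [1.4, -0.3, 1.4]
```

Clamping alone only prevents infeasible commands; recovering task success still required the finetuning described next.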
However, through targeted finetuning using a small, custom dataset of soft robot demonstrations, the adapted VLA policies achieved remarkable success. The finetuned OpenVLA-OFT model, for instance, achieved the exact same success rates on Task 1 and Task 2 as its rigid counterpart (a UR5 robot), demonstrating that the finetuning strategy successfully bridged the rigid-to-soft domain gap. While π0 also achieved high success rates after finetuning, OpenVLA-OFT showed superior performance on the compliant platform.
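The actual pipeline finetunes full VLA models on soft-robot demonstrations; as a toy stand-in for that idea, behavior cloning on demonstration pairs can be sketched with a linear policy fit by least squares (all names, shapes, and data here are illustrative, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative demo set: observation features -> demonstrated commands
obs = rng.normal(size=(200, 8))       # stand-in for vision/proprioception features
w_demo = rng.normal(size=(8, 4))
actions = obs @ w_demo                # stand-in for teleoperated soft-robot actions

# Behavior cloning: fit a policy that imitates the demonstrations
w_policy, *_ = np.linalg.lstsq(obs, actions, rcond=None)
mse = float(np.mean((obs @ w_policy - actions) ** 2))
```

The same imitation objective, applied to a small set of real Embuddy demonstrations with the VLA's action head, is what bridges the rigid-to-soft embodiment gap in the study.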
The research also delved into the robustness of these systems. The VLA models proved robust to human presence in the scene, maintaining focus on the workspace. Furthermore, Embuddy demonstrated impressive resilience during human interaction; when manually pushed away during an inference task, the soft robot was able to recover its original pose, continue its trajectory, and successfully complete the task without performance degradation.
This work marks a significant milestone, presenting the first systematic deployment of VLA models on a soft continuum robot. It unequivocally demonstrates that by addressing the embodiment mismatch through targeted finetuning, the advanced reasoning capabilities of VLA models can be effectively combined with the intrinsic safety and flexibility of soft robotics. This opens up a promising avenue for developing safe, adaptable, and intelligent embodied AI agents that can operate seamlessly in human-shared environments.


