TLDR: The paper introduces Grasp-HGN, a novel approach that improves robotic prosthetic hand control by enabling prostheses to grasp previously unseen objects. It defines “semantic projection” to describe this generalization capability and proposes Grasp-LLaVA, a vision-language model that applies human-like reasoning to grasp estimation, achieving 50.2% accuracy on unseen objects. To overcome latency issues, Grasp-HGN employs a hybrid edge-cloud infrastructure that combines a fast edge model with an accurate cloud model and dynamically switches between them based on confidence. This system significantly boosts accuracy and speed, and a new “User Upsetness Index” shows improved user experience in real-world scenarios.
For individuals with transradial amputations, robotic prosthetic hands hold immense promise for regaining the ability to perform daily activities. However, a significant challenge remains: current grasp models struggle to adapt to the vast variety of objects encountered in the real world, especially those not included in their training datasets. This limitation severely impacts users’ independence and quality of life.
A recent research paper, Grasp-HGN: Grasping the Unexpected, addresses this critical issue by introducing innovative solutions to enhance the robustness and generalizability of prosthetic hand control.
Understanding the Challenge: Semantic Projection
The researchers define a crucial concept called ‘semantic projection.’ This refers to a model’s ability to generalize to entirely new, unseen object types. They found that conventional models, despite achieving high accuracy (around 80%) on familiar objects during training, perform poorly—dropping to as low as 15% accuracy—when faced with objects they haven’t encountered before. This highlights a fundamental gap in how these models understand and apply grasping logic beyond their predefined datasets.
Introducing Grasp-LLaVA: Human-like Reasoning for Grasping
To overcome this limitation, the paper proposes Grasp-LLaVA, a Grasp Vision Language Model. Inspired by how humans reason, Grasp-LLaVA infers the most suitable grasp type based on an object’s physical characteristics, such as its shape and size. By leveraging a Vision Language Model (VLM) and incorporating text-based reasoning during its training, Grasp-LLaVA significantly improves accuracy on unseen object types, achieving an impressive 50.2% accuracy compared to 36.7% for state-of-the-art grasp estimation models. This marks a substantial step towards real-world applicability.
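Conceptually, Grasp-LLaVA maps an image plus a reasoning-style text prompt onto a discrete grasp type. The snippet below is only a minimal sketch of that interface, assuming a hypothetical VLM wrapper with a `generate` method and an illustrative grasp taxonomy; the paper's actual architecture, prompts, and grasp categories differ in detail.

```python
# Illustrative sketch of a VLM-based grasp estimator.
# The grasp taxonomy and the `vlm.generate` call are assumptions, not the paper's API.

GRASP_TYPES = ["power", "precision", "lateral", "tripod"]  # example taxonomy only

PROMPT = (
    "Describe the object's shape and size, then choose the most suitable "
    f"grasp type from: {', '.join(GRASP_TYPES)}."
)

def estimate_grasp(vlm, image):
    """Query a vision-language model with a reasoning-style prompt and
    map its free-text answer back to a discrete grasp label."""
    answer = vlm.generate(image=image, prompt=PROMPT)  # hypothetical VLM call
    for grasp in GRASP_TYPES:
        if grasp in answer.lower():
            return grasp
    return None  # no known grasp type named in the answer
```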
Bridging the Performance-Latency Gap with Hybrid Grasp Network (HGN)
While Grasp-LLaVA offers superior accuracy, its large size and computational demands pose a challenge for deployment on compact, power-limited edge devices typically found in prosthetics. To address this ‘performance-latency gap,’ the researchers introduce the Hybrid Grasp Network (HGN).
HGN is an intelligent edge-cloud deployment infrastructure. It combines a fast, specialized model running on the edge device (like a small computer within the prosthetic hand) with the highly accurate Grasp-LLaVA deployed in the cloud. An HGN controller dynamically decides whether to use the quick edge model’s prediction or offload the task to the more powerful cloud model as a fail-safe, based on the edge model’s confidence in its prediction. This dynamic switching mechanism is enhanced by ‘confidence calibration,’ ensuring the edge model’s confidence scores are reliable.
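As a rough illustration of this controller logic, the sketch below checks the edge model's calibrated confidence against a threshold and offloads to the cloud model otherwise. The `edge_model` and `cloud_model` callables, the threshold value, and the use of temperature scaling for calibration are assumptions made for illustration, not the paper's exact implementation.

```python
import numpy as np

CONFIDENCE_THRESHOLD = 0.8  # hypothetical cutoff; tuned per deployment in practice

def calibrate(logits, temperature=1.5):
    """Temperature scaling, a common confidence-calibration technique.
    The paper's exact calibration method may differ."""
    scaled = logits / temperature
    exp = np.exp(scaled - scaled.max())
    return exp / exp.sum()

def hgn_predict(image, edge_model, cloud_model):
    """Hybrid Grasp Network controller sketch: trust the fast on-device model
    when its calibrated confidence is high, otherwise fall back to the cloud."""
    logits = edge_model(image)              # fast, on-device inference
    probs = calibrate(logits)
    grasp, confidence = int(probs.argmax()), float(probs.max())
    if confidence >= CONFIDENCE_THRESHOLD:
        return grasp                        # low-latency edge decision
    return cloud_model(image)               # fail-safe: accurate but slower Grasp-LLaVA
```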
Real-World Performance and User Experience
The results demonstrate HGN’s effectiveness. With confidence calibration, HGN improves semantic projection accuracy by 5.6 percentage points (to 42.3%) while running 3.5 times faster on unseen object types. In a real-world scenario mixing seen and unseen objects, HGN reaches an average accuracy of 86% (a 12.2-point gain over an edge-only model) and is 2.2 times faster than Grasp-LLaVA alone.
To evaluate the system from a user’s perspective, the researchers introduced the ‘User Upsetness Index’ (UUI). This metric quantifies user dissatisfaction by penalizing incorrect or delayed grasp decisions. HGN, particularly when calibrated, significantly reduces the UUI, indicating a more satisfactory and reliable user experience. For instance, with an optimal configuration, HGN (DC) achieves an overall accuracy of 86%, an average latency of 117.8 milliseconds, and a UUI of 1.12, showcasing a balanced improvement across accuracy, speed, and user satisfaction.
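The paper's exact formula for the UUI is not reproduced here, but the general idea of penalizing wrong and slow grasp decisions can be sketched as a simple per-attempt score; the weights and latency budget below are purely illustrative assumptions.

```python
def user_upsetness(correct, latency_ms, latency_budget_ms=200.0,
                   wrong_penalty=1.0, delay_weight=0.5):
    """Illustrative per-grasp upsetness score: penalize incorrect predictions
    and any latency beyond a tolerable budget. All weights here are
    assumptions, not the paper's definition."""
    error_term = 0.0 if correct else wrong_penalty
    delay_term = delay_weight * max(0.0, latency_ms - latency_budget_ms) / latency_budget_ms
    return error_term + delay_term

# A session-level index could then average the score over all grasp attempts:
# uui = sum(user_upsetness(c, t) for c, t in attempts) / len(attempts)
```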
This research lays a strong foundation for developing prosthetic hands that can truly ‘grasp the unexpected,’ bringing us closer to highly functional and adaptable robotic prosthetics for daily living.


