TLDR: A new research paper demonstrates a multi-class human/object detection system for robot manipulators that relies only on proprioceptive sensing. Tested in real time on a Franka Emika Panda robot, the system distinguishes between human, PVC, and aluminum contacts with 91.11% accuracy. The approach trains LSTM, GRU, and Transformer models on sliding-window segments of joint data and is aimed at safer, more capable robot interaction in collaborative environments.
Where humans and robots work closely together, safety and effective interaction are paramount. This is especially true in physical human-robot collaboration (pHRC), where direct contact can occur. A new research paper, ‘Multi-Class Human/Object Detection on Robot Manipulators using Proprioceptive Sensing’, by Justin Hehli, Marco Heiniger, Maryam Rezayati, and Hans Wernher van de Venn, addresses a critical aspect of this challenge: enabling robots to accurately identify what they are touching.
Traditionally, robots have used binary classifiers to distinguish between ‘soft’ and ‘hard’ objects, often assuming soft contacts are with humans. While useful, this approach has limitations. For instance, some parts of the human body can be quite rigid, leading to misclassification. Conversely, soft non-human objects like rubber might be mistaken for a human, preventing the robot from performing its intended task. Furthermore, a single ‘hard’ category limits the robot’s ability to differentiate between various non-human objects, which could be crucial for more sophisticated actions.
Advancing Beyond Binary Classification
This study goes beyond binary classification by developing and evaluating three-class human/object detection models. These models are designed to distinguish between humans, soft objects (specifically PVC), and hard objects (specifically aluminum). The finer classification lets the robot choose a response suited to what it has touched in a shared workspace. The research uses only proprioceptive sensor data, that is, information from the robot’s own joints such as torque, position error, and velocity error, as input, mimicking a form of tactile perception.
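To make the input format concrete, here is a minimal sketch of how such joint signals could be stacked into one feature matrix. The function name, the argument names, and the sign convention (desired minus measured) are assumptions for illustration, not details taken from the paper; the only fixed fact is that the Panda arm has seven joints.

```python
import numpy as np

N_JOINTS = 7  # the Franka Emika Panda arm has seven joints


def build_features(torque, q_desired, q_measured, dq_desired, dq_measured):
    """Stack per-joint signals into one (T, 3 * N_JOINTS) feature matrix.

    Every argument is an array of shape (T, N_JOINTS), one row per sample.
    The error convention (desired minus measured) is an assumption.
    """
    position_error = q_desired - q_measured
    velocity_error = dq_desired - dq_measured
    return np.concatenate([torque, position_error, velocity_error], axis=1)


# Example with random stand-in data for 5 time steps.
rng = np.random.default_rng(0)
signals = [rng.normal(size=(5, N_JOINTS)) for _ in range(5)]
features = build_features(*signals)
print(features.shape)  # (5, 21)
```

Each row of the result is one time step, which is the shape sequence models such as LSTMs, GRUs, and Transformers typically consume.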
How the Models Were Developed
The researchers collected a comprehensive dataset using a Franka Emika Panda robot manipulator. The robot interacted with a dummy arm (representing a human), a rigid PVC tube, and aluminum profiles. Data was recorded at a high sampling rate of 200 Hz, capturing the robot’s internal sensor readings during contact events. To enhance the dataset’s variability, different object locations and contact points on the robot were used.
Three types of machine learning models were trained and evaluated: Long Short-Term Memory (LSTM), Gated Recurrent Unit (GRU), and Transformer models. These are all well-suited for analyzing time-series data, which is what the robot’s sensor readings represent.
Key Preprocessing and Performance Insights
A crucial aspect of the study involved exploring different data preprocessing strategies. The team found that a ‘sliding window’ approach, where data is segmented into multiple, overlapping windows around contact events, significantly outperformed a ‘fixed window’ approach. Because each contact event yields several windows instead of one, this method provides more training data and better captures patterns over time, leading to higher accuracy. Additionally, the study investigated ‘majority voting’ techniques, where multiple individual predictions are aggregated to improve stability. Hard majority voting, where each prediction contributes one vote, generally yielded slightly better results than soft voting, where the predicted class probabilities are combined instead (typically by averaging).
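The two ideas above can be sketched in a few lines of NumPy. This is an illustrative sketch, not the authors’ implementation: the window length, stride, number of windows, and the 21-channel layout are all assumed values, and soft voting is shown as a plain average of class probabilities.

```python
import numpy as np


def sliding_windows(signal, contact_idx, window_len, stride, n_windows):
    """Cut up to n_windows overlapping windows starting at a contact event.

    signal has shape (T, channels). Windows that would run past either
    end of the recording are skipped.
    """
    starts = [contact_idx + k * stride for k in range(n_windows)]
    starts = [s for s in starts if s >= 0 and s + window_len <= len(signal)]
    if not starts:
        return np.empty((0, window_len, signal.shape[1]))
    return np.stack([signal[s:s + window_len] for s in starts])


def hard_vote(probs):
    """Each window casts one vote for its most likely class.

    Ties go to the lowest class index, a side effect of np.argmax.
    """
    votes = np.argmax(probs, axis=1)
    counts = np.bincount(votes, minlength=probs.shape[1])
    return int(np.argmax(counts))


def soft_vote(probs):
    """Average the class probabilities over windows, then pick the best."""
    return int(np.argmax(probs.mean(axis=0)))


# Two seconds of stand-in data at 200 Hz with 21 assumed channels.
recording = np.zeros((400, 21))
windows = sliding_windows(recording, contact_idx=100, window_len=40,
                          stride=10, n_windows=5)
print(windows.shape)  # (5, 40, 21)

# Hypothetical per-window class probabilities for three windows.
probs = np.array([[0.9, 0.1, 0.0],
                  [0.4, 0.6, 0.0],
                  [0.4, 0.6, 0.0]])
print(hard_vote(probs), soft_vote(probs))  # 1 0
```

The last two lines show how the schemes can disagree: two of three windows favor class 1, so hard voting picks it, while one very confident window pulls the averaged probabilities toward class 0.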
Real-World Application and Results
The best-performing model, a Transformer, was tested in real time on the Franka Emika Panda robot. It achieved an overall accuracy of 91.11%, closely matching the offline validation results. This suggests that such multi-class detection models can be deployed in real-world robotic environments.
Broken down by class, the model achieved perfect recall (100.00%) for the human class, meaning it correctly identified every instance of human contact, which matters most for safety. For PVC, precision was high (96.00%) but recall was lower (80.00%), so one in five PVC contacts was assigned to another class. The aluminum class had the lowest precision (84.85%) but high recall (93.33%). Because no human contact was misclassified, every incorrect aluminum prediction must have been a PVC contact, which points to PVC being mistaken for aluminum as the main target for future improvement.
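For readers who want to see how these figures relate, the sketch below computes precision, recall, and accuracy from a confusion matrix. The matrix itself is hypothetical: it assumes 30 test contacts per class, which this summary does not state, and was chosen because it reproduces the reported percentages. The paper’s actual counts may differ.

```python
import numpy as np

CLASSES = ["human", "PVC", "aluminum"]

# Hypothetical confusion matrix (rows: true class, columns: predicted).
# Assumes 30 contacts per class; NOT taken from the paper.
CONFUSION = np.array([
    [30,  0,  0],   # true human
    [ 1, 24,  5],   # true PVC
    [ 1,  1, 28],   # true aluminum
])


def per_class_metrics(cm):
    """Return (precision, recall, accuracy) for a square confusion matrix."""
    true_positives = np.diag(cm).astype(float)
    precision = true_positives / cm.sum(axis=0)  # column sums: predicted
    recall = true_positives / cm.sum(axis=1)     # row sums: actual
    accuracy = true_positives.sum() / cm.sum()
    return precision, recall, accuracy


precision, recall, accuracy = per_class_metrics(CONFUSION)
for name, p, r in zip(CLASSES, precision, recall):
    print(f"{name}: precision {p:.2%}, recall {r:.2%}")
print(f"accuracy {accuracy:.2%}")
```

Precision divides by column sums (everything predicted as a class) and recall by row sums (everything that truly belongs to it), which is why a class can score high on one and lower on the other.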
Future Directions
The researchers acknowledge that while this work serves as a strong proof-of-concept, future efforts should focus on improving the model’s robustness and generalization capabilities. This could involve collecting larger and more varied datasets, exploring techniques like ‘sim-to-real’ domain adaptation (training in simulation and fine-tuning on real data), and potentially expanding the number and types of classes the robot can distinguish. The ultimate goal is to enable robots to accurately analyze arbitrary motions and contact locations, further enhancing safety and functionality in human-robot collaboration.


