TLDR: Researchers developed an automatic sign language recognition system using a hybrid CNN-LSTM deep learning architecture and Mediapipe for gesture keypoint extraction. The Python-based system provides real-time translation with 92% average accuracy, showing strong performance for distinct gestures but facing challenges with visually similar ones. It holds significant potential for applications in healthcare, education, and public services to enhance accessibility for deaf communities.
Sign languages are fundamental for communication within deaf communities worldwide, yet they often face marginalization, which can severely limit access to crucial services like healthcare and education. Addressing this challenge, a new study introduces an innovative automatic sign language recognition system designed to bridge these communication gaps.
The research, led by Takouchouang Fraisse Sacré from the Department of Computer Science at the National University of Vietnam, proposes a system built upon a hybrid architecture combining Convolutional Neural Networks (CNN) and Long Short-Term Memory (LSTM) networks. A key component of this system is the use of Mediapipe, a Google library, for the precise extraction of keypoints from gestures.
The primary goal of this project is to develop a system capable of automatically translating sign gestures into text or voice in real time, leveraging recent advancements in computer vision and deep learning.
How the System Works
At its core, the system utilizes Mediapipe to capture detailed information about a person’s movements. For every frame captured by the camera, Mediapipe identifies:
- 21 keypoints for each hand, mapping articulations and fingertips.
- 468 keypoints for the face, defining contours and expressions.
- 33 keypoints for the body, indicating the position of major joints.
These spatial coordinates (x, y, z) provide a rich representation of gestures, encompassing not only hand configurations but also crucial facial expressions and body movements integral to sign languages.
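To make this step concrete, the sketch below shows one way to pull those keypoints from a webcam frame using MediaPipe's Holistic solution and flatten them into a single feature vector. It is a minimal illustration, not the authors' code; the helper name `extract_keypoints` is an assumption.

```python
# Minimal sketch: extract hand, face, and pose keypoints from one camera frame.
import cv2
import numpy as np
import mediapipe as mp

mp_holistic = mp.solutions.holistic

def extract_keypoints(results):
    """Flatten all detected landmarks into one vector; use zeros when a part is not detected."""
    def flatten(landmarks, count):
        if landmarks is None:
            return np.zeros(count * 3)
        return np.array([[lm.x, lm.y, lm.z] for lm in landmarks.landmark]).flatten()

    return np.concatenate([
        flatten(results.pose_landmarks, 33),        # 33 body keypoints
        flatten(results.face_landmarks, 468),       # 468 face keypoints
        flatten(results.left_hand_landmarks, 21),   # 21 keypoints per hand
        flatten(results.right_hand_landmarks, 21),
    ])  # (33 + 468 + 21 + 21) * 3 = 1629 values per frame

cap = cv2.VideoCapture(0)
with mp_holistic.Holistic(min_detection_confidence=0.5,
                          min_tracking_confidence=0.5) as holistic:
    ok, frame = cap.read()
    if ok:
        results = holistic.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        keypoints = extract_keypoints(results)
        print(keypoints.shape)  # -> (1629,)
cap.release()
```

Stacking one such vector per frame yields the gesture sequences that the recognition model consumes.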
The extracted keypoint data then feeds into a hybrid CNN-LSTM model. The CNN component is responsible for extracting spatial features from the captured images, using multiple convolutional layers to identify patterns at different scales. Following this, the LSTM component models the temporal dependencies of gestures over time. LSTMs are particularly effective for sequence recognition, capable of remembering information over long periods and adapting to variations in gesture execution speed. This powerful combination allows the system to analyze both the spatial appearance and the temporal flow of gestures for highly accurate recognition.
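The following is a hedged sketch of what such a hybrid model can look like when built on keypoint sequences in Keras: convolutional layers first extract spatial patterns within each frame window, and LSTM layers then model how those patterns evolve over the gesture. The layer counts, sizes, sequence length, and vocabulary size are illustrative assumptions, not the architecture reported in the paper.

```python
# Hedged sketch of a hybrid CNN-LSTM gesture classifier over keypoint sequences.
import tensorflow as tf
from tensorflow.keras import layers, models

SEQ_LEN = 30         # frames per gesture (assumed)
NUM_FEATURES = 1629  # 3D coordinates of all MediaPipe keypoints per frame
NUM_CLASSES = 10     # size of the gesture vocabulary (assumed)

model = models.Sequential([
    # CNN stage: 1D convolutions extract local spatial patterns from each frame window
    layers.Conv1D(64, kernel_size=3, activation="relu",
                  input_shape=(SEQ_LEN, NUM_FEATURES)),
    layers.Conv1D(128, kernel_size=3, activation="relu"),
    layers.MaxPooling1D(pool_size=2),
    # LSTM stage: models temporal dependencies across the gesture
    layers.LSTM(128, return_sequences=True),
    layers.LSTM(64),
    layers.Dense(64, activation="relu"),
    layers.Dense(NUM_CLASSES, activation="softmax"),
])

model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```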
Development and Performance
The system was developed using Python, with TensorFlow for building and training the deep learning model, Mediapipe for keypoint detection, Streamlit for an interactive, real-time user interface, and OpenCV for image processing and camera access.
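A rough sketch of how these pieces could fit together at inference time is shown below: OpenCV grabs frames, MediaPipe extracts keypoints, the trained model predicts a label over a sliding window of frames, and Streamlit displays the result. File names, the window length, and the gesture list are placeholders, not the project's actual configuration.

```python
# Sketch of a real-time recognition loop: OpenCV + MediaPipe + TensorFlow + Streamlit.
import cv2
import numpy as np
import streamlit as st
import tensorflow as tf
import mediapipe as mp

def flatten(landmarks, count):
    """Flattened (x, y, z) coordinates, or zeros when nothing was detected."""
    if landmarks is None:
        return np.zeros(count * 3)
    return np.array([[lm.x, lm.y, lm.z] for lm in landmarks.landmark]).flatten()

st.title("Sign Language Recognition (demo sketch)")
frame_slot = st.empty()
label_slot = st.empty()

model = tf.keras.models.load_model("model.h5")   # assumed file name
ACTIONS = ["Hello", "Thank you", "Call", "Yes"]  # assumed vocabulary

cap = cv2.VideoCapture(0)
sequence = []
with mp.solutions.holistic.Holistic() as holistic:
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        results = holistic.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        keypoints = np.concatenate([
            flatten(results.pose_landmarks, 33),
            flatten(results.face_landmarks, 468),
            flatten(results.left_hand_landmarks, 21),
            flatten(results.right_hand_landmarks, 21),
        ])
        sequence.append(keypoints)
        sequence = sequence[-30:]  # keep only the most recent 30 frames
        if len(sequence) == 30:
            probs = model.predict(np.expand_dims(sequence, axis=0), verbose=0)[0]
            label_slot.markdown(f"**Prediction:** {ACTIONS[int(np.argmax(probs))]}")
        frame_slot.image(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
cap.release()
```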
During testing, the model achieved an average accuracy of 92%, with a recall of 89% and an F1-score of 90.5%. It performed very well on distinct gestures such as “Hello” and “Thank you,” though the study notes that visually similar gestures, like “Call” and “Yes,” are still sometimes confused. Despite these challenges, the model strikes a good balance between performance and computational efficiency, offering competitive accuracy with lower inference times than some other state-of-the-art approaches.
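For readers less familiar with these metrics: the F1-score is the harmonic mean of precision and recall, so a recall of 0.89 combined with an F1 of 0.905 implies a precision of roughly 0.92. The snippet below simply illustrates how such scores are computed with scikit-learn; the labels are synthetic, not the study's test data.

```python
# Illustration of the reported metrics on synthetic labels (not the study's data).
from sklearn.metrics import accuracy_score, recall_score, f1_score

y_true = ["Hello", "Thank you", "Call", "Yes", "Hello", "Call"]
y_pred = ["Hello", "Thank you", "Yes",  "Yes", "Hello", "Call"]

print("accuracy:", accuracy_score(y_true, y_pred))
print("recall  :", recall_score(y_true, y_pred, average="macro"))
print("f1      :", f1_score(y_true, y_pred, average="macro"))
```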
Future Directions and Applications
While the results are promising, the researchers acknowledge limitations, including difficulties with similar gestures, dependence on lighting conditions, and variations in gesture execution among individuals. Future improvements will focus on expanding data augmentation, exploring more advanced neural network architectures like Transformers, refining data preprocessing techniques, and integrating contextual and linguistic information to recognize complete sentences.
The potential applications of this system are vast and impactful:
- Healthcare: Facilitating communication between deaf patients and medical staff, thereby improving access to care and consultation quality.
- Education: Serving as a pedagogical tool for learning sign languages for both deaf and hearing individuals, and aiding the integration of deaf students in mainstream education.
- Public Services: Enhancing accessibility in public administrations, transport, and other services, contributing to a more inclusive society.
The complete implementation of the system, including preprocessing scripts, training code, and the Streamlit user interface, is openly available on GitHub, allowing for reproduction and further development. You can find more details about this research paper here: Automatic Sign Language Recognition: A Hybrid CNN-LSTM Approach Based on Mediapipe.
This project marks a significant step towards making communication more accessible for deaf communities, with ongoing work aimed at enhancing the model’s robustness and expanding its recognized vocabulary for diverse real-world applications.


