Pioneering AI for Nepali Sign Language Recognition

TLDR: This research introduces the first benchmark dataset for Nepali Sign Language (NSL), comprising 36 gesture classes with 1,500 samples each, collected with plain and random backgrounds. It evaluates deep learning models (MobileNetV2 and ResNet50) using transfer learning and fine-tuning for NSL character recognition. MobileNetV2 achieved a higher classification accuracy of 90.45% compared to ResNet50’s 88.78%, demonstrating its effectiveness in low-resource settings. The study also proposes a real-time video-based recognition system, laying a foundation for assistive technologies for NSL users.

Communication is a fundamental human right, yet for individuals with hearing and speech impairments, especially in regions with under-resourced sign languages, it presents significant challenges. In Nepal, where tens of thousands rely on Nepali Sign Language (NSL) for daily communication, there has been a notable absence of digital linguistic datasets and computational tools to support its recognition and use.

A recent study titled “Nepali Sign Language Characters Recognition: Dataset Development and Deep Learning Approaches” addresses this gap. Authored by Birat Poudel, Satyam Ghimire, Sijan Bhattarai, Saurav Bhandari, and Suramya Sharma Dahal, the research introduces the first benchmark dataset for NSL, paving the way for assistive technologies and further research in this area.

Building the Foundation: The NSL Dataset

The cornerstone of this research is the creation of a custom dataset specifically designed for Nepali Sign Language character recognition. This comprehensive dataset features 36 distinct NSL gesture classes, with each class containing 1,500 samples. To ensure the models could perform well in various real-world scenarios, the images were collected under two different background conditions:

Plain Background: 1,000 images per character against uniform, clean backgrounds for controlled learning.
Random Background: 500 images per character against varied, realistic backgrounds to enhance model robustness.

This dual-background approach resulted in a substantial dataset of 54,000 images, providing rich and diverse training data for deep learning models. The data was preprocessed into TensorFlow’s TFRecord format for optimized performance.
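The dataset totals follow directly from the per-class counts. As a quick check, the arithmetic can be written out in a few lines of Python; the variable names are illustrative and do not come from the paper.

```python
# Per-class counts as reported in the article.
NUM_CLASSES = 36
PLAIN_PER_CLASS = 1_000   # plain-background images per character
RANDOM_PER_CLASS = 500    # random-background images per character

samples_per_class = PLAIN_PER_CLASS + RANDOM_PER_CLASS
total_images = NUM_CLASSES * samples_per_class
plain_images = NUM_CLASSES * PLAIN_PER_CLASS
random_images = NUM_CLASSES * RANDOM_PER_CLASS

print(samples_per_class, total_images, plain_images, random_images)
# prints: 1500 54000 36000 18000
```

Two thirds of the images (36,000) therefore use plain backgrounds and one third (18,000) use random ones.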

Leveraging Deep Learning for Recognition

To evaluate recognition performance on the new dataset, the researchers employed two popular pre-trained Convolutional Neural Network (CNN) architectures: MobileNetV2 and ResNet50. Both models, initially trained on the ImageNet dataset, were adapted to the 36-class NSL classification task through transfer learning followed by fine-tuning.

The training process involved a progressive two-phase strategy:

Phase 1 (Frozen Base Model Training): The core convolutional layers of the pre-trained models were kept frozen, and only the newly added classification layers were trained. This let the new layers learn to map the pre-trained ImageNet features onto the NSL character classes without disturbing those features.
Phase 2 (Partial Fine-Tuning): Selected deeper layers of the base models were unfrozen and trained with a reduced learning rate. This fine-tuning step enabled the models to adapt more precisely to the nuances of NSL gestures while retaining the powerful representations learned from ImageNet.

Both phases used the Adam optimizer and sparse categorical cross-entropy as the loss function, with the learning rates and batch sizes specified in the paper.
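The two-phase strategy can be sketched with the Keras API roughly as follows. This is a minimal illustration, not the authors' code: the function names, the classification head (global average pooling plus one dense layer), the 224x224 input size, the number of unfrozen layers, and the learning rates in the closing comments are all assumptions. Only the choice of MobileNetV2, ImageNet weights, 36 classes, Adam, and sparse categorical cross-entropy comes from the article.

```python
def trainable_flags(num_layers, num_unfrozen):
    """One flag per base layer: True only for the last `num_unfrozen` layers."""
    cutoff = max(num_layers - num_unfrozen, 0)
    return [i >= cutoff for i in range(num_layers)]


def build_model(num_classes=36, input_shape=(224, 224, 3)):
    """Phase 1 setup: frozen ImageNet base plus a new, trainable head."""
    import tensorflow as tf  # imported lazily; requires TensorFlow to be installed

    base = tf.keras.applications.MobileNetV2(
        input_shape=input_shape, include_top=False, weights="imagenet")
    base.trainable = False  # freeze every pre-trained layer

    inputs = tf.keras.Input(shape=input_shape)
    x = tf.keras.applications.mobilenet_v2.preprocess_input(inputs)
    x = base(x, training=False)  # keep batch-norm statistics fixed
    x = tf.keras.layers.GlobalAveragePooling2D()(x)
    outputs = tf.keras.layers.Dense(num_classes, activation="softmax")(x)
    return tf.keras.Model(inputs, outputs), base


def compile_for_phase(model, learning_rate):
    """Adam + sparse categorical cross-entropy, as named in the article."""
    import tensorflow as tf

    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=learning_rate),
        loss=tf.keras.losses.SparseCategoricalCrossentropy(),
        metrics=["accuracy"])


def unfreeze_top_layers(base, num_unfrozen):
    """Phase 2 setup: make only the last `num_unfrozen` base layers trainable."""
    base.trainable = True
    flags = trainable_flags(len(base.layers), num_unfrozen)
    for layer, flag in zip(base.layers, flags):
        layer.trainable = flag


# Hypothetical usage (values are assumptions, not taken from the paper):
#   model, base = build_model()
#   compile_for_phase(model, 1e-3)   # phase 1: train the head only
#   model.fit(train_ds, validation_data=val_ds, epochs=10)
#   unfreeze_top_layers(base, 30)    # phase 2: partial fine-tuning
#   compile_for_phase(model, 1e-5)   # recompile with a reduced learning rate
#   model.fit(train_ds, validation_data=val_ds, epochs=10)
```

Recompiling after changing the trainable flags matters in Keras, because the set of trainable weights is fixed at compile time. Swapping in ResNet50 would mean using tf.keras.applications.ResNet50 together with its own preprocess_input function.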

Key Findings and Performance

MobileNetV2 outperformed ResNet50 in recognizing Nepali Sign Language characters, achieving a classification accuracy of 90.45% against ResNet50’s 88.78%, a margin of 1.67 percentage points. The result is notable because MobileNetV2 is a lightweight architecture with fewer parameters than the deeper ResNet50.

The researchers suggest that MobileNetV2’s efficiency in capturing localized spatial and structural features, crucial for distinguishing hand gestures, made it more effective in this low-resource setting. Its design helps reduce the risk of overfitting on medium-scale datasets like the NSL dataset. In contrast, ResNet50’s deeper architecture, while powerful for highly complex datasets, might have extracted redundant features that didn’t contribute as effectively to classifying the relatively simpler gesture images, potentially leading to reduced generalization.

The system also incorporates a robust real-time recognition pipeline. It takes a continuous video stream of hand gestures, samples frames, preprocesses them, and classifies them. A sliding window with majority voting ensures stable and accurate recognition of gestures, even during transitions.
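The smoothing step can be illustrated with a short sketch of sliding-window majority voting over per-frame predictions. The function name, the window size of 5, and the example labels are hypothetical; the article does not state the window length or how ties are broken. Here the per-frame classifier is assumed to have already produced one label per sampled frame.

```python
from collections import Counter, deque


def smooth_predictions(frame_labels, window_size=5):
    """Replace each per-frame label with the majority label of the recent window."""
    window = deque(maxlen=window_size)  # keeps only the latest `window_size` labels
    smoothed = []
    for label in frame_labels:
        window.append(label)
        # most_common(1) returns the label with the highest count; on a tie,
        # the label seen earliest in the window wins.
        winner, _count = Counter(window).most_common(1)[0]
        smoothed.append(winner)
    return smoothed


# A single misclassified frame ("kha") is voted out by its neighbours.
raw = ["ka", "ka", "kha", "ka", "ka"]
print(smooth_predictions(raw, window_size=3))
# prints: ['ka', 'ka', 'ka', 'ka', 'ka']
```

A larger window suppresses more flicker but delays the moment at which a genuinely new gesture is reported, so the window size trades stability against latency.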

Looking Ahead

This study marks a significant milestone in the field of Nepali Sign Language recognition. By providing the first benchmark dataset and demonstrating the effectiveness of deep learning models, particularly MobileNetV2, it lays a strong foundation for future advancements. The researchers propose several avenues for future work, including expanding the dataset with more gesture classes and samples, exploring more advanced neural network architectures, optimizing the system for mobile and edge devices, and incorporating additional modalities like facial expressions or body posture to enhance recognition accuracy.

This pioneering effort not only contributes valuable resources for NSL but also highlights the potential of transfer learning and fine-tuning to advance research in other under-explored sign languages worldwide.
