1. Introduction
Communication barriers faced by the hearing-impaired community highlight the need for effective sign language recognition systems. While several sign language recognition systems exist globally, Nepali Sign Language (NSL) lacks a comprehensive dataset and established deep learning recognition models. This study aims to bridge that gap by developing a dedicated NSL dataset and training two deep learning models, AlexNet and MobileNetV2, for accurate character recognition.
2. Related Work
Existing research on sign language recognition has explored various techniques, including Hidden Markov Models and Support Vector Machines, with a recent shift towards deep learning architectures, particularly Convolutional Neural Networks (CNNs). Datasets for American Sign Language (ASL) and Indian Sign Language (ISL) have enabled significant advances, yet comparable datasets for Nepali Sign Language remain underdeveloped. Previous studies often face challenges such as gesture variability, lighting conditions, and the need for large, diverse datasets; this work seeks to mitigate these issues for NSL.
3. Methodology
This research involved the creation of NSLNet-18, a new dataset comprising 27,000 images of 18 static NSL characters collected from 10 native signers. Images were resized to 224x224 pixels and subjected to extensive data augmentation, including rotation, flipping, brightness adjustment, and zooming, to increase dataset diversity. Two CNN architectures, AlexNet and MobileNetV2, were then trained on this dataset for 50 epochs using the Adam optimizer.
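To make the training setup concrete, the following is a minimal PyTorch sketch of the preprocessing, augmentation, and training pipeline described above. The augmentation ranges, learning rate, batch size, and dataset paths are illustrative assumptions, since the exact hyperparameters are not stated here.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, models, transforms

# Augmentation mirroring the described transforms (rotation, flipping,
# brightness adjustment, zoom); the ranges are assumed, not the authors'.
train_tf = transforms.Compose([
    transforms.RandomRotation(15),
    transforms.RandomHorizontalFlip(),
    transforms.ColorJitter(brightness=0.2),
    transforms.RandomResizedCrop(224, scale=(0.8, 1.0)),  # zoom to 224x224
    transforms.ToTensor(),
])

# Hypothetical on-disk layout: one folder per NSL character class.
train_ds = datasets.ImageFolder("nslnet18/train", transform=train_tf)
train_loader = DataLoader(train_ds, batch_size=32, shuffle=True)

# Either architecture, with the classifier head sized for 18 classes.
model = models.mobilenet_v2(num_classes=18)  # or models.alexnet(num_classes=18)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)  # assumed rate
criterion = nn.CrossEntropyLoss()

for epoch in range(50):
    model.train()
    for images, labels in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
```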
4. Experimental Results
Both deep learning models performed strongly on the NSLNet-18 dataset: AlexNet achieved a test accuracy of 99.8%, while MobileNetV2 reached 99.6%. Both models also maintained low validation loss and high validation accuracy throughout training, indicating robust learning and good generalization. These results confirm the effectiveness of CNNs in recognizing static Nepali Sign Language characters.
The following table summarizes the key performance metrics of the trained deep learning models:
| Model | Training Accuracy | Validation Accuracy | Test Accuracy |
|---|---|---|---|
| AlexNet | 99.9% | 99.8% | 99.8% |
| MobileNetV2 | 99.7% | 99.6% | 99.6% |
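The reported test accuracies correspond to standard top-1 classification accuracy on a held-out split. Continuing the sketch above, such a figure could be computed as follows (the test path and batch size are again assumptions):

```python
# Deterministic preprocessing for evaluation: resize only, no augmentation.
test_tf = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])
test_ds = datasets.ImageFolder("nslnet18/test", transform=test_tf)
test_loader = DataLoader(test_ds, batch_size=32)

model.eval()
correct = total = 0
with torch.no_grad():
    for images, labels in test_loader:
        preds = model(images).argmax(dim=1)  # top-1 predicted class
        correct += (preds == labels).sum().item()
        total += labels.size(0)
print(f"Test accuracy: {100.0 * correct / total:.1f}%")
```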
5. Discussion
The high test accuracies of 99.8% for AlexNet and 99.6% for MobileNetV2 underscore the potential of deep learning for static Nepali Sign Language character recognition. The development of the NSLNet-18 dataset establishes a foundation for future research in this underrepresented domain, and the results suggest that the proposed framework could meaningfully improve communication for the hearing-impaired community by enabling accurate automated NSL interpretation. Future work could extend the approach to dynamic gesture recognition and expand the dataset to cover a wider range of NSL vocabulary.