Nepali Sign Language Characters Recognition: Dataset Development and Deep Learning Approaches

Niroj Maharjan Aamod Ghimire Milan Thapa Sandeep Thapa Ashish Shrestha
Department of Computer Science and Engineering, Kathmandu University, Dhulikhel, Nepal

Abstract

This paper addresses the critical absence of a standardized Nepali Sign Language (NSL) dataset by introducing NSLNet-18, a novel dataset covering 18 static NSL characters. Deep learning approaches, specifically AlexNet and MobileNetV2, were employed to develop a robust character recognition system, and data augmentation techniques were applied to enhance model generalization. The resulting models achieved test accuracies of 99.8% (AlexNet) and 99.6% (MobileNetV2), demonstrating the efficacy of deep learning for NSL character recognition.

Keywords

Nepali Sign Language, Sign Language Recognition, Deep Learning, Convolutional Neural Networks, Dataset Development


1. Introduction

Communication barriers faced by the hearing-impaired community highlight the need for effective sign language recognition systems. While several sign language recognition systems exist globally, Nepali Sign Language (NSL) lacks a comprehensive dataset and established deep learning recognition models. This study bridges this gap by developing a dedicated NSL dataset and implementing deep learning models for accurate character recognition; the primary models employed are AlexNet and MobileNetV2.

2. Related Work

Existing research on sign language recognition has explored various techniques, including Hidden Markov Models and Support Vector Machines, with a recent shift towards deep learning architectures, particularly Convolutional Neural Networks (CNNs). Datasets for American Sign Language (ASL) and Indian Sign Language (ISL) have facilitated significant advancements, yet comparable datasets for Nepali Sign Language remain underdeveloped. Previous studies often face challenges such as gesture variability, lighting conditions, and the need for large, diverse datasets, issues this work seeks to mitigate for NSL.

3. Methodology

This research involved the creation of NSLNet-18, a new dataset comprising 27,000 images of 18 static NSL characters, collected from 10 native signers. Images were resized to 224x224 pixels and subjected to extensive data augmentation, including rotation, flipping, brightness adjustment, and zooming, to enhance dataset diversity. Two CNN architectures, AlexNet and MobileNetV2, were then trained on this dataset with the Adam optimizer for 50 epochs, using fixed learning rates and batch sizes; a sketch of this training setup is given below.
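The following PyTorch sketch illustrates one plausible realization of this pipeline. The directory layout, learning rate, and batch size are illustrative assumptions, as the paper does not specify them; only the 224x224 input size, the named augmentations, the Adam optimizer, the 50-epoch schedule, and the 18-class output are taken from the text above.

import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, models, transforms

# Resize to 224x224 and apply the augmentations named in the text:
# rotation, flipping, brightness adjustment, and zoom (via scaled crop).
train_transforms = transforms.Compose([
    transforms.RandomRotation(degrees=15),                # rotation
    transforms.RandomHorizontalFlip(),                    # flipping
    transforms.ColorJitter(brightness=0.3),               # brightness
    transforms.RandomResizedCrop(224, scale=(0.8, 1.0)),  # zoom
    transforms.ToTensor(),
])

# Hypothetical on-disk layout: one folder per NSL character class.
train_set = datasets.ImageFolder("nslnet18/train", transform=train_transforms)
train_loader = DataLoader(train_set, batch_size=32, shuffle=True)

# MobileNetV2 with its classifier head replaced for the 18 NSL classes.
model = models.mobilenet_v2(weights="IMAGENET1K_V1")
model.classifier[1] = nn.Linear(model.last_channel, 18)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)  # assumed rate
criterion = nn.CrossEntropyLoss()

model.train()
for epoch in range(50):  # 50 epochs, as stated in the text
    for images, labels in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()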

4. Experimental Results

The experimental evaluation demonstrated high performance from both deep learning models on the NSLNet-18 dataset. AlexNet achieved a remarkable test accuracy of 99.8%, while MobileNetV2 recorded a strong test accuracy of 99.6%. The models exhibited consistently low validation loss and high validation accuracy throughout the training process, indicating robust learning and generalization capabilities. These results confirm the effectiveness of CNNs in accurately recognizing static Nepali Sign Language characters.

The following table summarizes the key performance metrics of the trained deep learning models:

Model         Training Accuracy   Validation Accuracy   Test Accuracy
AlexNet       99.9%               99.8%                 99.8%
MobileNetV2   99.7%               99.6%                 99.6%

The table shows that both AlexNet and MobileNetV2 achieved exceptionally high accuracy across the training, validation, and test sets. AlexNet slightly outperformed MobileNetV2 on all three metrics, reaching a test accuracy of 99.8% and highlighting its strong discriminative power on the developed NSL dataset. These consistently high scores validate the robustness and effectiveness of the proposed deep learning approaches for Nepali Sign Language character recognition.
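For completeness, the following is a minimal sketch of how the reported test accuracy could be computed, assuming a trained model and a test_loader built over a held-out NSLNet-18 test split (the exact partition is an assumption, since the text does not specify it):

import torch

model.eval()  # disable dropout and batch-norm updates for evaluation
correct = total = 0
with torch.no_grad():
    for images, labels in test_loader:
        preds = model(images).argmax(dim=1)
        correct += (preds == labels).sum().item()
        total += labels.size(0)
print(f"Test accuracy: {100.0 * correct / total:.1f}%")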

5. Discussion

The achieved accuracies of 99.8% for AlexNet and 99.6% for MobileNetV2 underscore the significant potential of deep learning for static Nepali Sign Language character recognition. The development of the NSLNet-18 dataset establishes a crucial foundation for future research in this underrepresented domain. These results suggest that the proposed framework could substantially improve communication for the hearing-impaired community by enabling highly accurate automated NSL interpretation. Future work could extend the system to dynamic gesture recognition and expand the dataset to cover a wider range of NSL vocabulary.