Hu, Haiyan (2025) A Lightweight SLR System based on MobileNetV3, BiLSTM, and Attention Mechanism. Masters thesis, Dublin, National College of Ireland.
Abstract
This project develops a system that recognizes word-level American Sign Language (ASL) signs from video. I first use MobileNetV3-Large to extract image features from each frame, then use a BiLSTM (Bidirectional Long Short-Term Memory network) to model temporal information across frames. I also add an attention mechanism that weights the key frames of the video more heavily to improve recognition. Image augmentation and label smoothing were applied to make the model more generalizable and stable. The entire project was developed and validated on the WLASL-300 dataset. Ultimately, I obtained good recognition results while keeping the model structure relatively simple, which matches the project's main focus: adopting an efficient model structure and optimizing the training strategy to build a base for future applications.
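Two of the ideas named in the abstract, attention over frames and label smoothing, can be illustrated with a minimal pure-Python sketch. This is not the thesis implementation: the abstract does not say which form of attention is used, so a simple dot-product scoring against a query vector is assumed here, and the function names `attention_pool` and `smooth_labels`, the `query` vector, and the `epsilon` value are all hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(frame_features, query):
    """Collapse per-frame feature vectors into one video-level vector.

    Assumption: each frame is scored by its dot product with `query`
    (in a real model this would be a learned parameter). The thesis may
    use a different scoring function.
    """
    scores = [sum(f * q for f, q in zip(frame, query)) for frame in frame_features]
    weights = softmax(scores)  # one weight per frame, summing to 1
    dim = len(frame_features[0])
    pooled = [
        sum(w * frame[d] for w, frame in zip(weights, frame_features))
        for d in range(dim)
    ]
    return weights, pooled


def smooth_labels(target_index, num_classes, epsilon=0.1):
    """Standard label smoothing: (1 - epsilon) * one_hot + epsilon / num_classes."""
    off_value = epsilon / num_classes
    return [
        (1.0 - epsilon) + off_value if i == target_index else off_value
        for i in range(num_classes)
    ]


# Three toy "frames" with 2-dimensional features; the third frame aligns
# most strongly with the query, so it should receive the largest weight.
frames = [[1.0, 0.0], [0.0, 1.0], [3.0, 0.0]]
weights, pooled = attention_pool(frames, query=[1.0, 0.0])

# Smoothed target over 300 classes, matching the class count of WLASL-300.
target = smooth_labels(target_index=2, num_classes=300)
```

In a full model, the frame features would come from the BiLSTM outputs and the pooled vector would feed a classifier over the 300 sign classes; the smoothed target replaces the one-hot label in the cross-entropy loss.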