-, Sonia (2023) Eyes Through Words: Providing Independence to Visually Impaired with Photo-to-Text Technology. Masters thesis, Dublin, National College of Ireland.
Abstract
Automatic image captioning is a difficult problem that spans both computer vision and natural language processing. The task has drawn recent attention because of its importance in applications such as self-driving cars, assistance for the visually impaired, and the detection of malicious activity. This research focuses on captioning photographs of cricket matches. Approximately one thousand images of cricketers were collected and labelled with an average of two captions each. We collected the images and wrote the captions ourselves, making this dataset a valuable contribution of our research. The cricketer captioning model was compared against a model trained on the popular Flickr dataset, which consists of 8,000 photographs with five captions each. The study used a pre-trained InceptionV3 model, with its classification layer removed, to extract image features, and a Long Short-Term Memory (LSTM)-based Recurrent Neural Network (RNN) to process the textual captions. The model's architecture consists of an image feature extractor, a sequence processor, and a decoder. Predicted captions were evaluated against the reference captions using BLEU scores. Both BLEU-1 (unigram scoring) and BLEU-2 (bigram scoring) were used in model selection. Caption loading, caption prediction for new photos, and other implementation issues are discussed at length. The cricket captioning model performed well on cricketer-related photos, where it outperformed the model trained on Flickr data, but it struggled when applied to unrelated images. In contrast, the Flickr-trained model performed well across a wide range of domains because it was trained on a more comprehensive dataset.
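To make the evaluation step concrete, the following is a minimal pure-Python sketch of BLEU-1 and BLEU-2 scoring for a single predicted caption. The abstract does not say which BLEU implementation or weighting the thesis used, so this sketch assumes the common cumulative form with uniform weights, a brevity penalty, and no smoothing. The function names and the example captions are hypothetical and are not taken from the thesis or its dataset.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(candidate, references, n):
    """Clipped n-gram precision of one candidate against several references."""
    cand_counts = ngrams(candidate, n)
    total = sum(cand_counts.values())
    if total == 0:
        return 0.0
    # Credit each candidate n-gram at most as often as it appears in the
    # single reference where it is most frequent.
    max_ref = Counter()
    for ref in references:
        for gram, count in ngrams(ref, n).items():
            max_ref[gram] = max(max_ref[gram], count)
    clipped = sum(min(count, max_ref[gram]) for gram, count in cand_counts.items())
    return clipped / total


def brevity_penalty(candidate, references):
    """Penalise candidates shorter than the closest-length reference."""
    c = len(candidate)
    if c == 0:
        return 0.0
    # Closest reference length; ties go to the shorter reference.
    r = min((abs(len(ref) - c), len(ref)) for ref in references)[1]
    return 1.0 if c > r else math.exp(1 - r / c)


def bleu(candidate, references, max_n):
    """Cumulative BLEU up to max_n with uniform weights (assumed, unsmoothed)."""
    precisions = [modified_precision(candidate, references, n)
                  for n in range(1, max_n + 1)]
    if min(precisions) == 0:
        return 0.0
    log_mean = sum(math.log(p) for p in precisions) / max_n
    return brevity_penalty(candidate, references) * math.exp(log_mean)


# Hypothetical captions for illustration only.
references = [
    "a batsman plays a cover drive".split(),
    "the batsman hits the ball".split(),
]
candidate = "a bowler hits the ball".split()

bleu1 = bleu(candidate, references, max_n=1)  # unigram scoring
bleu2 = bleu(candidate, references, max_n=2)  # cumulative unigram + bigram scoring
print(f"BLEU-1: {bleu1:.3f}  BLEU-2: {bleu2:.3f}")
```

In this example four of the five predicted words and two of the four predicted word pairs appear in a reference caption, so BLEU-1 is 0.8 and BLEU-2 is the geometric mean of 0.8 and 0.5, roughly 0.632. BLEU-2 is lower because it also rewards correct word order, which is one reason to report both scores when selecting a model.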