Agughalam, Davis Munachimso (2020) Bidirectional LSTM approach to image captioning with scene features. Masters thesis, Dublin, National College of Ireland.
Abstract
Generating sentence descriptions for images is an area of research that combines computer vision and natural language processing. More recently, it has been driven by encoder-decoder deep learning approaches in which visual features learned with a convolutional neural network (CNN) encoder are passed to a long short-term memory (LSTM) decoder for language generation. One major challenge in this approach is bridging the modality gap between the image and text data to enhance the semantic correctness of the generated sentences. While researchers have explored different features to achieve this, scene features have been largely underutilised and, where utilised, have been deployed with unidirectional LSTM decoders, which retain only past information and therefore produce poor results for long sequences. This research adopts a novel approach that combines scene information with a bidirectional LSTM decoder to achieve more semantically correct image descriptions. The pretrained CNNs InceptionV3 and Places365 are employed for object and scene feature extraction respectively, before a bidirectional LSTM decoder is employed for language translation. The approach is validated through experiments on the Flickr8k benchmark dataset, and the results show improved performance compared to other encoder-decoder methods that use only global image features, thereby highlighting the complementary advantages of scene information and bidirectional LSTMs for image captioning tasks.
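The pipeline described in the abstract can be illustrated with a small, dependency-free sketch. It is not the thesis implementation: the feature vectors are random stand-ins for InceptionV3 and Places365 outputs, the fusion by concatenation and the way the image vector is attached to each word embedding are assumptions, and all function names and sizes are hypothetical. It only shows how a bidirectional LSTM reads the same sequence in both directions and joins the two hidden states at each timestep.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def make_lstm_params(input_size, hidden_size, rng):
    """Random weights and zero biases for the four LSTM gates (hypothetical helper)."""
    def matrix(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]
    return {gate: (matrix(hidden_size, input_size + hidden_size), [0.0] * hidden_size)
            for gate in ("i", "f", "g", "o")}


def lstm_step(params, x, h, c):
    """One LSTM step: returns the new hidden state and cell state."""
    z = x + h  # concatenate the input with the previous hidden state

    def affine(gate):
        weights, biases = params[gate]
        return [sum(w * v for w, v in zip(row, z)) + b for row, b in zip(weights, biases)]

    i = [sigmoid(v) for v in affine("i")]    # input gate
    f = [sigmoid(v) for v in affine("f")]    # forget gate
    g = [math.tanh(v) for v in affine("g")]  # candidate cell values
    o = [sigmoid(v) for v in affine("o")]    # output gate
    c_new = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c, i, g)]
    h_new = [ov * math.tanh(cv) for ov, cv in zip(o, c_new)]
    return h_new, c_new


def run_lstm(params, inputs, hidden_size):
    """Run a unidirectional LSTM and collect the hidden state at every timestep."""
    h, c = [0.0] * hidden_size, [0.0] * hidden_size
    outputs = []
    for x in inputs:
        h, c = lstm_step(params, x, h, c)
        outputs.append(h)
    return outputs


def bidirectional_lstm(fwd_params, bwd_params, inputs, hidden_size):
    """Concatenate forward (past context) and backward (future context) states."""
    forward = run_lstm(fwd_params, inputs, hidden_size)
    backward = run_lstm(bwd_params, inputs[::-1], hidden_size)[::-1]
    return [f + b for f, b in zip(forward, backward)]


def fuse_features(object_features, scene_features):
    """Assumed fusion: plain concatenation of object and scene vectors."""
    return object_features + scene_features


rng = random.Random(0)
object_features = [rng.random() for _ in range(4)]  # stand-in for InceptionV3 features
scene_features = [rng.random() for _ in range(3)]   # stand-in for Places365 features
word_embeddings = [[rng.random() for _ in range(5)] for _ in range(4)]  # 4 caption tokens

image_features = fuse_features(object_features, scene_features)
# Assumed conditioning: attach the fused image vector to every word embedding.
inputs = [image_features + w for w in word_embeddings]

hidden_size = 6
fwd_params = make_lstm_params(len(inputs[0]), hidden_size, rng)
bwd_params = make_lstm_params(len(inputs[0]), hidden_size, rng)
states = bidirectional_lstm(fwd_params, bwd_params, inputs, hidden_size)

print(len(states), len(states[0]))  # 4 timesteps, each with 2 * hidden_size = 12 values
```

In a real captioning model, each concatenated state would feed a softmax layer over the vocabulary, and the weights would be learned rather than random.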
Item Type: | Thesis (Masters) |
---|---|
Subjects: | Q Science > QA Mathematics > Electronic computers. Computer science; T Technology > T Technology (General) > Information Technology > Electronic computers. Computer science; Q Science > QA Mathematics > Computer software; T Technology > T Technology (General) > Information Technology > Computer software |
Divisions: | School of Computing > Master of Science in Data Analytics |
Depositing User: | Dan English |
Date Deposited: | 18 Jan 2021 16:05 |
Last Modified: | 18 Jan 2021 16:05 |
URI: | https://norma.ncirl.ie/id/eprint/4381 |