NORMA eResearch @NCI Library

Bidirectional LSTM approach to image captioning with scene features

Agughalam, Davis Munachimso (2020) Bidirectional LSTM approach to image captioning with scene features. Masters thesis, Dublin, National College of Ireland.



Generating sentence descriptions for images is an area of research that combines computer vision and natural language processing. More recently, it has been driven by encoder-decoder deep learning approaches in which visual features learned with a convolutional neural network (CNN) encoder are passed to a long short-term memory (LSTM) decoder for language generation. One major challenge in this approach is bridging the modality gap between the image and text data to enhance the semantic correctness of the generated sentences. While researchers have explored different features to achieve this, scene exploratory features have been largely underutilised and, where utilised, have been deployed with unidirectional LSTM decoders, which retain only past information and therefore produce poor results for long sequences. This research adopts a novel approach that combines scene information with a bidirectional LSTM decoder to achieve more semantically correct image descriptions. The pretrained CNNs InceptionV3 and Places365 are employed for object and scene feature extraction respectively, before a bidirectional LSTM decoder is employed for language translation. The approach is validated through experiments on the Flickr8k benchmark dataset, and the results show improved performance compared to other encoder-decoder methods that use only global image features, thereby outlining the complementary advantages of scene information and bidirectional LSTMs for image captioning tasks.
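
The pipeline described in the abstract can be illustrated with a small sketch. The code below is not the thesis implementation: it is a toy, standard-library-only illustration that assumes the object and scene feature vectors are simply concatenated, projected to the word-embedding size, and fed as the first step of the sequence before a bidirectional LSTM pass. All names (`LSTMCell`, `fuse_image_features`, `bidirectional_encode`), the fusion strategy, the tiny dimensions and the random untrained weights are assumptions made for illustration; in practice the feature vectors would come from the pretrained CNNs.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    # Plain matrix-vector product on nested lists.
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class LSTMCell:
    """Toy LSTM cell with randomly initialised (untrained) weights."""

    def __init__(self, input_dim, hidden_dim, rng):
        self.hidden_dim = hidden_dim
        # One stacked matrix for the input, forget, output and candidate gates.
        self.W = rand_matrix(4 * hidden_dim, input_dim + hidden_dim, rng)
        self.b = [0.0] * (4 * hidden_dim)

    def step(self, x, h, c):
        z = [a + b for a, b in zip(matvec(self.W, x + h), self.b)]
        n = self.hidden_dim
        i = [sigmoid(v) for v in z[0:n]]            # input gate
        f = [sigmoid(v) for v in z[n:2 * n]]        # forget gate
        o = [sigmoid(v) for v in z[2 * n:3 * n]]    # output gate
        g = [math.tanh(v) for v in z[3 * n:4 * n]]  # candidate cell state
        c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
        h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
        return h_new, c_new

    def run(self, sequence):
        h = [0.0] * self.hidden_dim
        c = [0.0] * self.hidden_dim
        outputs = []
        for x in sequence:
            h, c = self.step(x, h, c)
            outputs.append(h)
        return outputs


def fuse_image_features(object_feats, scene_feats, projection):
    # Assumed fusion: concatenate both vectors, then project to embedding size.
    return matvec(projection, object_feats + scene_feats)


def bidirectional_encode(sequence, fwd_cell, bwd_cell):
    # Forward pass sees past context; backward pass sees future context.
    forward = fwd_cell.run(sequence)
    backward = bwd_cell.run(sequence[::-1])[::-1]
    return [f + b for f, b in zip(forward, backward)]


# Toy dimensions and random stand-ins for CNN features and word embeddings.
rng = random.Random(0)
OBJ_DIM, SCENE_DIM, EMBED_DIM, HIDDEN_DIM = 6, 4, 5, 3
object_feats = [rng.random() for _ in range(OBJ_DIM)]
scene_feats = [rng.random() for _ in range(SCENE_DIM)]
word_embeddings = [[rng.uniform(-1, 1) for _ in range(EMBED_DIM)] for _ in range(4)]

projection = rand_matrix(EMBED_DIM, OBJ_DIM + SCENE_DIM, rng)
image_token = fuse_image_features(object_feats, scene_feats, projection)
sequence = [image_token] + word_embeddings

fwd_cell = LSTMCell(EMBED_DIM, HIDDEN_DIM, rng)
bwd_cell = LSTMCell(EMBED_DIM, HIDDEN_DIM, rng)
states = bidirectional_encode(sequence, fwd_cell, bwd_cell)
print(len(states), len(states[0]))
```

Each element of `states` concatenates a forward and a backward hidden state, so every position carries information from both earlier and later steps in the sequence. This is the property the abstract contrasts with unidirectional decoders, which retain only past information.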

Item Type: Thesis (Masters)
Subjects: Q Science > QA Mathematics > Electronic computers. Computer science
T Technology > T Technology (General) > Information Technology > Electronic computers. Computer science
Q Science > QA Mathematics > Computer software
T Technology > T Technology (General) > Information Technology > Computer software
Divisions: School of Computing > Master of Science in Data Analytics
Depositing User: Dan English
Date Deposited: 18 Jan 2021 16:05
Last Modified: 18 Jan 2021 16:05
