Rasal, Apoorva Vishwas (2024) Cognitive Captions: Empowering Images with AI-Generated Descriptions. Masters thesis, Dublin, National College of Ireland.
PDF (Master of Science), 2MB
PDF (Configuration Manual), 1MB
Abstract
Generating captions from images is a challenging task that integrates computer vision and natural language processing (NLP). This research investigates multimodal models that incorporate visual and textual attention mechanisms to improve caption generation for complex scenes containing multiple objects. The study bridges the gap between visual recognition and language generation by combining convolutional neural networks (CNNs) with recurrent neural networks (RNNs), paying particular attention to how the position of the attention mechanisms within the model affects caption quality. The main goals were to build an improved multimodal model that combines visual and textual attention, to evaluate its performance, and to examine how the placement of the attention mechanisms influences caption quality. The research used the MSCOCO, Flickr30k, and Flickr8k datasets and applied transfer learning and fine-tuning. The model uses a CNN for feature extraction and a bidirectional LSTM, guided by attention mechanisms, for sequence generation. Captions were evaluated with BLEU and METEOR scores. The proposed model showed marked improvements in producing consistent, contextually appropriate captions. The dual attention mechanisms proved effective at improving caption quality, yielding BLEU scores ranging from 0.5429 to 0.8307 and METEOR scores ranging from 0.1724 to 0.9835. Integrating visual and textual attention mechanisms is key to high-quality image captioning. Future work should explore larger datasets and advanced techniques such as Generative Adversarial Networks (GANs), address biases, and improve the diversity and accuracy of captions, while considering practical applications such as assisting visually impaired people and improving content management systems.
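The abstract does not state the exact attention formulation, so the following is only a minimal sketch of one common choice, additive (Bahdanau-style) soft visual attention, written in plain Python so it runs without a deep-learning framework. It assumes the CNN has already produced one feature vector per image region and that the decoder (for example, the bidirectional LSTM) supplies a hidden state at each step. The function name `additive_attention` and the weight arguments `w_feat`, `w_hid`, and `v` are hypothetical stand-ins for learned parameters, not code from the thesis, and the textual attention branch is not shown.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax: subtract the max before exponentiating."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def additive_attention(regions: List[Vector], hidden: Vector,
                       w_feat: Matrix, w_hid: Matrix,
                       v: Vector) -> Tuple[Vector, Vector]:
    """Soft additive attention over CNN region features (illustrative sketch).

    regions: L region feature vectors from the CNN, each of size D.
    hidden:  decoder hidden state of size H (assumed to come from the LSTM).
    w_feat:  A x D projection of region features (hypothetical learned weights).
    w_hid:   A x H projection of the hidden state (hypothetical learned weights).
    v:       size-A scoring vector (hypothetical learned weights).

    Score for region i: e_i = v . tanh(w_feat @ a_i + w_hid @ h).
    Returns (weights, context): softmax of the scores, and the
    attention-weighted sum of the region features.
    """
    proj_h = matvec(w_hid, hidden)
    scores = []
    for feat in regions:
        proj_f = matvec(w_feat, feat)
        joint = [math.tanh(a + b) for a, b in zip(proj_f, proj_h)]
        scores.append(sum(vi * ji for vi, ji in zip(v, joint)))
    weights = softmax(scores)
    dim = len(regions[0])
    context = [sum(w * feat[d] for w, feat in zip(weights, regions))
               for d in range(dim)]
    return weights, context


if __name__ == "__main__":
    # Toy example: three regions with 2-D features and identity projections
    # (hypothetical values chosen only to exercise the function).
    identity = [[1.0, 0.0], [0.0, 1.0]]
    regions = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
    weights, context = additive_attention(regions, [1.0, 0.0],
                                          identity, identity, [1.0, 1.0])
    print("attention weights:", [round(w, 3) for w in weights])
    print("context vector:", [round(c, 3) for c in context])
```

In a captioning decoder of this kind, the context vector would typically be combined with the previous word's embedding before entering the recurrent layer at each step, so the model can focus on different image regions as it emits different words. In a trained model the projection weights are learned; the identity matrices in the demo are purely illustrative.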