NORMA eResearch @NCI Library

Cognitive Captions: Empowering Images with AI-Generated Descriptions

Rasal, Apoorva Vishwas (2024) Cognitive Captions: Empowering Images with AI-Generated Descriptions. Masters thesis, Dublin, National College of Ireland.

Full text: PDF (Master of Science), 2MB | PDF (Configuration Manual), 1MB

Abstract

The generation of captions from images is a challenging task that integrates computer vision and natural language processing (NLP). This research investigates multimodal models that combine visual and textual attention mechanisms to improve image caption generation for complex scenes with multiple objects. It bridges the gap between visual recognition and language generation by using convolutional neural networks (CNNs) and recurrent neural networks (RNNs), with specific attention to how the placement of attention mechanisms affects caption quality. The main goal of this study was to construct an improved multimodal model that combines visual and textual attention mechanisms, evaluate its performance, and examine the influence of attention-mechanism placement on caption quality. The research used the MSCOCO and Flickr30k/8k datasets and employed techniques such as transfer learning and fine-tuning. The model was trained with a CNN for feature extraction and a bidirectional LSTM for sequence generation, supported by attention mechanisms. Evaluation was based on BLEU and METEOR scores. The proposed model showed notable improvements in producing consistent and contextually appropriate captions. Dual attention mechanisms were effective in improving caption quality, yielding BLEU scores from 0.5429 to 0.8307 and METEOR scores between 0.1724 and 0.9835. Integrating visual and textual attention mechanisms is key to high-quality image captioning. Future work should explore larger datasets and advanced techniques such as Generative Adversarial Networks (GANs); address biases; improve the diversity and accuracy of captions; and consider practical applications, including aiding visually impaired people and improving content management systems.
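The abstract reports BLEU scores as a primary evaluation metric. As a rough illustration of what that metric measures (n-gram precision with a brevity penalty), the following is a minimal sentence-level BLEU sketch in pure Python; it is not the evaluation code used in the thesis, which would typically rely on a library implementation with smoothing.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu(reference, candidate, max_n=4):
    """Sentence-level BLEU with uniform weights and brevity penalty.

    Returns 0.0 as soon as any n-gram order has no overlap
    (no smoothing, unlike most library implementations).
    """
    ref, cand = reference.split(), candidate.split()
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_ngrams = Counter(ngrams(cand, n))
        ref_ngrams = Counter(ngrams(ref, n))
        # Clipped count: each candidate n-gram credited at most
        # as often as it appears in the reference.
        overlap = sum((cand_ngrams & ref_ngrams).values())
        total = max(sum(cand_ngrams.values()), 1)
        if overlap == 0:
            return 0.0
        log_precisions.append(math.log(overlap / total))
    # Brevity penalty: punish candidates shorter than the reference.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / max(len(cand), 1))
    return bp * math.exp(sum(log_precisions) / max_n)
```

A perfect match scores 1.0, while a candidate sharing no words with the reference scores 0.0; the scores in the 0.54–0.83 range reported above fall between these extremes.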

Item Type: Thesis (Masters)
Supervisors: Rifai, Hicham (Email: UNSPECIFIED)
Subjects: Q Science > QA Mathematics > Electronic computers. Computer science
T Technology > T Technology (General) > Information Technology > Electronic computers. Computer science
Q Science > QH Natural history > QH301 Biology > Methods of research. Technique. Experimental biology > Data processing. Bioinformatics > Artificial intelligence
Q Science > Q Science (General) > Self-organizing systems. Conscious automata > Artificial intelligence
P Language and Literature > P Philology. Linguistics > Computational linguistics. Natural language processing
Q Science > QH Natural history > QH301 Biology > Methods of research. Technique. Experimental biology > Data processing. Bioinformatics > Artificial intelligence > Computer vision
Q Science > Q Science (General) > Self-organizing systems. Conscious automata > Artificial intelligence > Computer vision
Divisions: School of Computing > Master of Science in Data Analytics
Depositing User: Ciara O'Brien
Date Deposited: 25 Aug 2025 10:31
Last Modified: 25 Aug 2025 10:31
URI: https://norma.ncirl.ie/id/eprint/8617
