NORMA eResearch @NCI Library

Advancements in Automated Image Captioning: A Comparative Study of Modern AI Models

Patil, Shivakumar (2023) Advancements in Automated Image Captioning: A Comparative Study of Modern AI Models. Masters thesis, Dublin, National College of Ireland.

PDF (Master of Science), 2MB
PDF (Configuration Manual), 728kB

Abstract

This thesis presents a comprehensive study of full-sentence caption generation methods, covering the overlap between visual content and natural language processing. Focusing on the Flickr dataset, it explores recent approaches and compares three advanced methodologies: VGG-16 combined with LSTM, Vision Transformer (ViT) combined with GPT-2, and OpenAI’s Contrastive Language–Image Pretraining (CLIP). Each approach is evaluated for its effectiveness in producing coherent and contextually relevant captions, with BLEU-1 and BLEU-2 scores serving as the primary evaluation metrics, supplemented by human evaluation. The project also briefly examines potential NLP applications of the generated captions, including trending generation, word-based image search, translation and audio conversion. Ultimately, the project aims to contribute to the evolving field of automatic caption generation by showcasing the capabilities and limitations of current approaches, informing future advances in integrating visual and linguistic data processing, and exploring potential use cases for the generated captions.
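
For readers unfamiliar with the metrics named above, the following is a minimal sketch of how BLEU-1 and BLEU-2 can be computed for a single caption. It is not taken from the thesis, which may well have used a library implementation; the function names and the example captions are hypothetical, and no smoothing is applied. BLEU-n here is the brevity penalty multiplied by the geometric mean of the clipped n-gram precisions up to order n.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def modified_precision(candidate, references, n):
    """Clipped n-gram precision: each candidate n-gram is credited at most
    as many times as it appears in any single reference."""
    cand_counts = Counter(ngrams(candidate, n))
    if not cand_counts:
        return 0.0
    max_ref_counts = Counter()
    for ref in references:
        for gram, count in Counter(ngrams(ref, n)).items():
            max_ref_counts[gram] = max(max_ref_counts[gram], count)
    clipped = sum(min(count, max_ref_counts[gram])
                  for gram, count in cand_counts.items())
    return clipped / sum(cand_counts.values())


def brevity_penalty(candidate, references):
    """Penalise candidates shorter than the closest reference length."""
    c = len(candidate)
    if c == 0:
        return 0.0
    # Closest reference length; ties are resolved towards the shorter one.
    r = min((len(ref) for ref in references),
            key=lambda length: (abs(length - c), length))
    return 1.0 if c > r else math.exp(1 - r / c)


def bleu(candidate, references, max_n):
    """Unsmoothed BLEU with uniform weights over orders 1..max_n."""
    precisions = [modified_precision(candidate, references, n)
                  for n in range(1, max_n + 1)]
    if min(precisions) == 0:
        return 0.0  # without smoothing, one zero precision zeroes the score
    log_mean = sum(math.log(p) for p in precisions) / max_n
    return brevity_penalty(candidate, references) * math.exp(log_mean)


# Hypothetical captions, for illustration only (not from the Flickr dataset).
candidate = "a dog runs on the grass".split()
references = [
    "a dog is running on the grass".split(),
    "a brown dog runs through the grass".split(),
]

bleu_1 = bleu(candidate, references, max_n=1)
bleu_2 = bleu(candidate, references, max_n=2)
print(f"BLEU-1: {bleu_1:.4f}")
print(f"BLEU-2: {bleu_2:.4f}")
```

In this example every candidate word appears in some reference, so the unigram precision is 1 and BLEU-1 is determined by the brevity penalty alone; BLEU-2 is lower because one of the five candidate bigrams ("runs on") matches neither reference.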

Item Type: Thesis (Masters)
Supervisors: Chikkankod, Arjun (email unspecified)
Uncontrolled Keywords: Automated Image Captioning; VGG16 and LSTM; Vision Transformer (ViT) and GPT-2; CLIP (Contrastive Language–Image Pretraining); BLEU Scores; Natural Language Processing (NLP)
Subjects: Q Science > QA Mathematics > Electronic computers. Computer science
T Technology > T Technology (General) > Information Technology > Electronic computers. Computer science
Q Science > QH Natural history > QH301 Biology > Methods of research. Technique. Experimental biology > Data processing. Bioinformatics > Artificial intelligence
Q Science > Q Science (General) > Self-organizing systems. Conscious automata > Artificial intelligence
P Language and Literature > P Philology. Linguistics > Computational linguistics. Natural language processing
Q Science > QH Natural history > QH301 Biology > Methods of research. Technique. Experimental biology > Data processing. Bioinformatics > Artificial intelligence > Computer vision
Q Science > Q Science (General) > Self-organizing systems. Conscious automata > Artificial intelligence > Computer vision
Divisions: School of Computing > Master of Science in Data Analytics
Depositing User: Ciara O'Brien
Date Deposited: 20 May 2025 13:20
Last Modified: 20 May 2025 13:20
URI: https://norma.ncirl.ie/id/eprint/7586
