Patil, Shivakumar (2023) Advancements in Automated Image Captioning: A Comparative Study of Modern AI Models. Masters thesis, Dublin, National College of Ireland.
Abstract
This thesis presents a comprehensive study of full-sentence caption generation methods, which sit at the intersection of visual content analysis and natural language processing. Focusing on the Flickr dataset, the study explores recent approaches and compares three advanced methodologies: VGG-16 combined with LSTM, Vision Transformer (ViT) combined with GPT-2, and OpenAI’s Contrastive Language–Image Pretraining (CLIP). Each approach is evaluated for its effectiveness in producing coherent and contextually relevant captions, with BLEU-1 and BLEU-2 scores as the primary evaluation metrics, supplemented by human evaluation. The project also briefly examines potential NLP applications of the generated captions, including trending generation, word-based image search, translation and audio conversion. Ultimately, the project aims to contribute to the rapidly evolving field of automatic caption generation by showcasing the capabilities and limitations of current approaches, informing future advances in integrated visual and linguistic data processing, and exploring potential use cases for the generated captions.