NORMA eResearch @NCI Library

A Smart Cloud – Based Document Search Engine for Query Retrieval Using Large Learning Models (LLM's)

Konan Ravi, Rohith (2024) A Smart Cloud – Based Document Search Engine for Query Retrieval Using Large Learning Models (LLM's). Masters thesis, Dublin, National College of Ireland.

Files:
PDF (Master of Science), 3MB
PDF (Configuration Manual), 1MB

Abstract

Document collections are growing fast; there are currently more than 140 million documents, and this number increases every year, so document retrieval must be effective. Current challenges stem from specialized language, relationships between documents, and imprecise user queries. Sophisticated NLP approaches such as semantic search and embeddings are central to addressing most of these problems. This paper explores the use of text-to-text transformers such as Google T5 and BART Large, fine-tuned for summarization and retrieval. Through fine-tuning, the authors observed improved performance in the BART Large model, with ROUGE-1 scores increasing from 0.269 to 0.461, along with improved unigram overlap and context relevance. Moreover, models such as Sentence Encoder and FastText achieved near-perfect retrieval accuracy of 98% and 96%, respectively, outperforming the traditional TF-IDF and Count Vectorizer models. By combining cloud-native architectures with data stores and indexes such as MySQL or FAISS, the system enables accurate and efficient document search at large scale. This research offers a solid foundation for most contemporary semantic search architectures that meet user expectations of accuracy and value.
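
For readers unfamiliar with the ROUGE-1 metric cited in the abstract, the following is a minimal sketch of how unigram overlap between a reference summary and a generated summary can be scored. It is an illustration only: the function name `rouge1` is hypothetical, tokenization is plain lowercased whitespace splitting with no stemming, and the record does not state which ROUGE implementation or variant the thesis used.

```python
from collections import Counter


def rouge1(reference: str, candidate: str) -> dict:
    """Unigram-overlap precision, recall, and F1 (simplified ROUGE-1).

    Assumption: lowercased whitespace tokens, no stemming or stopword removal.
    """
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(candidate.lower().split())

    # Clipped overlap: each unigram counts at most as often as it
    # appears in both texts (multiset intersection).
    overlap = sum((ref_counts & cand_counts).values())

    recall = overlap / max(sum(ref_counts.values()), 1)
    precision = overlap / max(sum(cand_counts.values()), 1)
    f1 = 0.0 if overlap == 0 else 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}


# Toy example: five of the six unigrams are shared.
scores = rouge1("the cat sat on the mat", "the cat lay on the mat")
print(scores)
```

A higher score means the generated summary shares more words with the reference, which is the sense in which the reported rise from 0.269 to 0.461 indicates improved unigram overlap.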

Item Type: Thesis (Masters)
Supervisors: Siddig, Abubakr (email: UNSPECIFIED)
Subjects: Q Science > QA Mathematics > Electronic computers. Computer science
T Technology > T Technology (General) > Information Technology > Electronic computers. Computer science
T Technology > T Technology (General) > Information Technology > Cloud computing
P Language and Literature > P Philology. Linguistics > Computational linguistics. Natural language processing
Divisions: School of Computing > Master of Science in Cloud Computing
Depositing User: Ciara O'Brien
Date Deposited: 15 Jul 2025 13:36
Last Modified: 15 Jul 2025 13:36
URI: https://norma.ncirl.ie/id/eprint/8114
