NORMA eResearch @NCI Library

SumBot: An enhanced multilingual Document Summarization using LLMs

Anish, Girish (2024) SumBot: An enhanced multilingual Document Summarization using LLMs. Masters thesis, Dublin, National College of Ireland.

Files:
PDF (Master of Science), 1MB
PDF (Configuration Manual), 491kB

Abstract

At a time when written and spoken information is available in excess, summarization is an especially useful ability. It condenses long texts into clear, comprehensive formats, which facilitates efficient communication and decision-making. Automated document summarization addresses this need by using Large Language Models (LLMs) and Natural Language Processing (NLP) to extract pertinent information from texts. Using extractive or abstractive approaches, it identifies the important words or concepts in a document, preserving the main ideas while eliminating unnecessary detail. SumBot, a unique hybrid framework, was created specifically for scientific literature to facilitate multi-document scientific summarization (MDSS). To produce high-quality summaries, the framework uses several Sentence Transformers and models from the T5 family. The research analyses various kinds of LLMs across diverse document styles and languages, with the aim of adequately summarizing entire documents. By analysing these models' performance, the study intends to improve the accuracy and efficiency of automated summarization, making it a useful tool for managing large amounts of data in a variety of scenarios. The approach enhances information retrieval and supports better decision-making across a variety of disciplines.
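
As a rough illustration of the unsupervised extractive approach mentioned in the abstract, the following minimal sketch scores each sentence by its cosine similarity to the document centroid and keeps the top k. It is not the thesis implementation: bag-of-words counts stand in for Sentence Transformer embeddings, the abstractive (T5) stage is omitted, and the function names (split_sentences, vectorize, cosine, extractive_summary) are hypothetical.

```python
import math
import re
from collections import Counter


def split_sentences(text):
    """Naive splitter: break on whitespace that follows '.', '!' or '?'."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def vectorize(sentence):
    """Bag-of-words term counts (assumption: a stand-in for a learned embedding)."""
    return Counter(re.findall(r"[a-z0-9]+", sentence.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def extractive_summary(text, k=2):
    """Return the k sentences closest to the document centroid, in source order."""
    sentences = split_sentences(text)
    vectors = [vectorize(s) for s in sentences]
    # The centroid is the sum of all sentence vectors; scaling it would not
    # change the cosine ranking, so no division by the sentence count is needed.
    centroid = Counter()
    for v in vectors:
        centroid.update(v)
    scores = [cosine(v, centroid) for v in vectors]
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]


if __name__ == "__main__":
    sample = (
        "Summarization condenses long documents. Cats sleep. "
        "Extractive summarization selects sentences from documents. "
        "The weather was fine."
    )
    for sentence in extractive_summary(sample, k=2):
        print(sentence)
```

In a hybrid pipeline of the kind the abstract describes, the selected sentences would typically be passed on to an abstractive model for rewriting; that step is left out here to keep the sketch dependency-free.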

Item Type: Thesis (Masters)
Supervisors: Raza Abidi, Syed Muhammad (Email: UNSPECIFIED)
Uncontrolled Keywords: Large Language Models; Hugging Face; ChatGPT; Unsupervised extractive summarization; Prompt Engineering
Subjects: Q Science > QA Mathematics > Electronic computers. Computer science
T Technology > T Technology (General) > Information Technology > Electronic computers. Computer science
P Language and Literature > P Philology. Linguistics > Computational linguistics. Natural language processing
Q Science > Q Science (General) > Self-organizing systems. Conscious automata > Machine learning
Divisions: School of Computing > Master of Science in Data Analytics
Depositing User: Ciara O'Brien
Date Deposited: 18 Aug 2025 15:15
Last Modified: 18 Aug 2025 15:15
URI: https://norma.ncirl.ie/id/eprint/8570
