Anish, Girish (2024) SumBot: An enhanced multilingual Document Summarization using LLMs. Masters thesis, Dublin, National College of Ireland.
Abstract
In an age when information, both written and spoken, is available in excess, summarization is an especially useful ability. It condenses long texts into clear, comprehensive formats, which facilitates efficient communication and decision-making. Automated document summarization addresses this need by using Large Language Models (LLMs) and Natural Language Processing (NLP) to extract pertinent information from texts. Using extractive or abstractive approaches, it identifies important words or concepts, preserving the main ideas of a document while eliminating unnecessary detail. A unique hybrid framework called SumBot was created specifically for scientific literature to support multi-document scientific summarization (MDSS). To produce high-quality summaries, the framework uses several Sentence Transformers and models from the T5 family. The research analyses various kinds of LLMs across diverse document styles and languages in order to summarize entire documents adequately. By analysing these models' performance, the study aims to improve the accuracy and efficiency of automated summarization, making it a useful tool for managing large volumes of data in a variety of scenarios. The method supports better decision-making across disciplines and enhances information retrieval.
Item Type: | Thesis (Masters) |
---|---|
Supervisors: | Raza Abidi, Syed Muhammad |
Uncontrolled Keywords: | Large Language Models; Hugging Face; ChatGPT; Unsupervised extractive summarization; Prompt Engineering |
Subjects: | Q Science > QA Mathematics > Electronic computers. Computer science; T Technology > T Technology (General) > Information Technology > Electronic computers. Computer science; P Language and Literature > P Philology. Linguistics > Computational linguistics. Natural language processing; Q Science > Q Science (General) > Self-organizing systems. Conscious automata > Machine learning |
Divisions: | School of Computing > Master of Science in Data Analytics |
Depositing User: | Ciara O'Brien |
Date Deposited: | 18 Aug 2025 15:15 |
Last Modified: | 18 Aug 2025 15:15 |
URI: | https://norma.ncirl.ie/id/eprint/8570 |