NORMA eResearch @NCI Library

Text Summarization using Sequence to Sequence

Ramandeep Singh (2023) Text Summarization using Sequence to Sequence. Masters thesis, Dublin, National College of Ireland.

PDF (Master of Science)
PDF (Configuration manual)

Abstract

The volume of text available on the internet is massive, which makes it difficult for readers to work through long articles or blogs, since they must read the full text to understand it. Text summarization, a Natural Language Processing (NLP) task, eases this burden and helps the research community absorb large amounts of information in a short time. It reduces the size of the source document and presents only the key information without changing the actual meaning of the text. Earlier research concentrated on extractive summarization, but the focus has since broadened to abstractive methods. This research takes a novel approach by combining extractive and abstractive methods. The extractive phase applies word-frequency-based sentence feature extraction using the graph-based TextRank algorithm. The abstractive phase uses a deep artificial neural network: a sequence-to-sequence encoder-decoder model built on Long Short-Term Memory (LSTM) networks. Three approaches were followed: sequence to sequence using an LSTM encoder and decoder, sequence to sequence using an attention mechanism, and text summarization using BERT, GPT-2, and NLTK. Their performance was measured with ROUGE metrics. The LSTM model achieved the highest scores on the news dataset, with ROUGE-1 of 0.40, ROUGE-2 of 0.09, and ROUGE-L of 0.39, generating concise summaries without changing the original meaning of the text.
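
To make the reported ROUGE figures easier to interpret, the following is a minimal sketch of how ROUGE-1, ROUGE-2, and ROUGE-L can be computed for a single candidate and reference summary. It is not the evaluation code used in the thesis, which this record does not describe. The function names, the lowercase whitespace tokenization, the absence of stemming, and the choice of the F1 variant are all assumptions made for illustration.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def f1(overlap, cand_total, ref_total):
    """Combine precision (overlap / candidate) and recall (overlap / reference)."""
    if overlap == 0 or cand_total == 0 or ref_total == 0:
        return 0.0
    precision = overlap / cand_total
    recall = overlap / ref_total
    return 2 * precision * recall / (precision + recall)


def rouge_n(candidate, reference, n):
    """ROUGE-N F1: clipped n-gram overlap between candidate and reference.

    Assumption: simple lowercase whitespace tokenization, no stemming.
    """
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    overlap = sum((cand & ref).values())  # Counter & keeps the minimum counts
    return f1(overlap, sum(cand.values()), sum(ref.values()))


def lcs_length(a, b):
    """Length of the longest common subsequence, by dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(candidate, reference):
    """ROUGE-L F1 based on the longest common subsequence of tokens."""
    c = candidate.lower().split()
    r = reference.lower().split()
    return f1(lcs_length(c, r), len(c), len(r))


candidate = "the cat sat on the mat"
reference = "the cat lay on the mat"
print(rouge_n(candidate, reference, 1))
print(rouge_n(candidate, reference, 2))
print(rouge_l(candidate, reference))
```

Read this way, a ROUGE-2 score well below ROUGE-1, as reported above, indicates that generated summaries share many individual words with the references but far fewer consecutive word pairs.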

Item Type: Thesis (Masters)
Supervisors: Razzaq, Abdul (email: UNSPECIFIED)
Subjects: Q Science > QA Mathematics > Electronic computers. Computer science
T Technology > T Technology (General) > Information Technology > Electronic computers. Computer science
P Language and Literature > P Philology. Linguistics > Computational linguistics. Natural language processing
Divisions: School of Computing > Master of Science in Data Analytics
Depositing User: Tamara Malone
Date Deposited: 24 May 2023 18:38
Last Modified: 24 May 2023 18:38
URI: https://norma.ncirl.ie/id/eprint/6640
