Pande, Rutuja Anil (2023) Abstractive-Based Multilingual Text Summarization and Sentimental Analysis using NLP Techniques. Masters thesis, Dublin, National College of Ireland.
PDF (Master of Science)
PDF (Configuration manual)
Abstract
The study proposes generating summaries of news articles and performing sentiment analysis on the generated summaries to identify the semantics behind the sentences. The summaries are then translated into Hindi to reach non-English speakers in India. A structured research methodology is followed to meet these objectives. Two datasets are used: Indian News Summary and BBC News Summary. Both are pre-processed with a function that removes stopwords, punctuation, and empty spaces from the text for better analysis. Basic Exploratory Data Analysis (EDA) is performed to understand the structure of the data in detail. Bidirectional and Auto-Regressive Transformers (BART), Bidirectional Encoder Representations from Transformers (BERT), and Google Translator are used to meet the objectives of this research. The generated summaries are short, with lengths in the range of 50 to 150. The summaries for both datasets are evaluated with the Recall-Oriented Understudy for Gisting Evaluation (ROUGE) score. The ROUGE-L F1 score for the first dataset is approximately 10%. For the second dataset, the ROUGE-1, ROUGE-2, and ROUGE-L scores are 40%, 20%, and 40%, respectively. The scores are comparatively low, which is attributed to the elimination of outliers and the use of filtered data. Translation quality is measured using the Bilingual Evaluation Understudy (BLEU) score, which falls close to 0, indicating that the translations are not of good quality. This may be because the translation was performed on summaries generated from filtered data. The sentiment analysis performed on the summaries produced output close to the sentiments of the text.
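For readers unfamiliar with the ROUGE figures quoted in the abstract, the following is a minimal sketch of how ROUGE-N and ROUGE-L F1 scores can be computed. It is not the thesis's evaluation code: the abstract does not say which tooling was used, and this version assumes plain lowercase whitespace tokenisation with no stemming. The function names (`rouge_n_f1`, `rouge_l_f1`, and the helpers) are illustrative only.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return a multiset (Counter) of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def f1(overlap, candidate_total, reference_total):
    """Combine precision and recall into F1; 0.0 when nothing overlaps."""
    if overlap == 0 or candidate_total == 0 or reference_total == 0:
        return 0.0
    precision = overlap / candidate_total
    recall = overlap / reference_total
    return 2 * precision * recall / (precision + recall)


def rouge_n_f1(candidate, reference, n=1):
    """ROUGE-N F1 from clipped n-gram overlap between two strings."""
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    # Counter intersection keeps the minimum count of each shared n-gram.
    overlap = sum((cand & ref).values())
    return f1(overlap, sum(cand.values()), sum(ref.values()))


def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate, reference):
    """ROUGE-L F1 based on the longest common subsequence of tokens."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    return f1(lcs_length(cand, ref), len(cand), len(ref))


if __name__ == "__main__":
    generated = "the cat sat on the mat"
    reference = "the cat lay on the mat"
    print(rouge_n_f1(generated, reference, n=1))
    print(rouge_n_f1(generated, reference, n=2))
    print(rouge_l_f1(generated, reference))
```

In this toy pair, five of six unigrams and three of five bigrams are shared, so ROUGE-1 F1 is 5/6 and ROUGE-2 F1 is 0.6. Established ROUGE packages typically add stemming and more careful tokenisation, so their scores can differ from this sketch.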
Item Type: | Thesis (Masters) |
---|---|
Supervisors: | Makki, Ahmed (email: UNSPECIFIED) |
Subjects: | P Language and Literature > PK Indo-Iranian; Q Science > QA Mathematics > Electronic computers. Computer science; T Technology > T Technology (General) > Information Technology > Electronic computers. Computer science; P Language and Literature > P Philology. Linguistics > Computational linguistics. Natural language processing |
Divisions: | School of Computing > Master of Science in Data Analytics |
Depositing User: | Tamara Malone |
Date Deposited: | 28 Dec 2024 11:28 |
Last Modified: | 28 Dec 2024 11:28 |
URI: | https://norma.ncirl.ie/id/eprint/7243 |