NORMA eResearch @NCI Library

A Deep Learning Emotion Classification Framework for Low Resource Languages

-, Manisha (2023) A Deep Learning Emotion Classification Framework for Low Resource Languages. Masters thesis, Dublin, National College of Ireland.

PDF (Master of Science), 983kB
PDF (Configuration manual), 946kB

Abstract

Emotion classification from text is the process of identifying and categorising the emotions expressed in textual data. Examples include anger, joy, suspense, and sadness, alongside a neutral category. Developing a machine learning model to identify emotions in a low-resource language is challenging because linguistic resources and annotated corpora are limited. This research proposes a Deep Learning Emotion Classification Framework to identify emotions in low-resource languages such as Hindi. An annotated corpus of Hindi short stories consisting of 20,304 sentences is used to train the models to predict five categories of emotion: anger, joy, suspense, sadness, and neutral/plain talk. To address the class imbalance in the dataset, the SMOTE technique is applied. The framework fine-tunes the pre-trained models mBERT and IndicBERT, along with a hybrid model, mBERT+BiLSTM. In addition, multiple baseline machine learning and deep learning models are evaluated: SVM, Logistic Regression, Random Forest, CNN, BiLSTM, and CNN+BiLSTM. The models are evaluated on macro-average recall, macro-average precision, and macro-average F1 score. The hybrid model mBERT+BiLSTM performed best in the experiment, with a test accuracy of 57%.
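
As an illustration of the macro-averaged metrics named in the abstract, the following is a minimal sketch, not the thesis code. It assumes the common convention in which precision, recall, and F1 are computed per class and then averaged with equal weight per class, so that minority emotions count as much as frequent ones. The function name `macro_scores`, the label spellings, and the toy data are hypothetical and are not taken from the thesis.

```python
from collections import Counter

# Label spellings are an assumption; the thesis names five categories.
LABELS = ["anger", "joy", "suspense", "sadness", "neutral"]


def macro_scores(y_true, y_pred, labels=LABELS):
    """Return (macro precision, macro recall, macro F1) over `labels`.

    Each metric is computed per class, then averaged with equal weight.
    A class with a zero denominator contributes 0.0 to the average.
    """
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative

    precisions, recalls, f1s = [], [], []
    for label in labels:
        p_den = tp[label] + fp[label]
        r_den = tp[label] + fn[label]
        p = tp[label] / p_den if p_den else 0.0
        r = tp[label] / r_den if r_den else 0.0
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f)

    n = len(labels)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


if __name__ == "__main__":
    # Toy data for illustration only, not results from the thesis.
    truth = ["anger", "joy", "joy", "neutral"]
    preds = ["anger", "joy", "neutral", "neutral"]
    print(macro_scores(truth, preds, labels=["anger", "joy", "neutral"]))
```

Because every class carries equal weight, a model that ignores a rare emotion is penalised under macro averaging even when its overall accuracy looks acceptable, which is one reason such metrics are often reported alongside accuracy on imbalanced datasets.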

Item Type: Thesis (Masters)
Supervisors: Stynes, Paul; Clifford, William; McLaughlin, Eugene (emails unspecified)
Uncontrolled Keywords: deep learning; emotion classification; low resource languages; pre-trained model; transfer learning
Subjects: P Language and Literature > PK Indo-Iranian
Q Science > QA Mathematics > Electronic computers. Computer science
T Technology > T Technology (General) > Information Technology > Electronic computers. Computer science
P Language and Literature > P Philology. Linguistics > Computational linguistics. Natural language processing
Q Science > Q Science (General) > Self-organizing systems. Conscious automata > Machine learning
Divisions: School of Computing > Master of Science in Data Analytics
Depositing User: Tamara Malone
Date Deposited: 06 Nov 2024 17:59
Last Modified: 06 Nov 2024 17:59
URI: https://norma.ncirl.ie/id/eprint/7161
