NORMA eResearch @NCI Library

Identifying Emotions for Code Mixed Hindi–English Tweets

Sonu, Sanket (2022) Identifying Emotions for Code Mixed Hindi–English Tweets. Masters thesis, Dublin, National College of Ireland.

PDF (Master of Science)
PDF (Configuration manual)


Social media grows larger every day. Billions of people use it daily, and the posts published on Twitter, Facebook, Instagram, and similar platforms attract billions of comments. These posts and comments express the emotions of their authors, and many companies analyse them to gain insights about their products. Detecting emotions in monolingual text such as English is comparatively easy because a wide variety of pre-trained models have been released by Google, Facebook, and others. Detecting emotions in Code Mixed Hindi–English text is more complex, and little research has been published on it. Such text mixes two languages, English and Hindi, and users now often write Hindi words in the English (Latin) alphabet. There is no spelling checker or supporting library for processing transliterated Hindi words, which lowers the accuracy of machine learning and deep learning models. This project uses Twitter data extracted with Tweepy, a Python library for the official Twitter API. It applies supervised machine learning and deep learning models, namely SVC, Multinomial Naive Bayes, Logistic Regression, Random Forest, CNN, and LSTM, to predict one of seven classes: 'Happy', 'Sad', 'Angry', 'Fear', 'Disgust', 'Surprise', or 'No emotions'. The study also proposes a few simple and effective methods to clean and pre-process Code Mixed Hindi–English text for corpus creation, so that models trained on the resulting corpus give effective results. The SVC model performed best, with 73.75% accuracy.

Item Type: Thesis (Masters)
Uncontrolled Keywords: SVM; Logistic Regression; Naive Bayes; Random Forest; Convolutional Neural Network (CNN); Long Short-Term Memory (LSTM)
Subjects: Q Science > QA Mathematics > Electronic computers. Computer science
T Technology > T Technology (General) > Information Technology > Electronic computers. Computer science
P Language and Literature > P Philology. Linguistics > Computational linguistics. Natural language processing
B Philosophy. Psychology. Religion > Psychology > Emotions
Q Science > Q Science (General) > Self-organizing systems. Conscious automata > Machine learning
Divisions: School of Computing > Master of Science in Data Analytics
Depositing User: Tamara Malone
Date Deposited: 13 Mar 2023 13:03
Last Modified: 13 Mar 2023 13:03
