NORMA eResearch @NCI Library

Sentiment Analysis in Tamil-English Code-Mixed Data Using Hybrid Deep Learning Techniques

Muthuraj, Diwakar (2024) Sentiment Analysis in Tamil-English Code-Mixed Data Using Hybrid Deep Learning Techniques. Masters thesis, Dublin, National College of Ireland.

PDF (Master of Science), 2MB
PDF (Configuration Manual), 3MB

Abstract

Sentiment Analysis (SA) is the process of classifying the sentiment expressed in data as positive, negative, or neutral. Important real-world applications include tracking social media trends, analysing customer feedback, following political discourse, and gathering market insights. Given this wide range of applications, it plays a major role in Natural Language Processing. In social media posts, comments, e-commerce product reviews, and online forums, bilingual communities largely use code-mixed text, which frequently interchanges words from different languages, to express their opinions. Code-mixed text contains non-standard grammar, transliterations, and slang, and these complexities make sentiment analysis challenging. Tamil-English, one of the most widely used code-mixes, is chosen for this research project, which examines the fine-tuning of hyperparameters of hybrid models to classify sentiment in Tamil-English code-mixed data efficiently. In this project, various experiments are carried out with a base model, mBERT+TextGCN, using different tools and techniques to prepare the data for the model. These steps include preprocessing, handling class imbalance, feature engineering, and feature extraction. Then, to further improve the efficiency of the proposed IndicBART+TextGCN model, hyperparameter fine-tuning is performed and evaluated using accuracy, precision, recall, F1 score, and a confusion matrix. With these techniques, the IndicBART+TextGCN model achieved a weighted-average precision of 0.71, recall of 0.68, and F1 score of 0.67. This result shows that preprocessing, handling class imbalance, feature engineering, and efficient fine-tuning of IndicBART+TextGCN have improved the hybrid model’s ability to classify sentiment in Tamil-English code-mixed data.
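The weighted-average metrics quoted in the abstract can be illustrated with a minimal sketch. It assumes the usual definition, in which each class's precision, recall, and F1 score is weighted by that class's share of the true labels. The function name `weighted_scores` and the toy labels are hypothetical and are not taken from the thesis.

```python
from collections import Counter


def weighted_scores(y_true, y_pred):
    """Return support-weighted precision, recall and F1 over all classes."""
    labels = sorted(set(y_true) | set(y_pred))
    support = Counter(y_true)  # number of true examples per class
    total = len(y_true)
    precision = recall = f1 = 0.0
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        # Per-class scores, defined as 0 when the denominator is empty.
        p_c = tp / (tp + fp) if tp + fp else 0.0
        r_c = tp / (tp + fn) if tp + fn else 0.0
        f_c = 2 * p_c * r_c / (p_c + r_c) if p_c + r_c else 0.0
        # Weight each class by its share of the true labels.
        weight = support[label] / total
        precision += weight * p_c
        recall += weight * r_c
        f1 += weight * f_c
    return precision, recall, f1


# Toy, imbalanced three-class example (illustrative only, not thesis data).
y_true = ["positive", "positive", "positive", "negative", "negative", "neutral"]
y_pred = ["positive", "positive", "negative", "negative", "neutral", "neutral"]

p, r, f = weighted_scores(y_true, y_pred)
print(f"precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```

Under this definition the weighted recall equals plain accuracy, and because the weights follow class frequency, majority classes dominate the averages when the data is imbalanced.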

Item Type: Thesis (Masters)
Supervisors: Anant, Aaloka (email: UNSPECIFIED)
Subjects: Q Science > QA Mathematics > Electronic computers. Computer science
T Technology > T Technology (General) > Information Technology > Electronic computers. Computer science
P Language and Literature > P Philology. Linguistics > Computational linguistics. Natural language processing
Q Science > Q Science (General) > Self-organizing systems. Conscious automata > Machine learning
Z Bibliography. Library Science. Information Resources > ZA Information resources > ZA4150 Computer Network Resources > The Internet > World Wide Web > Websites > Online social networks
T Technology > TK Electrical engineering. Electronics. Nuclear engineering > Telecommunications > The Internet > World Wide Web > Websites > Online social networks
Divisions: School of Computing > Master of Science in Data Analytics
Depositing User: Ciara O'Brien
Date Deposited: 03 Sep 2025 14:28
Last Modified: 03 Sep 2025 14:28
URI: https://norma.ncirl.ie/id/eprint/8753
