NORMA eResearch @NCI Library

Classifying the Insincere Questions using Transfer Learning

Gandhi, Shriya (2021) Classifying the Insincere Questions using Transfer Learning. Masters thesis, Dublin, National College of Ireland.

PDF (Master of Science)
PDF (Configuration manual)

Abstract

Hate speech and insincere content on social media and online communication forums are digital forms of personal attack. Such content, if left unattended, undermines the decorum of the forum and erodes users' trust. Manual screening of content posted online is tedious and psychologically harmful for the people reviewing these posts, so developing a robust and scalable model to detect such content automatically is a pressing priority. This research project proposes using pre-trained language representation models based on the transformer architecture to identify insincere questions posted on Quora. The dataset for the research is taken from the Kaggle data repository. To limit the high computational power otherwise required for NLP problems, we created three samples of the data and trained the transformer-based BERT and XLNET models on them. Because the dataset is highly imbalanced, the macro F1-score is used as the metric for evaluating model performance. The results show that both BERT and XLNET outperform the baseline model, logistic regression. Of the two, XLNET achieves the higher scores, with a macro F1-score of 0.84 and a weighted F1-score of 0.96.
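
The choice of macro F1 for an imbalanced dataset can be illustrated with a minimal sketch. This is not the thesis code: the function names and the toy labels below are assumptions made up for illustration, and the class proportions are not those of the Quora dataset. The sketch compares macro F1 (an unweighted mean of per-class F1) with weighted F1 (a mean weighted by class support) for a degenerate classifier that always predicts the majority class.

```python
from collections import Counter


def f1_per_class(y_true, y_pred):
    """Return {label: (f1, support)} for every label seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    support = Counter(y_true)
    scores = {}
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the denominator is 0.
        f1 = 2 * tp / denom if denom else 0.0
        scores[label] = (f1, support[label])
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1: every class counts equally."""
    scores = f1_per_class(y_true, y_pred)
    return sum(f1 for f1, _ in scores.values()) / len(scores)


def weighted_f1(y_true, y_pred):
    """Mean of per-class F1 weighted by the number of true samples per class."""
    scores = f1_per_class(y_true, y_pred)
    total = sum(n for _, n in scores.values())
    return sum(f1 * n for f1, n in scores.values()) / total


# Toy data (invented): 94 "sincere" (0) and 6 "insincere" (1) questions,
# scored against a classifier that always predicts "sincere".
y_true = [0] * 94 + [1] * 6
y_pred = [0] * 100

print(f"macro F1:    {macro_f1(y_true, y_pred):.3f}")
print(f"weighted F1: {weighted_f1(y_true, y_pred):.3f}")
```

On this toy data the weighted score stays high because the majority class dominates the average, while the macro score drops sharply because the minority class, which is never detected, contributes an F1 of zero with equal weight.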

Item Type: Thesis (Masters)
Subjects: Q Science > QA Mathematics > Electronic computers. Computer science
T Technology > T Technology (General) > Information Technology > Electronic computers. Computer science
Q Science > QA Mathematics > Computer software
T Technology > T Technology (General) > Information Technology > Computer software
Z Bibliography. Library Science. Information Resources > ZA Information resources > ZA4150 Computer Network Resources > The Internet > World Wide Web > Websites > Online social networks
T Technology > TK Electrical engineering. Electronics. Nuclear engineering > Telecommunications > The Internet > World Wide Web > Websites > Online social networks
Divisions: School of Computing > Master of Science in Data Analytics
Depositing User: Clara Chan
Date Deposited: 29 Nov 2021 11:06
Last Modified: 29 Nov 2021 11:06
URI: https://norma.ncirl.ie/id/eprint/5151
