Gandhi, Shriya (2021) Classifying the Insincere Questions using Transfer Learning. Masters thesis, Dublin, National College of Ireland.
Abstract
Hate speech and insincere content on social media and online communication forums are digital forms of personal attack. Such content, if left unattended, undermines the decorum of the forum and erodes users' trust. Manual screening of content posted online is tedious and psychologically harmful for the people reviewing these posts. Developing a robust and scalable model to detect such content automatically is a pressing priority. This research project proposes using pre-trained language representation models based on the transformer architecture to identify insincere questions posted on Quora. The dataset for this research is taken from the Kaggle data repository. To limit the high computational power that NLP problems otherwise require, we created three samples of the data and trained the transformer-based BERT and XLNet models on them. Because the dataset is highly imbalanced, the macro F1-score is used as the metric for evaluating model performance. The results show that both BERT and XLNet outperform the baseline model, logistic regression. Of the two, XLNet achieves the higher scores, with a macro F1-score of 0.84 and a weighted F1-score of 0.96.
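The gap between the macro and weighted F1-scores reported above is typical of imbalanced data. As a minimal sketch of why, the following computes both averages on a small set of labels. The labels, the class ratio, and the helper names (`f1_per_class`, `macro_f1`, `weighted_f1`) are invented for illustration and are not taken from the thesis or its dataset.

```python
from collections import Counter


def f1_per_class(y_true, y_pred):
    """Return {label: F1}, treating each label one-vs-rest."""
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1: every class counts equally."""
    scores = f1_per_class(y_true, y_pred)
    return sum(scores.values()) / len(scores)


def weighted_f1(y_true, y_pred):
    """Mean of per-class F1 weighted by each class's share of y_true."""
    scores = f1_per_class(y_true, y_pred)
    support = Counter(y_true)
    return sum(scores[c] * support[c] for c in scores) / len(y_true)


# Hypothetical labels: 0 = sincere (94 items), 1 = insincere (6 items).
y_true = [0] * 94 + [1] * 6
# Hypothetical predictions: 2 sincere items flagged, 3 insincere items missed.
y_pred = [0] * 92 + [1] * 2 + [1] * 3 + [0] * 3

print("per-class F1:", f1_per_class(y_true, y_pred))
print("macro F1:    ", round(macro_f1(y_true, y_pred), 3))
print("weighted F1: ", round(weighted_f1(y_true, y_pred), 3))
```

In this toy case the weighted average is dominated by the majority class, so it stays high even though half of the minority class is missed. The macro average gives the minority class equal weight and exposes the weaker performance, which is the usual reason to prefer it on imbalanced data.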
Item Type: | Thesis (Masters) |
---|---|
Subjects: | Q Science > QA Mathematics > Electronic computers. Computer science; T Technology > T Technology (General) > Information Technology > Electronic computers. Computer science; Q Science > QA Mathematics > Computer software; T Technology > T Technology (General) > Information Technology > Computer software; Z Bibliography. Library Science. Information Resources > ZA Information resources > ZA4150 Computer Network Resources > The Internet > World Wide Web > Websites > Online social networks; T Technology > TK Electrical engineering. Electronics. Nuclear engineering > Telecommunications > The Internet > World Wide Web > Websites > Online social networks |
Divisions: | School of Computing > Master of Science in Data Analytics |
Depositing User: | Clara Chan |
Date Deposited: | 29 Nov 2021 11:06 |
Last Modified: | 29 Nov 2021 11:06 |
URI: | https://norma.ncirl.ie/id/eprint/5151 |