NORMA eResearch @NCI Library

Toxic Question Classification in Question & Answer Forum Using Deep Learning

Sampath, Mathiazhagan (2019) Toxic Question Classification in Question & Answer Forum Using Deep Learning. Masters thesis, Dublin, National College of Ireland.


Abstract

In the internet era, question-and-answer (Q&A) forums are widely used for knowledge sharing. With large numbers of questions posted to these forums, identifying toxic questions and removing them is a major challenge. To maintain trust among forum users, content must be checked for toxicity. This study proposes a Deep Learning model that classifies questions according to their toxic content. The Quora Q&A dataset, with 1.30 million records, is used for this study. Because the dataset is imbalanced, the F1 score is used as the evaluation metric. Two different deep neural network models are built with an Attention layer, and the learning rate hyperparameter is selected using a Cyclic Learning Rate. Model performance is found to increase as the number of nodes in the hidden layer increases. A GPU is required to build the models with Bidirectional CuDNNLSTM and CuDNNGRU layers. Threshold values ranging from 0.01 to 1 are evaluated to find the maximum F1 score of each model. The model with the larger number of nodes in the hidden layer classifies toxic content successfully: it achieves the highest F1 score, 0.9001, at a threshold of 0.40 with an Attention layer, CuDNNLSTM and CuDNNGRU, compared with the model with fewer nodes and hidden layers.
Keywords: Natural Language Processing, CuDNNLSTM, CuDNNGRU, Attention, Cyclic Learning Rate, Q&A, Deep Learning, Recurrent Neural Network
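
The threshold search described in the abstract can be illustrated with a minimal sketch. This is not the thesis code: the function names (`f1_score`, `best_threshold`), the step size of 0.01, the tie-breaking rule (the lowest threshold wins) and the toy labels and probabilities are all assumptions made for illustration. It only shows the general idea of sweeping a decision threshold from 0.01 to 1 over predicted probabilities and keeping the value that maximises F1.

```python
def f1_score(y_true, y_pred):
    """F1 for the positive (toxic) class, computed from binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: define F1 as 0 to avoid division by zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def best_threshold(y_true, probs, start=0.01, stop=1.0, step=0.01):
    """Sweep thresholds in [start, stop] and return (threshold, F1) with the highest F1.

    On ties, the lowest threshold is kept (an assumption, not taken from the thesis).
    """
    best_t, best_f1 = start, -1.0
    n_steps = int(round((stop - start) / step)) + 1
    for i in range(n_steps):
        t = round(start + i * step, 2)  # rounding avoids floating-point drift
        preds = [1 if p >= t else 0 for p in probs]
        score = f1_score(y_true, preds)
        if score > best_f1:
            best_t, best_f1 = t, score
    return best_t, best_f1


# Hypothetical toy data: 1 = toxic, 0 = not toxic, with made-up model probabilities.
y_true = [0, 0, 0, 0, 1, 0, 1, 1]
probs = [0.05, 0.10, 0.20, 0.35, 0.38, 0.45, 0.60, 0.90]

threshold, score = best_threshold(y_true, probs)
print(f"best threshold = {threshold:.2f}, F1 = {score:.4f}")
```

On an imbalanced dataset the default cut-off of 0.5 is rarely the F1-optimal one, which is why a sweep of this kind is commonly run on validation predictions rather than on the test set.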

Item Type: Thesis (Masters)
Subjects: Q Science > QA Mathematics > Electronic computers. Computer science
T Technology > T Technology (General) > Information Technology > Electronic computers. Computer science
Q Science > QA Mathematics > Computer software
T Technology > T Technology (General) > Information Technology > Computer software
Divisions: School of Computing > Master of Science in Data Analytics
Depositing User: Dan English
Date Deposited: 17 Jun 2020 15:14
Last Modified: 17 Jun 2020 15:14
URI: https://norma.ncirl.ie/id/eprint/4300
