NORMA eResearch @NCI Library

Classification of Toxic Comments using Knowledge Distillation

Gupta, Bijender (2022) Classification of Toxic Comments using Knowledge Distillation. Masters thesis, Dublin, National College of Ireland.

PDF (Master of Science): Download (1MB)
PDF (Configuration manual): Download (1MB)


Any expression that attacks a group or individual based on attributes such as skin colour, sexuality, religion, or nationality is considered toxic. Toxicity is also described as any form of communication (verbal, textual, or gestural) intended to elicit unpleasant sentiments in a group or person, which may result in aggression or damage to society as a whole. To address this issue, machine learning and deep learning models have been used to classify harmful content on internet platforms. However, building these models on massive datasets is challenging because it demands significant computing resources, and inference budgets remain a constraint. We propose a knowledge distillation approach for categorizing toxic comments. Knowledge distillation trains a smaller student network, step by step, to reproduce the behaviour of a larger trained network: the intermediate representations and output distributions produced by the larger network serve as "soft labels", and by attempting to replicate the larger network's outputs at each level, the smaller network eventually learns to mimic its behaviour. We begin by training a teacher model, BERT-Base. Knowledge is then transferred from this teacher to MobileBERT (Sun et al., 2020), a slimmed-down form of BERT-Large that combines a bottleneck design with a carefully crafted mix of self-attention and feed-forward networks. After applying knowledge distillation, we show that MobileBERT is 60% faster and significantly smaller than the BERT-Base model, achieving 98% accuracy, 92% precision, and 88% recall. In terms of precision, recall, and accuracy, MobileBERT outperformed BERT-Base while requiring less training time and inference time.
We also demonstrate that our student model, MobileBERT, outperforms standard ML algorithms such as Logistic Regression, Decision Tree, Random Forest, and XGBoost classifiers.
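The soft-label transfer described above can be sketched with a standard temperature-scaled distillation loss. This is a minimal illustration of the general technique (in the style of Hinton et al.'s formulation), not the thesis's actual training code: the function names, the example logits, and the temperature value are all illustrative assumptions.

```python
import math

def softmax(logits, temperature=1.0):
    # Temperature-scaled softmax: a higher temperature produces a
    # softer distribution, exposing the teacher's "dark knowledge".
    scaled = [z / temperature for z in logits]
    m = max(scaled)                      # subtract max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    # KL divergence between the teacher's soft labels and the student's
    # softened predictions, scaled by T^2 to keep gradient magnitudes
    # comparable across temperatures.
    p = softmax(teacher_logits, temperature)   # teacher soft labels
    q = softmax(student_logits, temperature)   # student soft predictions
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return (temperature ** 2) * kl

# A student that exactly matches the teacher incurs zero loss;
# any mismatch yields a positive penalty to minimize during training.
print(distillation_loss([2.0, -1.0], [2.0, -1.0]))      # 0.0
print(distillation_loss([2.0, -1.0], [0.5, 0.5]) > 0)   # True
```

In practice this distillation term is combined with the ordinary cross-entropy loss on the hard toxicity labels, and (for MobileBERT-style distillation) with layer-wise feature-map losses between teacher and student.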

Item Type: Thesis (Masters)
Subjects: Q Science > QA Mathematics > Electronic computers. Computer science
T Technology > T Technology (General) > Information Technology > Electronic computers. Computer science
H Social Sciences > HV Social pathology. Social and public welfare > Discrimination
Q Science > Q Science (General) > Self-organizing systems. Conscious automata > Machine learning
Z Bibliography. Library Science. Information Resources > ZA Information resources > ZA4150 Computer Network Resources > The Internet > World Wide Web > Websites > Online social networks
T Technology > TK Electrical engineering. Electronics. Nuclear engineering > Telecommunications > The Internet > World Wide Web > Websites > Online social networks
Divisions: School of Computing > Master of Science in Data Analytics
Depositing User: Tamara Malone
Date Deposited: 26 Jan 2023 15:19
Last Modified: 03 Mar 2023 11:26
