NORMA eResearch @NCI Library

Spam Detection in Short Message Service Using Natural Language Processing and Machine Learning Techniques

Ora, Anchal (2020) Spam Detection in Short Message Service Using Natural Language Processing and Machine Learning Techniques. Masters thesis, Dublin, National College of Ireland.


Abstract

As mobile phone usage has grown, the use of the Short Message Service (SMS) has increased significantly. Because text messages are cheap to send, they came to be used for promotional purposes and unethical activities. As a result, the proportion of spam messages has grown rapidly, leading to the loss of personal and financial data. To prevent such losses, it is crucial to detect spam messages as quickly as possible. This research therefore aims to classify spam messages not only efficiently but also with low latency. Machine learning models that are known to be fast and to have low time complexity, namely XGBoost, LightGBM and Bernoulli Naïve Bayes, were implemented. Message length was used as an additional feature, and text features were extracted using unigrams, bigrams and a TF-IDF matrix. Chi-square feature selection was applied to further reduce space complexity. The results show that, with the TF-IDF matrix, Bernoulli Naïve Bayes achieved the highest accuracy, 96.5% in 0.157 seconds, followed by LightGBM with 95.4% in 1.708 seconds.
Keywords: Spam SMS, Text Classification, Natural Language Processing, Machine Learning, Bernoulli Naïve Bayes, LightGBM, XGBoost
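
The pipeline described in the abstract (n-gram features, chi-square feature selection, Bernoulli Naïve Bayes) can be illustrated with a minimal sketch. This is not the thesis implementation: it uses only the Python standard library, the six toy messages are invented for illustration, all function names and the choice of `k=9` are assumptions, and the TF-IDF weighting and message-length feature mentioned in the abstract are left out. Features are binary (an n-gram is present or absent), which is the form a Bernoulli model works with.

```python
import math
from collections import Counter

# Invented toy data (1 = spam, 0 = ham); not taken from the thesis.
MESSAGES = [
    ("win a free prize now", 1),
    ("free entry claim your prize", 1),
    ("urgent claim free cash now", 1),
    ("are we meeting for lunch today", 0),
    ("see you at home tonight", 0),
    ("call me when you are free for lunch", 0),
]

def ngrams(text):
    """Return the set of unigrams and bigrams present in a message."""
    tokens = text.lower().split()
    bigrams = [" ".join(pair) for pair in zip(tokens, tokens[1:])]
    return set(tokens) | set(bigrams)

def chi2_scores(docs, labels):
    """Chi-square statistic of each binary feature against the binary label."""
    n, n_spam = len(docs), sum(labels)
    doc_freq = Counter(f for doc in docs for f in doc)
    spam_freq = Counter(f for doc, y in zip(docs, labels) if y for f in doc)
    scores = {}
    for feature, present in doc_freq.items():
        a = spam_freq[feature]      # spam, feature present
        b = present - a             # ham, feature present
        c = n_spam - a              # spam, feature absent
        d = n - n_spam - b          # ham, feature absent
        denom = (a + b) * (c + d) * (a + c) * (b + d)
        scores[feature] = n * (a * d - b * c) ** 2 / denom if denom else 0.0
    return scores

def select_top(scores, k):
    """Keep the k highest-scoring features; ties are broken alphabetically."""
    return sorted(scores, key=lambda f: (-scores[f], f))[:k]

def train_bernoulli_nb(docs, labels, vocab, alpha=1.0):
    """Estimate log priors and smoothed P(feature present | class)."""
    model = {}
    for cls in (0, 1):
        cls_docs = [doc for doc, y in zip(docs, labels) if y == cls]
        prior = math.log(len(cls_docs) / len(docs))
        probs = {
            f: (sum(f in doc for doc in cls_docs) + alpha)
               / (len(cls_docs) + 2 * alpha)
            for f in vocab
        }
        model[cls] = (prior, probs)
    return model

def predict(model, vocab, doc):
    """Return the class with the highest Bernoulli log-likelihood."""
    def score(cls):
        prior, probs = model[cls]
        # Absent features contribute log(1 - p), as in the Bernoulli model.
        return prior + sum(
            math.log(probs[f] if f in doc else 1.0 - probs[f]) for f in vocab
        )
    return max(model, key=score)

docs = [ngrams(text) for text, _ in MESSAGES]
labels = [label for _, label in MESSAGES]
vocab = select_top(chi2_scores(docs, labels), k=9)
model = train_bernoulli_nb(docs, labels, vocab)

def classify(text):
    """Classify a new message with the toy model (1 = spam, 0 = ham)."""
    return predict(model, vocab, ngrams(text))

print(classify("claim your free prize"), classify("lunch at home today"))
```

Restricting the model to the chi-square-selected vocabulary keeps the per-message scoring loop short, which is the kind of space and latency saving the abstract refers to. On a real corpus the same steps would more likely be built with an established library than written by hand.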

Item Type: Thesis (Masters)
Subjects: Q Science > QA Mathematics > Electronic computers. Computer science
T Technology > T Technology (General) > Information Technology > Electronic computers. Computer science

Q Science > QA Mathematics > Computer software
T Technology > T Technology (General) > Information Technology > Computer software

Q Science > QA Mathematics > Computer software > Mobile Phone Applications
T Technology > T Technology (General) > Information Technology > Computer software > Mobile Phone Applications
Divisions: School of Computing > Master of Science in Data Analytics
Depositing User: Dan English
Date Deposited: 15 Jun 2020 12:27
Last Modified: 15 Jun 2020 12:27
URI: http://norma.ncirl.ie/id/eprint/4286
