NORMA eResearch @NCI Library

Towards improved phishing detection from URLs, using supervised machine learning

Aboki, Musa Idisere (2022) Towards improved phishing detection from URLs, using supervised machine learning. Masters thesis, Dublin, National College of Ireland.

PDF (Master of Science), 2MB
PDF (Configuration manual), 826kB


It remains both remarkable and puzzling that phishing, a social engineering technique, is still used successfully by fraudsters to deceive users into giving up sensitive and confidential information such as email passwords, bank account numbers or PINs, which are later used to defraud individuals and organisations. Defending against these attacks typically involves a multi-layered defence that combines technology solutions leveraging artificial intelligence or machine learning with user awareness training, which helps users recognise the techniques attackers use. This paper seeks to add to the large and growing body of research on improving the effectiveness of phishing detection by applying a novel supervised machine learning (ML) classification approach. The ML models used in this paper are Decision Tree, Random Forest, Multi-layer Perceptron (MLP), XGBoost, Autoencoder Neural Network (AE) and Support Vector Machine (SVM). The paper also explores whether ensemble approaches can improve the results. An ensemble is an ML approach that attempts to improve prediction accuracy by combining the predictions of several of the algorithms listed above.
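As an illustration of the ensemble idea, the sketch below shows hard (majority) voting, one common way of combining classifiers. It is not the thesis's ensemble configuration, which is not described in this record. The three rule functions, their thresholds and the feature layout `[url_length, host_dots, has_at_symbol]` are hypothetical stand-ins for trained models such as a Decision Tree, Random Forest or SVM.

```python
from collections import Counter
from typing import Callable, List, Sequence

# A "classifier" is any function mapping a feature vector to a label
# (1 = phishing, 0 = legitimate). Hypothetical feature layout:
# [url_length, host_dots, has_at_symbol]
Classifier = Callable[[Sequence[float]], int]


def long_url_rule(x: Sequence[float]) -> int:
    # Hypothetical stand-in for a trained model: flag very long URLs.
    return 1 if x[0] > 75 else 0


def many_dots_rule(x: Sequence[float]) -> int:
    # Hypothetical stand-in: flag hostnames with many subdomain levels.
    return 1 if x[1] > 4 else 0


def at_symbol_rule(x: Sequence[float]) -> int:
    # Hypothetical stand-in: flag URLs containing an '@' symbol.
    return 1 if x[2] > 0 else 0


RULES: List[Classifier] = [long_url_rule, many_dots_rule, at_symbol_rule]


def majority_vote(classifiers: Sequence[Classifier], x: Sequence[float]) -> int:
    """Hard-voting ensemble: return the label predicted by most classifiers."""
    votes = Counter(clf(x) for clf in classifiers)
    # Equal counts are ordered by first occurrence, so an odd number of
    # classifiers is used here to avoid ties.
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    print(majority_vote(RULES, [90.0, 2.0, 1.0]))  # two of three rules vote 1
    print(majority_vote(RULES, [30.0, 5.0, 0.0]))  # two of three rules vote 0
```

Soft voting (averaging predicted probabilities) and stacking are alternative ways to combine the same base models.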

The paper develops a system that uses machine learning techniques to classify websites based on their URLs, and tests it with datasets of phishing and legitimate URLs. The aim is to determine which machine learning model best detects phishing URLs. Suggestions for further improving the results are also tested and discussed, and the paper concludes with additional recommendations for improving detection accuracy.
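Classifying websites from their URLs requires turning each URL into numeric features a model can learn from. The sketch below extracts a few lexical features that are often used for this purpose. The feature names and the choice of features are assumptions made for illustration; the thesis's actual feature set is not described in this record.

```python
import ipaddress
from typing import Dict
from urllib.parse import urlparse


def is_ip_host(host: str) -> bool:
    """Return True if the hostname is a literal IPv4/IPv6 address."""
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def extract_url_features(url: str) -> Dict[str, int]:
    """Extract hypothetical lexical features from a URL string."""
    parsed = urlparse(url)
    host = parsed.hostname or ""  # hostname excludes user info and port
    return {
        "url_length": len(url),
        "host_dots": host.count("."),        # many subdomain levels
        "host_hyphens": host.count("-"),     # e.g. brand-name look-alikes
        "has_at_symbol": int("@" in url),    # '@' can hide the real host
        "host_is_ip": int(is_ip_host(host)), # raw IP instead of a domain
        "uses_https": int(parsed.scheme == "https"),
        "path_depth": len([s for s in parsed.path.split("/") if s]),
    }


if __name__ == "__main__":
    for u in ("http://192.168.0.1/login/verify",
              "https://secure-login.example.com/account"):
        print(u, extract_url_features(u))
```

Because dictionaries preserve insertion order, `list(extract_url_features(url).values())` yields a fixed-order vector that could be fed to any of the classifiers named in the abstract.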

Item Type: Thesis (Masters)
Spelman, Ross
Uncontrolled Keywords: Security; Phishing; Spear phishing; Blacklists; Google Safe Browsing (GSB); PhishTank (PT); OpenPhish (OP); Support Vector Machine; Decision Tree; Random Forest; XGBoost; Classifier based Associative Classification; Support Vector Machine (SVM); K-Nearest Neighbour (KNN); Logistic regression (LR); C4.5 decision tree algorithm; Anti-Phishing Working Group (APWG)
Subjects: Q Science > QA Mathematics > Electronic computers. Computer science
T Technology > T Technology (General) > Information Technology > Electronic computers. Computer science
Q Science > QA Mathematics > Computer software > Computer Security
T Technology > T Technology (General) > Information Technology > Computer software > Computer Security
Q Science > Q Science (General) > Self-organizing systems. Conscious automata > Machine learning
Z Bibliography. Library Science. Information Resources > ZA Information resources > ZA4150 Computer Network Resources > The Internet > World Wide Web > Websites
T Technology > TK Electrical engineering. Electronics. Nuclear engineering > Telecommunications > The Internet > World Wide Web > Websites
Divisions: School of Computing > Master of Science in Cyber Security
Depositing User: Tamara Malone
Date Deposited: 21 Apr 2023 16:52
Last Modified: 21 Apr 2023 16:52
