NORMA eResearch @NCI Library

Leveraging Machine Learning to Enhance Phishing URL Detection

Rathor, Manpritcour (2023) Leveraging Machine Learning to Enhance Phishing URL Detection. Masters thesis, Dublin, National College of Ireland.


Abstract

This study adopts a supervised learning approach to identifying phishing websites, using a range of machine learning techniques. The workflow covers acquiring, processing, and visualizing the data. The source is the ISCXURL2016 dataset, which contains 45,225 URLs; from it, 1000 phishing URLs and 1000 legitimate URLs were selected so that the models are trained on a balanced and varied sample. Using Python libraries such as urllib and whois, 19 features, including URL length and domain age, are extracted. Preprocessing handles null values, transforms categorical data into a numerical format, converts the data to NumPy arrays, and applies correlation-based feature selection. Four machine learning models (Logistic Regression, AdaBoost Classifier, Gradient Boosting Classifier, and Stacking Classifier) are trained and assessed using accuracy, precision, recall, and F1-score. The proposed phishing detection system encompasses webpage generation, feature extraction, and the training of machine learning models, with a 90:10 split for training and testing. Results show that performance varies across models, with the Stacking Classifier demonstrating notable accuracy and balance. Experiments on the individual models provide insight into their specific strengths and weaknesses.
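
The feature-extraction step described in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: it uses only urllib.parse from the Python standard library, and the function name `extract_url_features` and every feature except URL length are assumptions chosen for illustration. Domain age, which the abstract obtains through whois, is left out because it requires a network lookup.

```python
from urllib.parse import urlparse
import ipaddress


def extract_url_features(url: str) -> dict:
    """Return a few lexical features of a URL (hypothetical feature set)."""
    parsed = urlparse(url)
    host = parsed.hostname or ""

    # Flag hosts that are raw IP addresses rather than domain names.
    try:
        ipaddress.ip_address(host)
        host_is_ip = 1
    except ValueError:
        host_is_ip = 0

    return {
        # URL length is the one feature named in the abstract.
        "url_length": len(url),
        # The remaining features are illustrative assumptions.
        "host_length": len(host),
        "path_depth": len([seg for seg in parsed.path.split("/") if seg]),
        "num_dots_in_host": host.count("."),
        "has_at_symbol": int("@" in url),
        "uses_https": int(parsed.scheme == "https"),
        "host_is_ip": host_is_ip,
    }


if __name__ == "__main__":
    for sample in ("http://192.168.0.1/login/verify?id=1",
                   "https://user@example.com/"):
        print(sample, extract_url_features(sample))
```

Each URL becomes one row of numeric values, which is the form the classifiers named in the abstract expect as input once the rows are stacked into a NumPy array.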

Item Type: Thesis (Masters)
Supervisors: Ayala-Rivera, Vanessa (email: UNSPECIFIED)
Subjects: Q Science > QA Mathematics > Electronic computers. Computer science
T Technology > T Technology (General) > Information Technology > Electronic computers. Computer science
Q Science > QA Mathematics > Computer software > Computer Security
T Technology > T Technology (General) > Information Technology > Computer software > Computer Security
Q Science > Q Science (General) > Self-organizing systems. Conscious automata > Machine learning
Divisions: School of Computing > Master of Science in Cyber Security
Depositing User: Ciara O'Brien
Date Deposited: 22 Apr 2025 13:26
Last Modified: 22 Apr 2025 13:26
URI: https://norma.ncirl.ie/id/eprint/7459
