Asha Shajilal, Bejoy (2024) A Hybrid Approach for Detecting Phishing Mails Using Textual, Content, and URL Analysis with Ensemble Learning. Masters thesis, Dublin, National College of Ireland.
Abstract
Phishing is a major cybersecurity threat, with attackers constantly developing new tactics to evade detection systems. This study addresses that challenge by investigating a hybrid approach to phishing detection that combines URL analysis, textual analysis, and content-based analysis with ensemble learning. The main goal is to create a robust detection model that improves accuracy while minimising false positives and false negatives, thereby improving the detection of phishing emails.
The data for this study came from the Enron Corpus for legitimate emails and the Figshare-curated Nigerian dataset for phishing emails. Key features were extracted from these datasets, including BERT embeddings for textual content and numerous indicators derived from HTML and URL analysis. The study used machine learning models, namely Decision Tree (DT), K-Nearest Neighbour (KNN), Support Vector Machine (SVM) and Random Forest, which were then combined using Stacking and Soft Voting ensemble approaches.
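As an illustration of the Soft Voting idea mentioned above, the following minimal sketch averages per-class probabilities from several base models and picks the class with the highest mean. It is not the thesis's implementation: the function name `soft_vote`, the optional `weights` argument, and all probability values are hypothetical, and class 1 is assumed to mean "phishing".

```python
from typing import Dict, List, Optional


def soft_vote(
    probas: Dict[str, List[List[float]]],
    weights: Optional[Dict[str, float]] = None,
) -> List[int]:
    """Return the class index with the highest weighted mean probability.

    probas maps a model name to one probability row per email,
    where each row holds one probability per class.
    """
    names = list(probas)
    if weights is None:
        # Assumption: equal weights when none are given.
        weights = {name: 1.0 for name in names}
    total_weight = sum(weights[name] for name in names)
    n_samples = len(probas[names[0]])
    n_classes = len(probas[names[0]][0])

    labels = []
    for i in range(n_samples):
        averaged = [
            sum(weights[name] * probas[name][i][c] for name in names) / total_weight
            for c in range(n_classes)
        ]
        labels.append(max(range(n_classes), key=averaged.__getitem__))
    return labels


# Hypothetical outputs for three emails; columns are [legitimate, phishing].
hypothetical_probas = {
    "decision_tree": [[0.0, 1.0], [1.0, 0.0], [1.0, 0.0]],
    "knn": [[0.4, 0.6], [0.8, 0.2], [0.4, 0.6]],
    "svm": [[0.3, 0.7], [0.9, 0.1], [0.2, 0.8]],
    "random_forest": [[0.2, 0.8], [0.7, 0.3], [0.3, 0.7]],
}

predictions = soft_vote(hypothetical_probas)
print(predictions)
```

Stacking differs in that a further model is trained on the base models' outputs instead of averaging them; that step is not shown here.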
While individual models such as SVM and Random Forest performed well, the ensemble techniques showed more balanced performance across evaluation metrics. The Stacking Ensemble, in particular, was able to combine the strengths of textual and content-based features, achieving 96.18% accuracy and an F1-Score of 0.7846.
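Accuracy and F1-Score can differ noticeably because F1 ignores true negatives and depends only on precision and recall for the positive class. The sketch below computes both from confusion-matrix counts. The counts are hypothetical and are not taken from the thesis, and the abstract does not report class proportions, so this only shows how such a gap can arise when legitimate emails outnumber phishing emails.

```python
def accuracy_and_f1(tp: int, fp: int, fn: int, tn: int) -> tuple:
    """Compute accuracy and F1 for the positive (phishing) class."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return accuracy, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, f1


# Hypothetical counts: 90 phishing emails among 1000 messages.
accuracy, f1 = accuracy_and_f1(tp=70, fp=20, fn=20, tn=890)
print(f"accuracy={accuracy:.4f}, f1={f1:.4f}")
```

With these illustrative counts, the large number of correctly classified legitimate emails lifts accuracy, while F1 reflects only how well the phishing class is handled.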
The findings indicate that, while the hybrid method is helpful, it still requires further development, particularly in refining the ensemble techniques to better capture the complex nature of phishing emails. This study contributes to the ongoing development of improved phishing detection systems and lays the groundwork for future research into real-time email filtering in dynamic cybersecurity contexts.