NORMA eResearch @NCI Library

A Hybrid Approach for Detecting Phishing Mails Using Textual, Content, and URL Analysis with Ensemble Learning

Asha Shajilal, Bejoy (2024) A Hybrid Approach for Detecting Phishing Mails Using Textual, Content, and URL Analysis with Ensemble Learning. Masters thesis, Dublin, National College of Ireland.

PDF (Master of Science), 910kB
PDF (Configuration Manual), 294kB

Abstract

Phishing is a major cybersecurity threat, with attackers constantly developing new tactics to evade detection systems. This study addresses that problem by investigating a hybrid approach to phishing detection that combines URL analysis, textual analysis, and content-based analysis with ensemble learning. The main goal is to create a robust detection model that improves accuracy while minimising false positives and false negatives, thereby improving the detection of phishing emails.

The data for this study came from the Enron Corpus for legitimate emails and the Figshare-curated Nigerian dataset for phishing emails. Key features extracted from these datasets included BERT embeddings for textual content and numerous indicators derived from HTML and URL analysis. The study used machine learning models such as Decision Tree (DT), K-Nearest Neighbour (KNN), Support Vector Machine (SVM), and Random Forest, which were then combined using Stacking and Soft Voting ensemble approaches.
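As a rough illustration of the Soft Voting idea mentioned above, the sketch below averages per-class probabilities from several classifiers and picks the class with the highest mean. It is not the thesis implementation: the function names, the label order, and the probability values are all hypothetical, chosen only to show the mechanics.

```python
# Minimal soft-voting sketch (illustrative only; all values are hypothetical).
LABELS = ["legitimate", "phishing"]  # assumed label order


def soft_vote(model_probs, weights=None):
    """Return the (optionally weighted) mean probability for each class."""
    if not model_probs:
        raise ValueError("at least one model output is required")
    if weights is None:
        weights = [1.0] * len(model_probs)
    if len(weights) != len(model_probs):
        raise ValueError("one weight per model is required")
    total = sum(weights)
    n_classes = len(model_probs[0])
    return [
        sum(w * probs[c] for w, probs in zip(weights, model_probs)) / total
        for c in range(n_classes)
    ]


def predict(model_probs, weights=None):
    """Pick the label whose averaged probability is highest."""
    averaged = soft_vote(model_probs, weights)
    best = max(range(len(averaged)), key=averaged.__getitem__)
    return LABELS[best]


# Hypothetical class probabilities for a single email, one row per model.
email_probs = [
    [0.60, 0.40],  # Decision Tree
    [0.55, 0.45],  # K-Nearest Neighbour
    [0.10, 0.90],  # Support Vector Machine
    [0.30, 0.70],  # Random Forest
]

print(soft_vote(email_probs))
print(predict(email_probs))
```

In this made-up case a majority (hard) vote would be tied at two models each, while soft voting lets the more confident models tip the decision. Stacking differs in that a further model is trained on the base models' outputs instead of simply averaging them.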

While individual models such as SVM and Random Forest performed well, ensemble techniques demonstrated better-balanced performance across evaluation metrics. The Stacking Ensemble, in particular, was able to combine the strengths of textual and content-based features, achieving 96.18% accuracy and an F1-score of 0.7846.
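Accuracy and F1-score can diverge noticeably when one class is much rarer than the other. The abstract does not state the class balance, so the sketch below uses purely hypothetical confusion-matrix counts (not results from the thesis) to show how the two metrics are computed and why a high accuracy can sit alongside a lower F1-score.

```python
# Accuracy vs. F1 on hypothetical confusion-matrix counts (not thesis data).
def accuracy(tp, fp, fn, tn):
    """Fraction of all emails classified correctly."""
    return (tp + tn) / (tp + fp + fn + tn)


def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall for the positive (phishing) class."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Assumed counts: 1000 emails, of which only 40 are phishing.
tp, fp, fn, tn = 30, 10, 10, 950

print(f"accuracy = {accuracy(tp, fp, fn, tn):.4f}")
print(f"F1-score = {f1_score(tp, fp, fn):.4f}")
```

Because true negatives dominate the total, they lift accuracy but do not enter the F1-score at all, which depends only on how the phishing class is handled.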

The findings indicate that, while a hybrid method is helpful, it still requires further development, particularly in refining ensemble techniques to better capture the complex nature of phishing emails. This study contributes to the ongoing development of improved phishing detection systems and lays the groundwork for future research into real-time email filtering in dynamic cybersecurity contexts.

Item Type: Thesis (Masters)
Supervisors: Heffernan, Niall (email: UNSPECIFIED)
Subjects: Q Science > QA Mathematics > Electronic computers. Computer science
T Technology > T Technology (General) > Information Technology > Electronic computers. Computer science
Q Science > QA Mathematics > Computer software > Computer Security
T Technology > T Technology (General) > Information Technology > Computer software > Computer Security
Z Bibliography. Library Science. Information Resources > ZA Information resources > ZA4150 Computer Network Resources > The Internet > Electronic Mail
T Technology > TK Electrical engineering. Electronics. Nuclear engineering > Telecommunications > The Internet > Electronic Mail
Q Science > Q Science (General) > Self-organizing systems. Conscious automata > Machine learning
Divisions: School of Computing > Master of Science in Cyber Security
Depositing User: Ciara O'Brien
Date Deposited: 29 Jul 2025 10:20
Last Modified: 29 Jul 2025 10:20
URI: https://norma.ncirl.ie/id/eprint/8292
