NORMA eResearch @NCI Library

Phishing Detection and Prevention Using Natural Language Processing (NLP)

Kokkalakonda, Manideep (2025) Phishing Detection and Prevention Using Natural Language Processing (NLP). Masters thesis, Dublin, National College of Ireland.


Abstract

Phishing remains one of the most prevalent cybersecurity threats, with email serving as the primary attack vector. Traditional detection methods (blacklists, keyword matching, and rule-based systems) struggle to detect evolving attacks that mimic legitimate communication and exploit subtle social engineering tactics. This research presents a hybrid phishing email detection model that integrates BERT-based semantic embeddings with structural and stylistic features, processed through a BiLSTM architecture to capture both contextual meaning and sequential dependencies.

The study compiles a unified dataset from multiple publicly available phishing and legitimate email sources, totaling 164,971 emails (85,781 phishing and 79,190 legitimate). It applies comprehensive NLP preprocessing and extracts a 782-dimensional feature vector for each email, combining 768 BERT embedding dimensions with 14 handcrafted features (e.g., special character ratio, URL count, money-related terms).
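
As an illustration of how such a vector could be assembled, the following is a minimal sketch and not the thesis code. It computes only the three handcrafted features named in the abstract and concatenates a 14-element handcrafted vector with a 768-element embedding. The function names, the URL pattern, the money-term word list, and the zero-filled placeholders (standing in for the BERT embedding and for the 11 features the abstract does not list) are all assumptions made for this example.

```python
import re

# Assumed URL pattern; the thesis may define URLs differently.
URL_PATTERN = re.compile(r"https?://\S+|www\.\S+", re.IGNORECASE)
# Hypothetical word list; the abstract does not enumerate the terms used.
MONEY_TERMS = {"money", "cash", "payment", "refund", "prize", "bank"}


def handcrafted_features(text: str) -> dict[str, float]:
    """Compute the three handcrafted features named in the abstract."""
    n_chars = len(text)
    # Special characters: anything that is neither alphanumeric nor whitespace.
    special = sum(1 for ch in text if not ch.isalnum() and not ch.isspace())
    words = re.findall(r"[a-z']+", text.lower())
    return {
        "special_char_ratio": special / n_chars if n_chars else 0.0,
        "url_count": float(len(URL_PATTERN.findall(text))),
        "money_term_count": float(sum(1 for w in words if w in MONEY_TERMS)),
    }


def combine(embedding: list[float], handcrafted: list[float]) -> list[float]:
    """Concatenate a 768-dim embedding with 14 handcrafted features."""
    if len(embedding) != 768 or len(handcrafted) != 14:
        raise ValueError("expected 768 embedding values and 14 handcrafted values")
    return embedding + handcrafted


email = "URGENT: claim your cash prize at http://example.test/win now!!!"
feats = handcrafted_features(email)

# Placeholder embedding: a real pipeline would take this from a BERT encoder.
embedding = [0.0] * 768
# Placeholder padding for the 11 features not listed in the abstract.
handcrafted = list(feats.values()) + [0.0] * (14 - len(feats))

vector = combine(embedding, handcrafted)
print(len(vector))
```

The length check in `combine` makes the 768 + 14 = 782 arithmetic explicit, so a mismatch in either part fails early instead of producing a vector of the wrong size.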

The proposed hybrid model is evaluated against traditional TF-IDF-based machine learning baselines, including Logistic Regression, Random Forest, and Gradient Boosting. Experimental results demonstrate high accuracy (95.7%), strong recall for phishing detection (96.35%), and excellent ROC-AUC performance (0.9916).
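
For readers unfamiliar with the three reported metrics, the sketch below shows how accuracy, recall, and ROC-AUC are conventionally computed from labels and model scores. It is a generic illustration under stated assumptions: the labels, scores, and 0.5 threshold are toy values invented for the example, not data or settings from the thesis, and the function names are hypothetical.

```python
def accuracy_and_recall(y_true: list[int], y_pred: list[int]) -> tuple[float, float]:
    """Accuracy over all emails and recall for the positive (phishing) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, recall


def roc_auc(y_true: list[int], scores: list[float]) -> float:
    """ROC-AUC as the fraction of (phishing, legitimate) pairs ranked correctly.

    Ties count as half a correct pair. This pairwise form is quadratic in
    the number of samples, which is fine for a small illustration.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one sample of each class")
    wins = sum(
        1.0 if p > n else (0.5 if p == n else 0.0)
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy data: 1 = phishing, 0 = legitimate.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

accuracy, recall = accuracy_and_recall(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(accuracy, recall, auc)
```

Accuracy and recall depend on the chosen decision threshold, whereas ROC-AUC depends only on how the scores rank phishing emails relative to legitimate ones.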

Explainable AI techniques (SHAP, LIME) provide feature-level insights, revealing that urgency keywords, monetary terms, and special character usage are key phishing indicators. The model offers a scalable, interpretable, and enterprise-ready framework for real-time phishing detection.

Item Type: Thesis (Masters)
Supervisors: Aleburu, Joel (email: UNSPECIFIED)
Subjects: Q Science > QH Natural history > QH301 Biology > Methods of research. Technique. Experimental biology > Data processing. Bioinformatics > Artificial intelligence
Q Science > Q Science (General) > Self-organizing systems. Conscious automata > Artificial intelligence
P Language and Literature > P Philology. Linguistics > Computational linguistics. Natural language processing
Q Science > QA Mathematics > Computer software > Computer Security
T Technology > T Technology (General) > Information Technology > Computer software > Computer Security
H Social Sciences > HV Social pathology. Social and public welfare > Criminology > Crimes and Offences > Cyber Crime
Divisions: School of Computing > Master of Science in Cyber Security
Depositing User: Ciara O'Brien
Date Deposited: 15 Jun 2026 14:24
Last Modified: 15 Jun 2026 14:24
URI: https://norma.ncirl.ie/id/eprint/9357
