NORMA eResearch @NCI Library

Enhancing Zero-Day Malware Detection in Enterprise Networks Using Behaviour-Based Machine Learning Models

Quraishi, Farhanahmad Jahidahmad (2024) Enhancing Zero-Day Malware Detection in Enterprise Networks Using Behaviour-Based Machine Learning Models. Masters thesis, Dublin, National College of Ireland.

Abstract

Zero-day malware poses a significant threat to enterprise networks because it exploits unknown vulnerabilities and can bypass the security controls already in place. This renders traditional signature-based detection methods largely ineffective. This research study presents a complete framework for enhancing zero-day malware detection by combining behavior-based analysis with advanced machine learning techniques. Using the EMBER dataset, a large-scale collection of labeled Windows Portable Executable (PE) files, both static and behavior-based features were extracted to capture the internal patterns of the malware. The methodology comprises extensive data preprocessing, feature engineering through dimensionality reduction techniques such as Principal Component Analysis (PCA), and several machine learning models: Random Forest, Gradient Boosting, XGBoost, and LightGBM classifiers. In a further step, these models were combined with an autoencoder-based anomaly detection mechanism, which identifies slight or major deviations from normal behavior and enhanced the detection of unseen malware variants. The models were evaluated using traditional performance metrics (accuracy, precision, recall, and F1-score) as well as cybersecurity-specific measures such as Mean Time to Detect (MTTD) and Mean Time to Respond (MTTR). The Random Forest classifier emerged as the best-performing model. It performed especially well against common malware evasion techniques, namely code obfuscation and packing, primarily because of the addition of behavior-based features, which are known to be less susceptible to static code changes.
The hybrid model, which combined anomaly detection with classification, improved recall but also showed a minor increase in false positives, which highlights the need to balance detection sensitivity. Furthermore, to support practical application, the implementation included the development of a malware detection API and a simple intrusion detection system (IDS). The study has limitations, including the exclusion of dynamic features because of resource constraints and the use of simulated MTTD and MTTR evaluation metrics, but the results show potential for real-world deployment.
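
As a rough illustration of the recall versus false-positive trade-off described above, the following Python sketch combines a classifier verdict with an anomaly score using a simple OR rule. The `Sample` fields, the OR rule, the threshold of 0.5, and the eight toy samples are all assumptions made for illustration; they are not taken from the thesis, and the numbers are not its results.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    is_malware: bool      # ground-truth label
    clf_flag: bool        # verdict of a supervised classifier (hypothetical)
    anomaly_score: float  # e.g. an autoencoder reconstruction error (hypothetical)


def hybrid_flag(sample: Sample, threshold: float) -> bool:
    """Flag a sample if the classifier OR the anomaly detector fires (assumed rule)."""
    return sample.clf_flag or sample.anomaly_score > threshold


def recall_and_false_positives(samples, predict):
    """Return (recall, false-positive count) for a prediction function."""
    tp = sum(1 for s in samples if s.is_malware and predict(s))
    fn = sum(1 for s in samples if s.is_malware and not predict(s))
    fp = sum(1 for s in samples if not s.is_malware and predict(s))
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return recall, fp


# Toy data, invented purely for illustration.
SAMPLES = [
    Sample(True, True, 0.9),
    Sample(True, True, 0.2),
    Sample(True, False, 0.8),   # missed by the classifier, caught as an anomaly
    Sample(True, False, 0.1),   # missed by both detectors
    Sample(False, False, 0.1),
    Sample(False, False, 0.2),
    Sample(False, False, 0.7),  # benign but anomalous: extra false positive
    Sample(False, True, 0.3),   # classifier false positive
]
THRESHOLD = 0.5  # assumed anomaly-score cut-off

clf_recall, clf_fp = recall_and_false_positives(SAMPLES, lambda s: s.clf_flag)
hyb_recall, hyb_fp = recall_and_false_positives(
    SAMPLES, lambda s: hybrid_flag(s, THRESHOLD)
)

print(f"classifier only: recall={clf_recall:.2f}, false positives={clf_fp}")
print(f"hybrid (OR):     recall={hyb_recall:.2f}, false positives={hyb_fp}")
```

On this toy data the OR rule recovers one malware sample the classifier missed, at the cost of one additional benign file being flagged. Raising `THRESHOLD` would trade some of that recall back for fewer false positives.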

Item Type: Thesis (Masters)
Supervisors: Lugones, Diego (Email: UNSPECIFIED)
Uncontrolled Keywords: Zero-Day Malware Detection; Behavior-Based Analysis; Machine Learning; Random Forest Classifier; Anomaly Detection; Enterprise Network Security; EMBER Dataset; Cybersecurity
Subjects: Q Science > QA Mathematics > Electronic computers. Computer science
T Technology > T Technology (General) > Information Technology > Electronic computers. Computer science
Q Science > QA Mathematics > Computer software > Computer Security
T Technology > T Technology (General) > Information Technology > Computer software > Computer Security
Q Science > Q Science (General) > Self-organizing systems. Conscious automata > Machine learning
Divisions: School of Computing > Master of Science in Cyber Security
Depositing User: Ciara O'Brien
Date Deposited: 28 Jul 2025 09:24
Last Modified: 28 Jul 2025 09:24
URI: https://norma.ncirl.ie/id/eprint/8245
