NORMA eResearch @NCI Library

Evaluating the Sensitivity of Machine Learning Algorithms to Training Data Size in OS X and Memory Malware Detection

Tamidala, Devika (2024) Evaluating the Sensitivity of Machine Learning Algorithms to Training Data Size in OS X and Memory Malware Detection. Masters thesis, Dublin, National College of Ireland.

Abstract

Malware detection is a central concern in cybersecurity as the number of complex attacks on OS X and memory-based systems continues to rise. Despite the increasing use of machine learning (ML) techniques, the effect of training data size on detection accuracy and time complexity remains an open issue. This work addresses the problem of choosing reliable ML models for malware detection when resources, especially training data, are limited. Three ML algorithms, Logistic Regression (LR), K-Nearest Neighbors (KNN), and Gaussian Naive Bayes (GNB), are assessed on two popular benchmark datasets for OS X and memory malware: the OS X Malware Dataset and CIC-MalMem-2022. Each model is trained on 10%, 20%, 50%, 80%, and 90% of the data, and its sensitivity to that proportion is estimated from accuracy, precision, recall, F1-score, and training time. The findings show that memory malware detection is the less sensitive of the two tasks to training data size, while OS X malware detection is more sensitive, with LR giving the best results on larger training sets. The research also finds that GNB is the most computationally efficient algorithm for both types of malware. The aim of this research is to identify appropriate algorithms for real-time analysis and efficient use of resources in malware detection.
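
The evaluation protocol described in the abstract (train on a fixed proportion of the data, then record accuracy, precision, recall, F1-score, and training time) can be illustrated with a minimal sketch. This is not the thesis's implementation: the data below is synthetic rather than the OS X Malware Dataset or CIC-MalMem-2022, the Gaussian Naive Bayes classifier is a small pure-Python stand-in, and all function names (make_synthetic, fit_gnb, predict_gnb, classification_metrics, training_size_sweep) are hypothetical. Using the held-out remainder as the test set is also an assumption.

```python
import math
import random
import time

FRACTIONS = (0.1, 0.2, 0.5, 0.8, 0.9)  # training proportions named in the abstract


def make_synthetic(n=1000, n_features=5, seed=0):
    """Hypothetical stand-in data: two Gaussian classes with shifted means."""
    rng = random.Random(seed)
    y = [i % 2 for i in range(n)]
    X = [[rng.gauss(1.5 * label, 1.0) for _ in range(n_features)] for label in y]
    return X, y


def fit_gnb(X, y):
    """Estimate a prior plus per-feature mean and variance for each class."""
    model = {}
    for label in sorted(set(y)):
        rows = [x for x, t in zip(X, y) if t == label]
        cols = list(zip(*rows))
        means = [sum(c) / len(c) for c in cols]
        # Small constant keeps the variance strictly positive.
        variances = [sum((v - m) ** 2 for v in c) / len(c) + 1e-9
                     for c, m in zip(cols, means)]
        model[label] = (len(rows) / len(y), means, variances)
    return model


def predict_gnb(model, x):
    """Return the class with the highest log posterior under the GNB model."""
    def log_posterior(label):
        prior, means, variances = model[label]
        score = math.log(prior)
        for v, m, s2 in zip(x, means, variances):
            score -= 0.5 * math.log(2 * math.pi * s2) + (v - m) ** 2 / (2 * s2)
        return score
    return max(model, key=log_posterior)


def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    correct = sum(t == p for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": correct / len(pairs), "precision": precision,
            "recall": recall, "f1": f1}


def training_size_sweep(X, y, fractions=FRACTIONS, seed=0):
    """Train on each fraction of a shuffled dataset; test on the remainder."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    results = []
    for frac in fractions:
        cut = int(len(idx) * frac)
        train, test = idx[:cut], idx[cut:]
        start = time.perf_counter()
        model = fit_gnb([X[i] for i in train], [y[i] for i in train])
        train_seconds = time.perf_counter() - start
        preds = [predict_gnb(model, X[i]) for i in test]
        row = classification_metrics([y[i] for i in test], preds)
        row.update(train_fraction=frac, train_seconds=train_seconds)
        results.append(row)
    return results


if __name__ == "__main__":
    X, y = make_synthetic()
    for row in training_size_sweep(X, y):
        print(row)
```

Comparing the rows across fractions shows how strongly each metric depends on the amount of training data; repeating the sweep with other classifiers (for example LR or KNN) and with several random seeds would be needed before drawing conclusions like those reported in the thesis.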

Item Type: Thesis (Masters)
Supervisors: Tomer, Vikas (Email: UNSPECIFIED)
Subjects: Q Science > QA Mathematics > Electronic computers. Computer science
T Technology > T Technology (General) > Information Technology > Electronic computers. Computer science
Q Science > QA Mathematics > Computer software > Computer Security
T Technology > T Technology (General) > Information Technology > Computer software > Computer Security
Q Science > Q Science (General) > Self-organizing systems. Conscious automata > Machine learning
Divisions: School of Computing > Master of Science in Data Analytics
Depositing User: Ciara O'Brien
Date Deposited: 05 Sep 2025 11:29
Last Modified: 05 Sep 2025 11:29
URI: https://norma.ncirl.ie/id/eprint/8823
