Tamidala, Devika (2024) Evaluating the Sensitivity of Machine Learning Algorithms to Training Data Size in OS X and Memory Malware Detection. Masters thesis, Dublin, National College of Ireland.
Abstract
Malware detection is a critical part of cybersecurity as the number of complex attacks on OS X and memory-based systems continues to rise. Despite the increasing use of machine learning (ML) techniques, the effect of training data size on detection accuracy and time complexity remains an open issue. This work addresses the problem of choosing reliable ML models for malware detection when resources, especially training data, are limited. Three ML algorithms, Logistic Regression (LR), K-Nearest Neighbors (KNN), and Gaussian Naive Bayes (GNB), are assessed on two popular benchmark datasets covering OS X and memory malware: the OS X Malware Dataset and CIC-MalMem-2022. Sensitivity to the proportion of data used for training (10%, 20%, 50%, 80%, and 90%) is assessed in terms of accuracy, precision, recall, F1-score, and the time taken to train each model. The findings show that memory malware detection is less sensitive to training data size, while OS X malware detection is more sensitive, with LR giving the best results at larger training sizes. The research also reveals that GNB is the most computationally efficient algorithm for both types of malware. The aim of this research is to identify appropriate algorithms for real-time analysis and efficient use of resources in malware detection.
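The evaluation protocol described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the thesis's implementation: it uses only the Python standard library, a small hand-written Gaussian Naive Bayes classifier, and synthetic two-class data in place of the OS X Malware Dataset and CIC-MalMem-2022. The names `make_synthetic`, `scores`, and `sensitivity_sweep` are hypothetical, and treating the data not used for training as the test set is an assumption, since the abstract does not say how the held-out portion is formed.

```python
import math
import random
import time


def make_synthetic(n, n_features=5, seed=0):
    """Hypothetical stand-in data: two Gaussian classes (0 = benign, 1 = malware)."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        label = rng.randint(0, 1)
        X.append([rng.gauss(1.5 * label, 1.0) for _ in range(n_features)])
        y.append(label)
    return X, y


class GaussianNB:
    """Minimal Gaussian Naive Bayes: per-class priors, feature means, variances."""

    def fit(self, X, y):
        self.stats = {}
        for c in sorted(set(y)):
            rows = [x for x, t in zip(X, y) if t == c]
            cols = list(zip(*rows))
            means = [sum(col) / len(rows) for col in cols]
            # Small constant keeps the variance strictly positive.
            variances = [sum((v - m) ** 2 for v in col) / len(rows) + 1e-9
                         for col, m in zip(cols, means)]
            self.stats[c] = (math.log(len(rows) / len(y)), means, variances)
        return self

    def _log_posterior(self, x, c):
        log_prior, means, variances = self.stats[c]
        return log_prior + sum(
            -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
            for v, m, var in zip(x, means, variances))

    def predict(self, X):
        return [max(self.stats, key=lambda c: self._log_posterior(x, c)) for x in X]


def scores(y_true, y_pred):
    """Accuracy, precision, recall, and F1 with class 1 as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def sensitivity_sweep(X, y, fractions=(0.1, 0.2, 0.5, 0.8, 0.9), seed=0):
    """Train on each fraction of the shuffled data and test on the remainder."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    results = []
    for frac in fractions:
        cut = int(len(idx) * frac)
        train, test = idx[:cut], idx[cut:]
        start = time.perf_counter()
        model = GaussianNB().fit([X[i] for i in train], [y[i] for i in train])
        train_seconds = time.perf_counter() - start
        row = scores([y[i] for i in test], model.predict([X[i] for i in test]))
        row.update(train_fraction=frac, train_seconds=train_seconds)
        results.append(row)
    return results


X, y = make_synthetic(1000)
results = sensitivity_sweep(X, y)
for r in results:
    print(f"{r['train_fraction']:.0%}  acc={r['accuracy']:.3f}  "
          f"f1={r['f1']:.3f}  train={r['train_seconds'] * 1000:.2f} ms")
```

Comparing the rows across training fractions gives a rough picture of how sensitive a model is to training data size. LR and KNN could be dropped into the same loop in place of `GaussianNB`, provided they expose matching `fit` and `predict` methods.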
Item Type: | Thesis (Masters) |
---|---|
Supervisors: | Name: Tomer, Vikas; Email: UNSPECIFIED |
Subjects: | Q Science > QA Mathematics > Electronic computers. Computer science; T Technology > T Technology (General) > Information Technology > Electronic computers. Computer science; Q Science > QA Mathematics > Computer software > Computer Security; T Technology > T Technology (General) > Information Technology > Computer software > Computer Security; Q Science > Q Science (General) > Self-organizing systems. Conscious automata > Machine learning |
Divisions: | School of Computing > Master of Science in Data Analytics |
Depositing User: | Ciara O'Brien |
Date Deposited: | 05 Sep 2025 11:29 |
Last Modified: | 05 Sep 2025 11:29 |
URI: | https://norma.ncirl.ie/id/eprint/8823 |