Surampudi, Srikari (2024) A Comparative study of ML Models for Data Loss Prevention. Masters thesis, Dublin, National College of Ireland.
Abstract
Data Loss Prevention (DLP) is vital for protecting organizations' proprietary information against leakage and unauthorized access. A key limitation of conventional DLP systems is their inability to effectively identify complex data loss events amid the vast amount of cyber threat data. This work compares different Machine Learning (ML) models for DLP solutions. Using a dataset of 40,000 records of network traffic and attack characteristics, we implemented and evaluated SVM, K-Means clustering, Random Forest, Logistic Regression and Neural Network models. Data preprocessing involved missing value handling, categorical encoding, feature creation and synthetic oversampling with the SMOTE technique. The data were further augmented with Gaussian noise to improve generalization. The evaluation indicated that the Random Forest model was far more effective than the other models investigated (SVM, K-Means clustering, Logistic Regression and the Neural Network): after hyperparameter tuning, the Random Forest model reached an accuracy of 87.0%, while the other models achieved approximately 34% on the same features. Its ROC-AUC score of 0.97 likewise indicates that the model excels at distinguishing between the various classes of data loss incidents. These results support ensemble learning methods as valuable for improving DLP systems and provide a solid foundation for detecting data loss attempts.
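The two augmentation steps named in the abstract can be illustrated with a minimal, standard-library sketch. This is not the thesis's implementation, which is not reproduced on this page; the function names `smote_like` and `add_gaussian_noise`, the neighbour count `k` and the noise level `sigma` are all assumptions chosen for illustration. The first function follows the core SMOTE idea of interpolating between a minority-class sample and one of its nearest minority-class neighbours; the second perturbs each feature with zero-mean Gaussian noise.

```python
import math
import random


def smote_like(minority, n_new, k=3, rng=None):
    """Create n_new synthetic rows by interpolating between a minority-class
    row and one of its k nearest minority-class neighbours (SMOTE idea).
    Hypothetical helper; requires at least two minority rows."""
    rng = rng or random.Random(0)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Nearest neighbours of `base` within the minority class, excluding itself.
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def add_gaussian_noise(rows, sigma=0.05, rng=None):
    """Return a copy of rows with zero-mean Gaussian noise added per feature.
    sigma is an assumed noise level, not a value taken from the thesis."""
    rng = rng or random.Random(0)
    return [[v + rng.gauss(0.0, sigma) for v in row] for row in rows]


# Toy minority class: the four corners of the unit square.
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
synthetic = smote_like(minority, n_new=5, k=2)
noisy = add_gaussian_noise(minority, sigma=0.05)
```

Because each synthetic row is a convex combination of two existing rows, it always lies on the segment between them, which is why oversampling of this kind densifies the minority class without leaving its feature range. In practice a library implementation would normally be used instead of hand-written helpers like these.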
Item Type: | Thesis (Masters) |
---|---|
Supervisors: | Khan, Imran (email: UNSPECIFIED) |
Subjects: | Q Science > QA Mathematics > Electronic computers. Computer science; T Technology > T Technology (General) > Information Technology > Electronic computers. Computer science; Q Science > QA Mathematics > Computer software > Computer Security; T Technology > T Technology (General) > Information Technology > Computer software > Computer Security; K Law > KDK Republic of Ireland > Data Protection; Q Science > Q Science (General) > Self-organizing systems. Conscious automata > Machine learning |
Divisions: | School of Computing > Master of Science in Cyber Security |
Depositing User: | Ciara O'Brien |
Date Deposited: | 28 Jul 2025 11:42 |
Last Modified: | 28 Jul 2025 11:42 |
URI: | https://norma.ncirl.ie/id/eprint/8265 |