NORMA eResearch @NCI Library

A Distributed Intrusion Detection System for Computer Networks using Hybrid Model on Apache Spark Framework

Deenadayal, Harshitha (2020) A Distributed Intrusion Detection System for Computer Networks using Hybrid Model on Apache Spark Framework. Masters thesis, Dublin, National College of Ireland.

PDF (Master of Science)
PDF (Configuration manual)

Abstract

Intrusion detection systems (IDSs) are a major component of security solutions for every organization that relies on internet services, as well as for the internet providers who offer those services. IDSs aid proactive monitoring of the network in order to detect potential threats and protect organizations from possible data and financial loss. Because of their importance in the real world, they have been a major subject of research. IDSs adopt data mining techniques to analyze historical network-connection information to detect attacks. The efficiency of an IDS is measured in terms of its prediction accuracy and its error rate. It is also important that models perform in near real time, so that threats are detected before control of network resources is compromised. This creates a requirement for IDSs to be built on a high-performance computing environment. This paper experiments with well-performing classification algorithms, namely support vector machine (SVM), decision tree, random forest, gradient boosting, and an ensemble of all four. Correlation-based feature engineering techniques, namely ARM (Association Rule Mining) and variable importance, are applied to enhance performance. The performance of these models is measured and compared in terms of accuracy and error rate. Additionally, this paper conducts experiments in both centralized and distributed (Apache Hadoop and Spark) environments to compare the time taken to train the data mining models for IDSs. The experimental results show that tree-based algorithms are better classifiers for IDSs and that models are trained much faster in distributed (Hadoop and Spark framework) environments than in centralized ones.
Keywords: Intrusion Detection System, SVM, Decision Tree, Random Forest, Gradient Boosting, Apache Spark, NSL-KDD dataset.
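
The abstract mentions an ensemble of four classifiers evaluated by accuracy and error rate, but does not say how the individual predictions are combined. The following is a minimal sketch that assumes simple majority voting, which may differ from the method used in the thesis. The function names, the toy labels, and the per-model predictions are all hypothetical and only illustrate the idea.

```python
from collections import Counter


def majority_vote(predictions_per_model):
    """Combine per-model label lists into one label list by majority vote.

    Assumption: hard (label-level) voting; the thesis may combine models differently.
    """
    combined = []
    for votes in zip(*predictions_per_model):
        # most_common(1) picks the label with the highest count; on a tie,
        # the label seen first (the earliest model in the list) wins.
        combined.append(Counter(votes).most_common(1)[0][0])
    return combined


def accuracy(predicted, actual):
    """Fraction of records whose predicted label matches the true label."""
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


# Hypothetical toy data: true labels for five network connections.
actual = ["normal", "attack", "attack", "normal", "attack"]

# Hypothetical predictions from the four base classifiers named in the abstract.
model_predictions = [
    ["normal", "attack", "normal", "normal", "attack"],  # SVM
    ["normal", "attack", "attack", "attack", "attack"],  # decision tree
    ["normal", "attack", "attack", "normal", "attack"],  # random forest
    ["attack", "attack", "attack", "normal", "normal"],  # gradient boosting
]

ensemble = majority_vote(model_predictions)
acc = accuracy(ensemble, actual)
error_rate = 1 - acc  # error rate is the complement of accuracy

print(ensemble, acc, error_rate)
```

In this toy case each base model makes at least one mistake, yet the vote recovers every true label, which is the usual motivation for combining classifiers. It says nothing about the results reported in the thesis.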

Item Type: Thesis (Masters)
Subjects: Q Science > QA Mathematics > Electronic computers. Computer science
T Technology > T Technology (General) > Information Technology > Electronic computers. Computer science

Q Science > QA Mathematics > Computer software > Computer Security
T Technology > T Technology (General) > Information Technology > Computer software > Computer Security
Divisions: School of Computing > Master of Science in Data Analytics
Depositing User: Dan English
Date Deposited: 16 Jun 2020 09:56
Last Modified: 16 Jun 2020 09:56
URI: http://norma.ncirl.ie/id/eprint/4288
