Evaluation of machine learning techniques for imbalanced data in IDS

Bibliographic Details
Main Author: Mokaramian, Shahram
Format: Thesis
Language: English
Published: 2013
Online Access: http://eprints.utm.my/id/eprint/37080/5/ShahramMokaramianMFSKSM2013.pdf
Description
Summary: A network Intrusion Detection System (IDS) is an automated system that detects malicious traffic, and it plays a critical role in a network. In recent years, machine learning algorithms have been developed and used to detect network intrusions. Most standard machine learning algorithms achieve high overall accuracy, but they favor the majority class when dealing with imbalanced data. IDS data is highly imbalanced, and most machine learning algorithms detect the Remote to Local (R2L) and User to Root (U2R) classes poorly, even though these classes contain malicious attacks. A resampling technique is therefore required to balance the data. The purpose of this study is to investigate the performance of three machine learning algorithms, Support Vector Machine (SVM), Decision Tree (DT) and Fuzzy Classifier (FC), on imbalanced IDS data and on the same data after rebalancing with the Synthetic Minority Over-sampling TEchnique (SMOTE). The performance of the three algorithms was then evaluated on the rebalanced data. The benchmark DARPA KDDCup 1999 IDS dataset was used. SMOTE was applied with two imbalance ratios, 1:4 and 1:1. Comparing the results before and after resampling showed that FC performs better with an imbalance ratio of 1:1. The accuracy of FC on the balanced data was 99.19% for Normal traffic, 99.35% for Denial of Service attacks, 99.51% for Probe attacks, 99.67% for Remote to Local attacks and 99.41% for User to Root attacks. In addition, data with an imbalance ratio of 1:1 gave better results on all classes for all three machine learning algorithms.
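
The rebalancing step described in the summary can be illustrated with a minimal, standard-library sketch of SMOTE-style oversampling. This is not the thesis's implementation: the function names `synthetic_count` and `smote`, the neighbour count `k`, the toy data, and the reading of "1:4" and "1:1" as minority:majority ratios are all assumptions made for illustration.

```python
import math
import random


def synthetic_count(n_minority, n_majority, ratio):
    """Synthetic minority samples needed so that minority/majority
    reaches `ratio` (assumed meaning of 1:4 -> 0.25, 1:1 -> 1.0)."""
    target = math.ceil(n_majority * ratio)
    return max(0, target - n_minority)


def smote(minority, n_new, k=5, rng=None):
    """Create `n_new` synthetic points by interpolating between a randomly
    chosen minority sample and one of its k nearest minority neighbours."""
    rng = rng or random.Random(0)  # fixed seed so the sketch is repeatable
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the k nearest minority neighbours, excluding the base itself.
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(base, minority[j]),
        )[:k]
        other = minority[rng.choice(neighbours)]
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


if __name__ == "__main__":
    # Toy stand-in for a rare attack class: 4 minority vs 40 majority records.
    minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    n_majority = 40
    for ratio in (1 / 4, 1 / 1):
        n_new = synthetic_count(len(minority), n_majority, ratio)
        resampled = minority + smote(minority, n_new, k=3)
        print(f"ratio {ratio}: {len(resampled)} minority vs {n_majority} majority")
```

Because each synthetic point lies on a line segment between two existing minority samples, the oversampled class stays within the region the original minority records already occupy. In practice only the training split would normally be resampled, so that evaluation still reflects the original class distribution; whether the thesis did this is not stated in the summary.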