Evaluation of machine learning techniques for imbalanced data in IDS

A network Intrusion Detection System (IDS) is an automated system that detects malicious traffic, and it plays a critical role in a network. In recent years, machine learning algorithms have been developed and used to detect network intrusions. Most standard machine learning algorithms often give h...

Bibliographic Details
Main Author: Mokaramian, Shahram
Format: Thesis
Language: English
Published: 2013
Subjects:
Online Access: http://eprints.utm.my/id/eprint/37080/5/ShahramMokaramianMFSKSM2013.pdf
id my-utm-ep.37080
record_format uketd_dc
spelling my-utm-ep.37080 2017-06-29T07:03:41Z Evaluation of machine learning techniques for imbalanced data in IDS 2013-08 Mokaramian, Shahram TK7885-7895 Computer engineer. Computer hardware
2013-08 Thesis http://eprints.utm.my/id/eprint/37080/ http://eprints.utm.my/id/eprint/37080/5/ShahramMokaramianMFSKSM2013.pdf application/pdf en public http://dms.library.utm.my:8080/vital/access/manager/Repository/vital:70060?site_name=Restricted Repository masters Universiti Teknologi Malaysia, Faculty of Computing Faculty of Computing
institution Universiti Teknologi Malaysia
collection UTM Institutional Repository
language English
topic TK7885-7895 Computer engineer
Computer hardware
spellingShingle TK7885-7895 Computer engineer
Computer hardware
Mokaramian, Shahram
Evaluation of machine learning techniques for imbalanced data in IDS
description A network Intrusion Detection System (IDS) is an automated system that detects malicious traffic, and it plays a critical role in a network. In recent years, machine learning algorithms have been developed and used to detect network intrusions. Most standard machine learning algorithms give high overall accuracy, but they favor the majority class when dealing with imbalanced data. IDS data is highly imbalanced, and most machine learning algorithms detect the R2L and U2R classes, which include malicious attacks, poorly. A resampling technique is therefore required to balance the data. The purpose of this study is to investigate the performance of three machine learning algorithms, Support Vector Machine (SVM), Decision Tree (DT) and Fuzzy Classifier (FC), on imbalanced IDS data and on the same data after rebalancing with the Synthetic Minority Over-sampling TEchnique (SMOTE). The performance of the three machine learning algorithms was evaluated with the new rebalanced data. The benchmark DARPA KDDCup 1999 IDS dataset was used. SMOTE was applied with two imbalance ratios, 1:4 and 1:1. Comparing the results before and after resampling showed that FC performs better with an imbalance ratio of 1:1. The accuracy of FC with balanced data was: Normal traffic (99.19%), Denial of Service attacks (99.35%), Probe attacks (99.51%), Remote to Local attacks (99.67%) and User to Root attacks (99.41%). In addition, the data with an imbalance ratio of 1:1 gave better results on all classes for all three machine learning algorithms.
format Thesis
qualification_level Master's degree
author Mokaramian, Shahram
author_facet Mokaramian, Shahram
author_sort Mokaramian, Shahram
title Evaluation of machine learning techniques for imbalanced data in IDS
title_short Evaluation of machine learning techniques for imbalanced data in IDS
title_full Evaluation of machine learning techniques for imbalanced data in IDS
title_fullStr Evaluation of machine learning techniques for imbalanced data in IDS
title_full_unstemmed Evaluation of machine learning techniques for imbalanced data in IDS
title_sort evaluation of machine learning techniques for imbalanced data in ids
granting_institution Universiti Teknologi Malaysia, Faculty of Computing
granting_department Faculty of Computing
publishDate 2013
url http://eprints.utm.my/id/eprint/37080/5/ShahramMokaramianMFSKSM2013.pdf
_version_ 1747816498869567488
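
The rebalancing step named in the description can be illustrated with a minimal sketch of SMOTE-style oversampling. This is not the thesis's implementation: the function names `smote` and `rebalance`, the two-class toy setting, and the reading of "1:4" and "1:1" as minority:majority ratios are all assumptions made for illustration. Each synthetic sample is placed at a random point on the segment between a minority sample and one of its nearest minority neighbours.

```python
import random


def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic points interpolated between minority samples.

    Hypothetical helper: a simplified form of SMOTE for illustration only.
    Requires at least two minority samples.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by squared Euclidean distance.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(a + gap * (b - a) for a, b in zip(base, neighbour))
        )
    return synthetic


def rebalance(majority, minority, ratio, k=5, seed=0):
    """Oversample the minority class until minority/majority reaches `ratio`.

    Assumption: ratio=0.25 stands for 1:4 and ratio=1.0 for 1:1.
    """
    target = int(len(majority) * ratio)
    n_new = max(0, target - len(minority))
    return list(minority) + smote(minority, n_new, k, seed)


if __name__ == "__main__":
    # Toy data: 40 majority points and 4 minority points in two dimensions.
    majority = [(float(x), float(y)) for x in range(8) for y in range(5)]
    minority = [(10.0, 10.0), (11.0, 10.0), (10.0, 11.0), (11.0, 11.0)]
    for ratio in (0.25, 1.0):
        resampled = rebalance(majority, minority, ratio)
        print(ratio, len(resampled), "minority samples after oversampling")
```

In practice only the training split would be resampled, so that the evaluation data keeps its original class distribution.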