Privacy-Preserving Decision Tree Pruning In Network-Based Intrusion Detection System

Bibliographic Details
Main Author: Chew, Yee Jian
Format: Thesis
Published: 2019
Description
Summary: Machine learning techniques have been extensively adopted in Network-based Intrusion Detection Systems (NIDS), especially for the task of network traffic classification. While a precise classification model for separating normal and malicious network traffic remains the ultimate goal, privacy protection for network traffic databases cannot be ignored either. The common solution is to anonymise the database through a statistical approach. Anonymising refers to masking, hiding or removing certain sensitive information from the database. In the past decades, numerous anonymisation tools and techniques have been developed to conceal the potentially sensitive information that network traces could reveal. However, it is also important to ensure that anonymisation techniques do not severely degrade the performance of the NIDS. Presently, the conventional way to gauge the usability of network data is to compare the number of alarms generated by the Snort NIDS before and after an anonymisation solution is applied. Nevertheless, this approach may not be feasible when machine learning is used to segregate the traffic. To fill this gap, 10 notable machine learning classifiers are employed to evaluate the performance of 2 network data privacy solutions: (1) port number bilateral classification and (2) IP truncation. The utility of the network data is measured by the classification accuracy attained.
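For readers unfamiliar with the two privacy solutions named in the summary, the following is a minimal illustrative sketch, not the thesis's implementation. It assumes IPv4 addresses, assumes that IP truncation zeroes the low-order bits of an address, and assumes that bilateral classification collapses a port to one of two values depending on whether it falls below 1024. The function names, the number of bits kept, the 1024 threshold and the replacement values 0 and 65535 are all hypothetical choices for illustration; the record does not state the settings used in the thesis.

```python
import ipaddress


def truncate_ipv4(addr: str, keep_bits: int = 24) -> str:
    """Zero the low-order (32 - keep_bits) bits of an IPv4 address.

    keep_bits is an assumed parameter; the record does not say how many
    bits the thesis retains.
    """
    if not 0 <= keep_bits <= 32:
        raise ValueError("keep_bits must be between 0 and 32")
    value = int(ipaddress.IPv4Address(addr))
    # Build a mask with keep_bits leading ones, limited to 32 bits.
    mask = (0xFFFFFFFF << (32 - keep_bits)) & 0xFFFFFFFF
    return str(ipaddress.IPv4Address(value & mask))


def bilateral_port(port: int, threshold: int = 1024) -> int:
    """Collapse a port number into one of two classes.

    Ports below the (assumed) threshold map to 0, all others to 65535.
    """
    if not 0 <= port <= 65535:
        raise ValueError("port must be between 0 and 65535")
    return 0 if port < threshold else 65535


if __name__ == "__main__":
    print(truncate_ipv4("192.168.17.42", keep_bits=24))
    print(bilateral_port(443), bilateral_port(50000))
```

Both transformations discard information a classifier might otherwise use, which is why the summary measures utility by comparing classification accuracy on the original and anonymised data.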