MalSketch - A machine learning-based malware behaviour analysis framework

The number of malware samples has increased exponentially over the years, and there is a need to improve the efficiency of analysing large numbers of malware samples. Additionally, the diversity of malware types and of the methods they employ to defeat analysis techniques has also increased steadily. Static analysis m...

Bibliographic Details
Main Author: Chanderan, Navein
Format: Thesis
Language: English
Published: 2019
Subjects:
Online Access: http://ir.unimas.my/id/eprint/27471/1/Navein%20Chanderan%20ft.pdf
id my-unimas-ir.27471
record_format uketd_dc
spelling my-unimas-ir.27471 2023-04-25T08:32:06Z MalSketch - A machine learning-based malware behaviour analysis framework 2019-10-14 Chanderan, Navein QA75 Electronic computers. Computer science Universiti Malaysia Sarawak (UNIMAS) 2019-10 Thesis http://ir.unimas.my/id/eprint/27471/ http://ir.unimas.my/id/eprint/27471/1/Navein%20Chanderan%20ft.pdf text en validuser masters Universiti Malaysia Sarawak (UNIMAS) Faculty of Computer Science and Information Technology
institution Universiti Malaysia Sarawak
collection UNIMAS Institutional Repository
language English
topic QA75 Electronic computers. Computer science
description The number of malware samples has increased exponentially over the years, and there is a need to improve the efficiency of analysing large numbers of malware samples. Additionally, the diversity of malware types and of the methods they employ to defeat analysis techniques has also increased steadily. Static analysis methods alone are not enough to combat modern malware attacks, as they have an inherent limitation: they are easily defeated by obfuscation and polymorphism. Dynamic analysis methods, on the other hand, do not suffer from such limitations because the samples are executed, thereby revealing their true behaviour. To address this problem, a framework for the automatic analysis of malware behaviour is proposed. The framework analyses malware behaviour, then converts the behaviour reports into a metalanguage format suitable for machine learning. To speed up computation, MinHash is used to represent samples, and Locality-Sensitive Hashing is applied for nearest-neighbour search in sublinear time. A disjoint-set forest clustering algorithm is then applied to the results to cluster malware into families. The framework achieves a 97.4% true positive rate and a 99.4% true negative rate on a dataset of 65,000 samples from VirusShare. This shows that the framework works very well even on a random dataset, and that it is capable of clustering daily malware samples and identifying unknown malware. Keywords: Malware behaviour, malware analysis, clustering, automated analysis
format Thesis
qualification_level Master's degree
author Chanderan, Navein
title MalSketch - A machine learning-based malware behaviour analysis framework
granting_institution Universiti Malaysia Sarawak (UNIMAS)
granting_department Faculty of Computer Science and Information Technology
publishDate 2019
url http://ir.unimas.my/id/eprint/27471/1/Navein%20Chanderan%20ft.pdf
_version_ 1783728337337712640
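
The pipeline summarised in the description (MinHash signatures, Locality-Sensitive Hashing to find candidate neighbours, then a disjoint-set forest to merge them into clusters) can be illustrated with a minimal Python sketch. This is not the thesis implementation: the signature length, band count, similarity threshold, hash function, and behaviour tokens below are all assumptions chosen for illustration, and the record does not describe the metalanguage format.

```python
import hashlib
from collections import defaultdict
from itertools import combinations

# All three parameters are assumed values, not taken from the thesis.
NUM_HASHES = 64   # MinHash signature length
BANDS = 16        # LSH bands (64 / 16 = 4 rows per band)
THRESHOLD = 0.5   # minimum estimated Jaccard similarity to merge


def _hash(token, seed):
    """Seeded 64-bit hash of a token (BLAKE2b is an arbitrary choice)."""
    data = f"{seed}:{token}".encode()
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash(tokens):
    """MinHash signature of a non-empty set of behaviour tokens."""
    return [min(_hash(t, seed) for t in tokens) for seed in range(NUM_HASHES)]


def estimated_similarity(sig_a, sig_b):
    """Fraction of matching positions, an estimate of Jaccard similarity."""
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)


class DisjointSet:
    """Disjoint-set forest with path halving and union by rank."""

    def __init__(self, n):
        self.parent = list(range(n))
        self.rank = [0] * n

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        if self.rank[ra] < self.rank[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        if self.rank[ra] == self.rank[rb]:
            self.rank[ra] += 1


def cluster(reports):
    """Return one cluster label per report (a list of token sets)."""
    sigs = [minhash(r) for r in reports]
    rows = NUM_HASHES // BANDS
    ds = DisjointSet(len(reports))
    for band in range(BANDS):
        # Samples whose signatures agree on a whole band share a bucket.
        buckets = defaultdict(list)
        for i, sig in enumerate(sigs):
            buckets[tuple(sig[band * rows:(band + 1) * rows])].append(i)
        # Only pairs within a bucket are compared, which avoids all-pairs work.
        for members in buckets.values():
            for a, b in combinations(members, 2):
                if estimated_similarity(sigs[a], sigs[b]) >= THRESHOLD:
                    ds.union(a, b)
    return [ds.find(i) for i in range(len(reports))]


# Hypothetical behaviour tokens; the real metalanguage is not shown in the record.
reports = [
    {"file_create", "reg_set", "net_connect"},
    {"file_create", "reg_set", "net_connect"},
    {"proc_inject", "mutex_create", "dns_query"},
]
labels = cluster(reports)
print(labels)
```

Samples that share a label fall into the same cluster. Banding trades recall for speed: more bands with fewer rows each make it more likely that moderately similar samples become candidates, at the cost of more pairwise comparisons.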