MalSketch - A machine learning-based malware behaviour analysis framework

Bibliographic Details
Main Author: Chanderan, Navein
Format: Thesis
Language: English
Published: 2019
Online Access: http://ir.unimas.my/id/eprint/27471/1/Navein%20Chanderan%20ft.pdf
Description
Summary: The number of malware samples has increased exponentially over the years, and there is a need to analyse large numbers of samples more efficiently. The diversity of malware types, and of the methods they employ to defeat analysis techniques, has also increased steadily. Static analysis methods alone are not enough to combat modern malware attacks, because they are easily defeated by obfuscation and polymorphism. Dynamic analysis methods, on the other hand, do not suffer from this limitation: the samples are executed, which reveals their true behaviour. To address this problem, a framework for the automatic analysis of malware behaviour is proposed. The framework analyses malware behaviour, then converts the behaviour reports into a metalanguage format suitable for machine learning. To speed up computation, MinHash is used to represent samples, and Locality Sensitive Hashing is applied for nearest-neighbour search in sublinear time. A disjoint-set forest clustering algorithm is then applied to the results to group malware into family clusters. The framework achieves a 97.4% true positive rate and a 99.4% true negative rate on a dataset of 65,000 samples from VirusShare. This shows that the framework works very well even on a random dataset, and that it is capable of clustering daily malware samples and identifying unknown malware.
Keywords: Malware behaviour, malware analysis, clustering, automated analysis
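The pipeline named in the summary (MinHash signatures, Locality Sensitive Hashing for candidate search, disjoint-set forest for grouping) can be illustrated with a minimal sketch. This is not the thesis implementation: the token names, the shingle size, the number of hash functions, the band count and the similarity threshold are all assumptions chosen for illustration, and the metalanguage format of the real behaviour reports is not described in this record.

```python
import hashlib
from collections import defaultdict


def shingles(tokens, k=2):
    """Turn a list of behaviour tokens into a set of k-token shingles."""
    return {" ".join(tokens[i:i + k]) for i in range(len(tokens) - k + 1)}


def minhash(items, num_perm=64):
    """MinHash signature: for each salted hash, keep the minimum over the set.

    `items` must be non-empty. The salted BLAKE2b hashes stand in for the
    random permutations of the textbook formulation (an assumption here).
    """
    sig = []
    for seed in range(num_perm):
        salt = seed.to_bytes(8, "little")
        sig.append(min(
            int.from_bytes(
                hashlib.blake2b(s.encode(), digest_size=8, salt=salt).digest(),
                "little")
            for s in items))
    return tuple(sig)


def lsh_candidates(signatures, bands=16):
    """Banding LSH: samples sharing any identical band become candidate pairs."""
    rows = len(next(iter(signatures.values()))) // bands
    buckets = defaultdict(list)
    for name, sig in signatures.items():
        for b in range(bands):
            buckets[(b, sig[b * rows:(b + 1) * rows])].append(name)
    pairs = set()
    for members in buckets.values():
        for i in range(len(members)):
            for j in range(i + 1, len(members)):
                pairs.add((members[i], members[j]))
    return pairs


class DisjointSet:
    """Disjoint-set forest with path halving and union by rank."""

    def __init__(self, items):
        self.parent = {x: x for x in items}
        self.rank = {x: 0 for x in items}

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        if self.rank[ra] < self.rank[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        if self.rank[ra] == self.rank[rb]:
            self.rank[ra] += 1


def cluster(reports, threshold=0.7, num_perm=64, bands=16):
    """Group samples whose estimated Jaccard similarity reaches the threshold."""
    sigs = {name: minhash(shingles(toks), num_perm)
            for name, toks in reports.items()}
    ds = DisjointSet(sigs)
    for a, b in lsh_candidates(sigs, bands):
        # Fraction of matching signature positions estimates Jaccard similarity.
        est = sum(x == y for x, y in zip(sigs[a], sigs[b])) / num_perm
        if est >= threshold:
            ds.union(a, b)
    groups = defaultdict(set)
    for name in sigs:
        groups[ds.find(name)].add(name)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical behaviour tokens; real reports would come from dynamic analysis.
reports = {
    "sample_a": ["file_create", "reg_set", "net_connect", "proc_inject"],
    "sample_b": ["file_create", "reg_set", "net_connect", "proc_inject"],
    "sample_c": ["dns_query", "mutex_open", "file_delete", "svc_start"],
}
families = cluster(reports)
```

In this sketch only the candidate pairs produced by the LSH buckets are compared, so most dissimilar samples are never compared at all; that is where the saving over all-pairs comparison comes from. The disjoint-set forest then merges every accepted pair into a family in near-constant amortised time per operation.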