MalSketch - A machine learning-based malware behaviour analysis framework

The number of malware samples has increased exponentially over the years, and there is a need to improve the efficiency of analysing large numbers of samples. Additionally, the diversity of malware types and of the methods they employ to defeat analysis techniques has increased steadily. Static analysis m...

Bibliographic Details
Main Author:	Chanderan, Navein
Format:	Thesis
Language:	English
Published:	2019
Subjects:	QA75 Electronic computers. Computer science
Online Access:	http://ir.unimas.my/id/eprint/27471/1/Navein%20Chanderan%20ft.pdf

id	my-unimas-ir.27471
record_format	uketd_dc
spelling	my-unimas-ir.274712023-04-25T08:32:06Z MalSketch - A machine learning-based malware behaviour analysis framework 2019-10-14 Chanderan, Navein QA75 Electronic computers. Computer science The number of malware samples has increased exponentially over the years, and there is a need to improve the efficiency of analysing large numbers of samples. Additionally, the diversity of malware types and of the methods they employ to defeat analysis techniques has increased steadily. Static analysis methods alone are not enough to combat modern malware attacks, as they have an inherent limitation: they are easily defeated by obfuscation and polymorphism. Dynamic analysis methods, on the other hand, do not suffer from this limitation because the samples are executed, revealing their true behaviour. To address this problem, a framework for the automatic analysis of malware behaviour is proposed. The framework analyses malware behaviour, then converts the behaviour reports into a metalanguage format suitable for machine learning. To speed up computation, MinHash is used to represent samples, and Locality-Sensitive Hashing is applied for nearest-neighbour search in sublinear time. A disjoint-set forest clustering algorithm is then applied to the results to group malware into family clusters. The framework achieves a 97.4% true positive rate and a 99.4% true negative rate on a dataset of 65,000 samples from VirusShare. This shows that the framework works very well even on a random dataset, and that it is capable of clustering daily malware samples and identifying unknown malware. Keywords: Malware behaviour, malware analysis, clustering, automated analysis Universiti Malaysia Sarawak (UNIMAS) 2019-10 Thesis http://ir.unimas.my/id/eprint/27471/ http://ir.unimas.my/id/eprint/27471/1/Navein%20Chanderan%20ft.pdf text en validuser masters Universiti Malaysia Sarawak (UNIMAS) Faculty of Computer Science and Information Technology
institution	Universiti Malaysia Sarawak
collection	UNIMAS Institutional Repository
language	English
topic	QA75 Electronic computers. Computer science
spellingShingle	QA75 Electronic computers Computer science Chanderan, Navein MalSketch - A machine learning-based malware behaviour analysis framework
description	The number of malware samples has increased exponentially over the years, and there is a need to improve the efficiency of analysing large numbers of samples. Additionally, the diversity of malware types and of the methods they employ to defeat analysis techniques has increased steadily. Static analysis methods alone are not enough to combat modern malware attacks, as they have an inherent limitation: they are easily defeated by obfuscation and polymorphism. Dynamic analysis methods, on the other hand, do not suffer from this limitation because the samples are executed, revealing their true behaviour. To address this problem, a framework for the automatic analysis of malware behaviour is proposed. The framework analyses malware behaviour, then converts the behaviour reports into a metalanguage format suitable for machine learning. To speed up computation, MinHash is used to represent samples, and Locality-Sensitive Hashing is applied for nearest-neighbour search in sublinear time. A disjoint-set forest clustering algorithm is then applied to the results to group malware into family clusters. The framework achieves a 97.4% true positive rate and a 99.4% true negative rate on a dataset of 65,000 samples from VirusShare. This shows that the framework works very well even on a random dataset, and that it is capable of clustering daily malware samples and identifying unknown malware. Keywords: Malware behaviour, malware analysis, clustering, automated analysis
format	Thesis
qualification_level	Master's degree
author	Chanderan, Navein
author_facet	Chanderan, Navein
author_sort	Chanderan, Navein
title	MalSketch - A machine learning-based malware behaviour analysis framework
title_short	MalSketch - A machine learning-based malware behaviour analysis framework
title_full	MalSketch - A machine learning-based malware behaviour analysis framework
title_fullStr	MalSketch - A machine learning-based malware behaviour analysis framework
title_full_unstemmed	MalSketch - A machine learning-based malware behaviour analysis framework
title_sort	malsketch - a machine learning-based malware behaviour analysis framework
granting_institution	Universiti Malaysia Sarawak (UNIMAS)
granting_department	Faculty of Computer Science and Information Technology
publishDate	2019
url	http://ir.unimas.my/id/eprint/27471/1/Navein%20Chanderan%20ft.pdf
_version_	1783728337337712640
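
Illustrative sketch (not taken from the thesis): the abstract names three techniques, MinHash signatures, Locality-Sensitive Hashing, and disjoint-set forest clustering. The Python below shows one way they could fit together. Everything in it is an assumption made for illustration: the function and class names, the signature length, the band count, the similarity threshold, and the representation of a behaviour report as a non-empty set of string tokens.

```python
import hashlib
from collections import defaultdict
from itertools import combinations

NUM_HASHES = 64  # signature length (assumed value)
BANDS = 16       # LSH bands, i.e. 4 signature rows per band (assumed value)


def _hash(seed, token):
    """Seeded 64-bit hash, standing in for a family of hash functions."""
    data = f"{seed}:{token}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash(tokens):
    """MinHash signature of a non-empty set of behaviour tokens."""
    return tuple(min(_hash(seed, t) for t in tokens) for seed in range(NUM_HASHES))


class DisjointSet:
    """Disjoint-set forest with path compression and union by rank."""

    def __init__(self, items):
        self.parent = {x: x for x in items}
        self.rank = {x: 0 for x in items}

    def find(self, x):
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        while self.parent[x] != root:  # compress the path to the root
            self.parent[x], x = root, self.parent[x]
        return root

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        if self.rank[ra] < self.rank[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        if self.rank[ra] == self.rank[rb]:
            self.rank[ra] += 1


def cluster(reports, threshold=0.5):
    """Group sample names whose estimated Jaccard similarity >= threshold."""
    signatures = {name: minhash(tokens) for name, tokens in reports.items()}
    rows = NUM_HASHES // BANDS
    # LSH banding: samples sharing any identical band become candidate pairs.
    buckets = defaultdict(list)
    for name, sig in signatures.items():
        for band in range(BANDS):
            buckets[(band, sig[band * rows:(band + 1) * rows])].append(name)
    forest = DisjointSet(reports)
    for members in buckets.values():
        for a, b in combinations(members, 2):
            # Fraction of matching signature positions estimates Jaccard similarity.
            matches = sum(x == y for x, y in zip(signatures[a], signatures[b]))
            if matches / NUM_HASHES >= threshold:
                forest.union(a, b)
    groups = defaultdict(set)
    for name in reports:
        groups[forest.find(name)].add(name)
    return list(groups.values())
```

Calling `cluster({"a1": {"open_file", "write_reg"}, "a2": {"open_file", "write_reg"}, "b1": {"dns_query"}})` would place `a1` and `a2` in one cluster and `b1` in another. The token names here are hypothetical and do not reflect the metalanguage used in the thesis.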

Similar Items