Leveraging Ensemble Strategies in Identity Verification and Feature Optimisation for Phishing Website Detection

The aim of this thesis is to enrich the ongoing efforts of protecting Internet users against phishing attacks. Mainstream solutions and technical approaches for phishing detection suffer from inherent problems such as ineffectiveness against newly launched phishing webpages, misclassification of leg...

Bibliographic Details
Main Author: Tan, Colin Choon Lin
Format: Thesis
Language: English
Published: 2021
Subjects: QA75 Electronic computers. Computer science
Online Access: http://ir.unimas.my/id/eprint/37167/1/Colin.pdf
id my-unimas-ir.37167
record_format uketd_dc
spelling my-unimas-ir.37167 2023-08-17T07:44:30Z Leveraging Ensemble Strategies in Identity Verification and Feature Optimisation for Phishing Website Detection 2021-08-12 Tan, Colin Choon Lin QA75 Electronic computers. Computer science Universiti Malaysia Sarawak (UNIMAS) 2021-08 Thesis http://ir.unimas.my/id/eprint/37167/ http://ir.unimas.my/id/eprint/37167/1/Colin.pdf text en validuser phd doctoral Universiti Malaysia Sarawak (UNIMAS) Faculty of Computer Science and Information Technology Yayasan Sarawak Tun Taib Scholarship
institution Universiti Malaysia Sarawak
collection UNIMAS Institutional Repository
language English
topic QA75 Electronic computers. Computer science
description This thesis aims to strengthen ongoing efforts to protect Internet users against phishing attacks. Mainstream solutions and technical approaches for phishing detection suffer from inherent problems such as ineffectiveness against newly launched phishing webpages, misclassification of legitimate webpages, utilisation of irrelevant features, and susceptibility to intentional manipulation by adversaries. In this study, we explore whether ensemble strategies can be leveraged in website identity verification and feature optimisation to address the limitations of existing techniques. The study also aims to provide a deeper understanding of the evolving state of phishing and to identify directions where phishing detection measures should be concentrated. By proposing an improved website logo extraction technique, we show that an ensemble of visual and textual identities achieves a detection accuracy of 98.6%. The misclassification rate of legitimate webpages also improved by 3.4%, which is consistent with our aim of attaining robustness across legitimate webpages with the varying properties that users routinely encounter. To facilitate the identification of essential features for phishing detection, we propose a novel ensemble feature selection framework, which achieved a competitive detection accuracy of 94.6% using only 20.8% of the original number of features. Based on experimental results, we also challenge the use of certain conventional features that are often highly rated and wrongly assumed to be effective. Lastly, we show that the underlying phishing patterns at the webpage interconnection level can be exploited using ensemble strategies in a graph-theoretic approach, achieving an accuracy of up to 97.8% while demonstrating robustness and immutability against current and emerging phishing schemes.
format Thesis
qualification_name Doctor of Philosophy (PhD)
qualification_level Doctorate
author Tan, Colin Choon Lin
title Leveraging Ensemble Strategies in Identity Verification and Feature Optimisation for Phishing Website Detection
granting_institution Universiti Malaysia Sarawak (UNIMAS)
granting_department Faculty of Computer Science and Information Technology
publishDate 2021
url http://ir.unimas.my/id/eprint/37167/1/Colin.pdf