Improved scheme of e-mail spam classification using meta-heuristics feature selection and support vector machine

With the technological revolution of the 21st century, electronic mail (e-mail) has reduced the time and distance of communication. Furthermore, the growing use of e-mail has led to the emergence and growth of problems caused by unsolicited bulk e-mails, commonly referred to as spam e-ma...

Bibliographic Details
Main Author: Elssied Hamed, Nadir Omer Fadl
Format: Thesis
Language: English
Published: 2015
Subjects: QA75 Electronic computers. Computer science
Online Access: http://eprints.utm.my/id/eprint/77765/1/NadirOmerFadlPFC2015.pdf
id my-utm-ep.77765
record_format uketd_dc
spelling my-utm-ep.77765 2018-07-04T11:44:16Z Improved scheme of e-mail spam classification using meta-heuristics feature selection and support vector machine 2015-02 Elssied Hamed, Nadir Omer Fadl QA75 Electronic computers. Computer science 2015-02 Thesis http://eprints.utm.my/id/eprint/77765/ http://eprints.utm.my/id/eprint/77765/1/NadirOmerFadlPFC2015.pdf application/pdf en public http://dms.library.utm.my:8080/vital/access/manager/Repository/vital:96673 phd doctoral Universiti Teknologi Malaysia, Faculty of Computing Faculty of Computing
institution Universiti Teknologi Malaysia
collection UTM Institutional Repository
language English
topic QA75 Electronic computers. Computer science
description With the technological revolution of the 21st century, electronic mail (e-mail) has reduced the time and distance of communication. Furthermore, the growing use of e-mail has led to the emergence and growth of problems caused by unsolicited bulk e-mails, commonly referred to as spam e-mail. Many existing supervised algorithms, such as the Support Vector Machine (SVM), were developed to stop spam e-mail. However, dealing with large data and a high-dimensional feature space can lead to high execution time and low accuracy in spam e-mail classification. Removing irrelevant and redundant features, together with finding the optimal (or near-optimal) subset of features, significantly influences the performance of spam e-mail classification, and this has become an important challenge. Therefore, to optimize spam e-mail classification accuracy, dimensionality reduction issues need to be solved. Feature selection schemes are important because they reduce dimensionality by selecting a proper subset of features to facilitate the classification process. The objective of this study is to investigate and improve schemes that reduce the execution time and increase the accuracy of spam e-mail classification. The methodology of this study comprises four schemes: one-way ANOVA F-test, Binary Differential Evolution (BDE), Opposition Differential Evolution (ODE) and Opposition Particle Swarm Optimization (OPSO), and a combination of Differential Evolution (DE) and Particle Swarm Optimization (PSO). The four schemes were used to improve spam e-mail classification accuracy. The classification accuracy of the proposed schemes was 95.05% with a population size of 50 and 1000 iterations in 20 runs and 41 features. The experiments were carried out on the spambase and spamassassin benchmark datasets to evaluate the feasibility of the proposed schemes. The experimental findings demonstrate that the improved schemes were able to efficiently reduce the number of features while improving e-mail classification accuracy.
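The filter step named in the description, the one-way ANOVA F-test, can be illustrated with a minimal sketch. This is not the thesis's implementation: the function names `anova_f_score` and `select_top_features` and the toy data are hypothetical, chosen only for illustration, and the sketch covers only feature ranking (no SVM and no DE/PSO search). Each feature is scored by the ratio of between-class to within-class variance, and the highest-scoring features are kept.

```python
from statistics import mean


def anova_f_score(values, labels):
    """One-way ANOVA F-statistic of a single feature across class labels.

    Assumes at least two classes and more samples than classes.
    """
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)
    n, k = len(values), len(groups)
    grand_mean = mean(values)
    # Between-class sum of squares: how far each class mean is from the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups.values())
    # Within-class sum of squares: spread of samples around their own class mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups.values() for v in g)
    if ss_within == 0:
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def select_top_features(rows, labels, top_k):
    """Return (indices of the top_k features by F-score, all F-scores)."""
    n_features = len(rows[0])
    scores = [anova_f_score([row[j] for row in rows], labels)
              for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return ranked[:top_k], scores


# Hypothetical toy data: 4 e-mails, 3 features, label 1 = spam, 0 = non-spam.
rows = [
    [0.9, 0.10, 5.0],
    [0.8, 0.20, 1.0],
    [0.1, 0.15, 4.0],
    [0.2, 0.25, 2.0],
]
labels = [1, 1, 0, 0]
selected, scores = select_top_features(rows, labels, top_k=2)
```

In this toy data, feature 0 separates the two classes strongly while feature 2 has identical class means, so a filter of this kind would keep the former and drop the latter before any classifier is trained.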
format Thesis
qualification_name Doctor of Philosophy (PhD.)
qualification_level Doctorate
author Elssied Hamed, Nadir Omer Fadl
title Improved scheme of e-mail spam classification using meta-heuristics feature selection and support vector machine
granting_institution Universiti Teknologi Malaysia, Faculty of Computing
granting_department Faculty of Computing
publishDate 2015
url http://eprints.utm.my/id/eprint/77765/1/NadirOmerFadlPFC2015.pdf
_version_ 1747817825487028224