A new classifier based on combination of genetic programming and support vector machine in solving imbalanced classification problem

In supervised learning, a class-imbalanced data set is one in which the class distribution is not uniform among the classes. Many classifiers fail to properly identify patterns that belong to the minority class because most of them are built to minimize the error rate. Hence, a biased...

Bibliographic Details
Main Author: Mohd Pozi, Muhammad Syafiq
Format: Thesis
Language: English
Published: 2016
Subjects: Genetic programming (Computer science); Support vector machines
Online Access: http://psasir.upm.edu.my/id/eprint/69313/1/FSKTM%202016%204%20IR.pdf
id my-upm-ir.69313
record_format uketd_dc
spelling my-upm-ir.69313 2019-06-28T08:14:21Z A new classifier based on combination of genetic programming and support vector machine in solving imbalanced classification problem 2016-02 Mohd Pozi, Muhammad Syafiq Genetic programming (Computer science) Support vector machines 2016-02 Thesis http://psasir.upm.edu.my/id/eprint/69313/ http://psasir.upm.edu.my/id/eprint/69313/1/FSKTM%202016%204%20IR.pdf text en public doctoral Universiti Putra Malaysia
institution Universiti Putra Malaysia
collection PSAS Institutional Repository
language English
topic Genetic programming (Computer science)
Support vector machines

description In supervised learning, a class-imbalanced data set is one in which the class distribution is not uniform among the classes. Many classifiers fail to properly identify patterns that belong to the minority class because most of them are built to minimize the error rate. Hence, a biased classification model is to be expected, since high accuracy can be attained on the majority class alone. There are two families of methods for dealing with the imbalanced classification problem: data-level and algorithm-level. Data-level methods aim to solve the problem by making both classes equal in number. However, changing the distribution of the classes violates the original class distribution of the data. Algorithm-level methods instead introduce a new optimization task to improve the minority class classification rate without changing the data characteristics. Nevertheless, the optimization task requires specific care to prevent overfitting of the classification model. Therefore, a new classifier based on genetic programming (GP) and the support vector machine (SVM) is proposed in this thesis to solve the imbalanced classification problem without changing the data properties. The idea is to use GP to optimize the SVM decision function such that the minority class classification rate is increased without sacrificing the accuracy rate for both classes. In addition, the classifier is optimized to have a good generalization property. The key components of the new classifier are a new kernel method, a new learning metric, and a new optimization algorithm for optimizing the SVM decision function. The proposed classifier is called the Support Vector Genetic Programming Machine (SVGPM).
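The data-level idea described above, making both classes equal in number, can be illustrated with random oversampling. The following is a minimal sketch and not the method proposed in the thesis; the function name `random_oversample`, the toy labels, and the 95/5 class split are assumptions introduced here purely for illustration. It shows how resampling changes the minority class prior, which is the distortion of the original class distribution that the abstract objects to.

```python
import random
from collections import Counter


def random_oversample(labels, minority, seed=0):
    """Duplicate randomly chosen minority indices until both classes are equal in size.

    Returns a list of indices into `labels`. This is a generic data-level
    resampling sketch, not the method proposed in the thesis.
    """
    rng = random.Random(seed)
    minority_idx = [i for i, y in enumerate(labels) if y == minority]
    majority_idx = [i for i, y in enumerate(labels) if y != minority]
    deficit = len(majority_idx) - len(minority_idx)
    # Draw extra minority indices with replacement to close the gap.
    extra = [rng.choice(minority_idx) for _ in range(max(deficit, 0))]
    return majority_idx + minority_idx + extra


# Hypothetical toy labels: 95 majority (0) and 5 minority (1) examples.
labels = [0] * 95 + [1] * 5
resampled = [labels[i] for i in random_oversample(labels, minority=1)]

prior_before = Counter(labels)[1] / len(labels)
prior_after = Counter(resampled)[1] / len(resampled)
print(f"minority prior before: {prior_before:.2f}, after: {prior_after:.2f}")
```

After resampling, the minority class makes up half of the training set even though it made up only a small fraction of the original data, so any model fitted to the resampled set sees a class prior that the real data does not follow.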
To evaluate the performance of SVGPM against current methods for imbalanced classification, three experiments are conducted: on selected standard class-imbalanced benchmark data sets, on an intrusion detection system (IDS) data set, and on a remote sensing data set. In the first experiment, the SVGPM performance is compared against SVM and cost-sensitive SVM, chosen because of the superiority of SVM in dealing with the imbalanced classification problem. The second experiment evaluates the SVGPM performance in detecting anomalous rare attacks in a network intrusion data set, where it is compared against current methods for developing a prediction model for IDS. In the third experiment, SVGPM is evaluated on a wilt disease data set from a remote sensing study, with the goal of identifying wilt-diseased trees in high-resolution images. Here the SVGPM performance is compared against previously proposed methods for mapping the regions covered by wilt-diseased trees in Japan. The experiments show that SVGPM gives a very good classification rate on the minority class without sacrificing the accuracy rate for both classes. This is because, in the training stage, the optimization task introduced in SVGPM ensures that each minority class example is generalized into one learning concept and that the classification rates for the majority and minority classes are similar.
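The abstract's concern that overall accuracy hides poor minority class performance can be made concrete with per-class classification rates. The sketch below is an illustration under stated assumptions: the labels and predictions are hypothetical, and the geometric mean of the two per-class rates is used only as a commonly used imbalance-aware score. This record does not describe the thesis's own learning metric, so the code should not be read as that metric.

```python
import math


def class_rates(y_true, y_pred, minority=1):
    """Return (accuracy, minority-class rate, majority-class rate)."""
    pairs = list(zip(y_true, y_pred))
    min_hits = [t == p for t, p in pairs if t == minority]
    maj_hits = [t == p for t, p in pairs if t != minority]
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    return accuracy, sum(min_hits) / len(min_hits), sum(maj_hits) / len(maj_hits)


def g_mean(minority_rate, majority_rate):
    """Geometric mean of the two per-class rates; zero if either class is ignored."""
    return math.sqrt(minority_rate * majority_rate)


# Hypothetical labels: 95 majority (0) and 5 minority (1) examples.
y_true = [0] * 95 + [1] * 5
always_majority = [0] * 100                      # ignores the minority class
balanced = [0] * 85 + [1] * 10 + [1] * 4 + [0]   # 85/95 majority, 4/5 minority correct

for name, y_pred in [("always_majority", always_majority), ("balanced", balanced)]:
    acc, min_rate, maj_rate = class_rates(y_true, y_pred)
    print(name, round(acc, 3), round(min_rate, 3), round(maj_rate, 3),
          round(g_mean(min_rate, maj_rate), 3))
```

The classifier that always predicts the majority class has the higher accuracy of the two yet a minority class rate of zero, while the second classifier gives up some accuracy in exchange for similar rates on both classes, which is the kind of balance the abstract describes as the goal.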
format Thesis
qualification_level Doctorate
author Mohd Pozi, Muhammad Syafiq
title A new classifier based on combination of genetic programming and support vector machine in solving imbalanced classification problem
title_sort new classifier based on combination of genetic programming and support vector machine in solving imbalanced classification problem
granting_institution Universiti Putra Malaysia
publishDate 2016
url http://psasir.upm.edu.my/id/eprint/69313/1/FSKTM%202016%204%20IR.pdf
_version_ 1747812683511496704