Improved adaptive semi-unsupervised weighted oversampling (IA-SUWO) using sparsity factor for imbalanced datasets


Bibliographic Details
Main Author: Ali, Haseeb
Format: Thesis
Language: English
Published: 2019
Online Access: http://eprints.uthm.edu.my/504/1/24p%20HASEEB%20ALI.pdf
http://eprints.uthm.edu.my/504/2/HASEEB%20ALI%20COPYRIGHT%20DECLARATION.pdf
http://eprints.uthm.edu.my/504/3/HASEEB%20ALI%20WATERMARK.pdf
Date: 2019-12
Record page: http://eprints.uthm.edu.my/504/
Files: 24p HASEEB ALI.pdf (public), HASEEB ALI COPYRIGHT DECLARATION.pdf (staff only), HASEEB ALI WATERMARK.pdf (valid users only)
Collection: UTHM Institutional Repository
Subject: QA75-76.95 Calculating machines
Abstract: The imbalanced data problem is common in data mining because real-world data are often skewed, and this skew degrades classification in machine learning. As a preprocessing step, oversampling techniques have greatly benefited the imbalanced domain. They generate artificial samples in the minority class to increase its size and balance the class distribution. However, existing oversampling techniques suffer from overfitting and over-generalization, both of which reduce classifier performance. Many clustering-based oversampling techniques largely overcome these problems, but most of them cannot produce an appropriate number of synthetic samples in each minority cluster. This study proposes an improved Adaptive Semi-Unsupervised Weighted Oversampling (IA-SUWO) technique. It uses a sparsity factor to identify the sparse minority samples in each minority cluster, including sparse minority samples that lie far from the decision boundary. These samples also carry important information for learning the minority class. Including them in oversampling further reduces the imbalance ratio and can improve the learnability of the classifiers. The proposed approach was compared in terms of accuracy with existing oversampling techniques, namely SMOTE, Borderline-SMOTE, Safe-Level SMOTE, and the standard A-SUWO technique. The comparative analysis showed that the proposed approach improved average accuracy by about 5 percentage points, from 85% for the existing techniques to 90%.
Qualification: Master of Philosophy (M.Phil.), Master's degree
Granting institution: Universiti Tun Hussein Onn Malaysia
Granting department: Fakulti Sains Komputer dan Teknologi Maklumat (Faculty of Computer Science and Information Technology)
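The abstract says that IA-SUWO uses a sparsity factor to decide how many synthetic samples each minority cluster receives, but it does not define that factor. The sketch below is a minimal illustration under stated assumptions and is not the thesis's implementation. It assumes the minority clusters have already been formed, since the clustering step is not shown. It defines a hypothetical sparsity factor as the mean distance from each sample to its `k` nearest neighbours in the same cluster. A budget of synthetic samples is then split across clusters in proportion to that factor, and new points are generated by SMOTE-style linear interpolation between two members of the same cluster. The names `cluster_sparsity`, `allocate` and `interpolate` and the parameter `k` are hypothetical. The thesis's own weighting and sample selection may differ.

```python
import math
import random
from typing import List, Sequence

Point = List[float]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cluster_sparsity(cluster: List[Point], k: int = 3) -> float:
    """Hypothetical sparsity factor: the mean distance from each sample to its
    k nearest neighbours inside the same minority cluster (0.0 for a singleton)."""
    if len(cluster) < 2:
        return 0.0
    k = min(k, len(cluster) - 1)
    total = 0.0
    for i, p in enumerate(cluster):
        dists = sorted(euclidean(p, q) for j, q in enumerate(cluster) if j != i)
        total += sum(dists[:k]) / k
    return total / len(cluster)


def allocate(clusters: List[List[Point]], n_synthetic: int, k: int = 3) -> List[int]:
    """Split a budget of synthetic samples across minority clusters in
    proportion to their sparsity factor, so sparser clusters receive more."""
    weights = [cluster_sparsity(c, k) for c in clusters]
    total = sum(weights)
    if total == 0.0:  # every cluster is a singleton or has identical points
        weights = [1.0] * len(clusters)
        total = float(len(clusters))
    raw = [n_synthetic * w / total for w in weights]
    counts = [int(r) for r in raw]  # floor, since every raw value is >= 0
    # Largest-remainder rounding so the counts sum exactly to n_synthetic.
    order = sorted(range(len(raw)), key=lambda i: raw[i] - counts[i], reverse=True)
    for i in order[: n_synthetic - sum(counts)]:
        counts[i] += 1
    return counts


def interpolate(cluster: List[Point], count: int, rng: random.Random) -> List[Point]:
    """SMOTE-style synthesis: each new point lies on the segment between a random
    cluster member and another random member of the same cluster."""
    synthetic: List[Point] = []
    for _ in range(count):
        i = rng.randrange(len(cluster))
        base = cluster[i]
        if len(cluster) == 1:
            synthetic.append(list(base))  # nothing to interpolate towards
            continue
        j = rng.randrange(len(cluster) - 1)
        if j >= i:
            j += 1  # skip the base sample itself
        other = cluster[j]
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


# Usage: a tight four-point cluster and a looser three-point cluster.
dense = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1]]
sparse = [[5.0, 5.0], [6.0, 5.0], [5.0, 6.0]]
counts = allocate([dense, sparse], n_synthetic=10)
print(counts)  # [1, 9]
rng = random.Random(0)
new_points = [p for c, n in zip([dense, sparse], counts) for p in interpolate(c, n, rng)]
print(len(new_points))  # 10
```

In this example the looser cluster's sparsity factor is ten times that of the tight one, so it receives 9 of the 10 synthetic samples. Interpolating only between members of the same cluster keeps every new point inside that cluster's convex hull. This is one simple way to limit over-generalization. Unlike classic SMOTE, the sketch picks the partner sample uniformly from the whole cluster rather than from the `k` nearest neighbours.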