A direct ensemble classifier for learning imbalanced multiclass data

A traditional direct single classifier can easily be applied to a multiclass classification problem. However, the performance of a single classifier degrades when the data in a multiclass classification task are imbalanced. Thus, an ensemble of classifiers is one of the methods use...

Bibliographic Details
Main Author: Samry @ Mohd Shamrie Sainin
Format: Thesis
Language: English
Published: 2013
Subjects:
Online Access: https://eprints.ums.edu.my/id/eprint/38557/1/24%20PAGES.pdf
https://eprints.ums.edu.my/id/eprint/38557/2/FULLTEXT.pdf
id my-ums-ep.38557
record_format uketd_dc
spelling my-ums-ep.38557 2024-04-29T02:37:16Z A direct ensemble classifier for learning imbalanced multiclass data 2013 Samry @ Mohd Shamrie Sainin TK7885-7895 Computer engineering. Computer hardware 2013 Thesis https://eprints.ums.edu.my/id/eprint/38557/ https://eprints.ums.edu.my/id/eprint/38557/1/24%20PAGES.pdf text en public https://eprints.ums.edu.my/id/eprint/38557/2/FULLTEXT.pdf text en validuser dphil doctoral Universiti Malaysia Sabah Sekolah Kejuruteraan dan Teknologi Maklumat
institution Universiti Malaysia Sabah
collection UMS Institutional Repository
language English
topic TK7885-7895 Computer engineering
Computer hardware
spellingShingle TK7885-7895 Computer engineering
Computer hardware
Samry @ Mohd Shamrie Sainin
A direct ensemble classifier for learning imbalanced multiclass data
description A traditional direct single classifier can easily be applied to a multiclass classification problem. However, the performance of a single classifier degrades when the data in a multiclass classification task are imbalanced. Thus, an ensemble of classifiers is one of the methods used to solve multiclass classification tasks. This thesis studies the problem of learning from imbalanced multiclass data. In a multiclass classification problem, a decision can be estimated not only from the final single class label but also from other appropriate classes. Many real-world multiclass classification problems can be represented in a setting where non-crisp labels need to be observed. An in-depth review of this special learning task, and a method to solve it, are presented in this thesis. An alternative ensemble learning framework called Direct Ensemble Classifier for Imbalance Learning (DECIML) is proposed, combining the advantages of existing single classifiers and ensemble methods and strategies. The learning framework consists of ensemble learning and a decision combiner model, with general supervised learning algorithms as base learners. Feature selection is also applied in DECIML to improve the performance of the ensemble learning. To facilitate the experiments and future research on the imbalanced multiclass problem, a standard pool of benchmark data is created, consisting of 16 datasets with different degrees of imbalance ratio and 4 datasets for imbalanced multiclass learning with feature selection. The benchmark data are used to evaluate and compare the proposed frameworks with several ensemble methods, such as bagging and AdaBoost. DECIML with feature selection is also evaluated and compared with the methods CfsSubsetEval and FilteredSubsetEval. The results show that the proposed learning frameworks are comparable to the other methods. In addition, the selected benchmark data, the experiments, and the results are useful for future research on the imbalanced multiclass classification problem. Furthermore, the DECIML framework was applied to a real-world leaf classification problem based on shape features. Extensive experiments show that the DECIML method provides promising performance on imbalanced multiclass data with high noise.
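The two ideas the abstract leans on, the degree of class imbalance and a decision combiner that reports more than one candidate class, can be illustrated with a minimal Python sketch. This is not the DECIML implementation: the function names `imbalance_ratio` and `combine_soft_votes`, the max/min definition of the ratio, and the probability-averaging rule are all assumptions chosen for illustration, since the record does not specify how DECIML computes either.

```python
from collections import Counter, defaultdict


def imbalance_ratio(labels):
    """Assumed definition: size of the largest class over the smallest."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def combine_soft_votes(member_probs, top_k=2):
    """Average per-class probabilities across ensemble members.

    member_probs: one dict per base learner mapping label -> probability.
    Returns the top_k (label, averaged score) pairs, best first, so the
    caller sees a ranked, non-crisp answer rather than a single label.
    This averaging rule is a hypothetical stand-in for a decision combiner.
    """
    totals = defaultdict(float)
    for probs in member_probs:
        for label, p in probs.items():
            totals[label] += p
    n = len(member_probs)
    averaged = {label: score / n for label, score in totals.items()}
    # Sort by descending score; break ties alphabetically for determinism.
    ranked = sorted(averaged.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_k]


if __name__ == "__main__":
    labels = ["a"] * 6 + ["b"] * 3 + ["c"]
    print(imbalance_ratio(labels))
    members = [{"a": 0.5, "b": 0.5}, {"a": 0.25, "b": 0.75}]
    print(combine_soft_votes(members))
```

Returning a ranked list instead of a single label is one simple way to expose "other appropriate classes" to the caller; a real combiner could weight members by their per-class accuracy instead of averaging uniformly.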
format Thesis
qualification_name Doctor of Philosophy (PhD.)
qualification_level Doctorate
author Samry @ Mohd Shamrie Sainin
author_facet Samry @ Mohd Shamrie Sainin
author_sort Samry @ Mohd Shamrie Sainin
title A direct ensemble classifier for learning imbalanced multiclass data
title_short A direct ensemble classifier for learning imbalanced multiclass data
title_full A direct ensemble classifier for learning imbalanced multiclass data
title_fullStr A direct ensemble classifier for learning imbalanced multiclass data
title_full_unstemmed A direct ensemble classifier for learning imbalanced multiclass data
title_sort direct ensemble classifier for learning imbalanced multiclass data
granting_institution Universiti Malaysia Sabah
granting_department Sekolah Kejuruteraan dan Teknologi Maklumat
publishDate 2013
url https://eprints.ums.edu.my/id/eprint/38557/1/24%20PAGES.pdf
https://eprints.ums.edu.my/id/eprint/38557/2/FULLTEXT.pdf
_version_ 1804890327782260736