Hybrid performance measures and mixed evaluation method for data classification problems

This study investigates two issues concerning performance measures in data classification problems. First, it examines the use of the accuracy measure as a discriminator for building an optimized Prototype Selection (PS) algorithm. Second, it evaluates the current evaluation practices fo...

Bibliographic Details
Main Author: Hossin, Mohammad
Format: Thesis
Language: English
Published: 2012
Subjects: Computer algorithms; Machine learning
Online Access: http://psasir.upm.edu.my/id/eprint/33140/1/FSKTM%202012%2022.pdf
id my-upm-ir.33140
record_format uketd_dc
spelling my-upm-ir.33140 2024-09-04T03:11:18Z Hybrid performance measures and mixed evaluation method for data classification problems 2012-04 Hossin, Mohammad Computer algorithms Machine learning Thesis http://psasir.upm.edu.my/id/eprint/33140/ http://psasir.upm.edu.my/id/eprint/33140/1/FSKTM%202012%2022.pdf text en public doctoral Universiti Putra Malaysia Sulaiman, Md. Nasir
institution Universiti Putra Malaysia
collection PSAS Institutional Repository
language English
advisor Sulaiman, Md. Nasir
topic Computer algorithms
Machine learning

description This study investigates two issues concerning performance measures in data classification problems. First, it examines the use of the accuracy measure as a discriminator for building an optimized Prototype Selection (PS) algorithm. Second, it evaluates current practices for evaluating and comparing two performance measures. According to the literature, using accuracy can cause the evaluation process to underperform because its values are less distinctive and less discriminable, and it cannot perform optimally when confronted with the imbalanced class problem. Nevertheless, the accuracy measure is still widely used in evaluating data classification problems. In evaluation analysis, many previous studies emphasize generalization ability when evaluating and comparing performance measures. Only a few efforts have been dedicated to evaluating and comparing performance measures using different performance characteristics, and no previous study has employed a mixed evaluation method for this purpose. To tackle the first issue, this study proposes several hybrid measures that combine accuracy with the precision and recall measures. These hybrid measures are known as Optimized Accuracy with Conventional Recall-Precision (OACRP) and Optimized Accuracy with Extended Recall-Precision version 1 and version 2 (OAERP1 and OAERP2). More importantly, the OAERP1 and OAERP2 measures have been extended to evaluate multi-class problems. For the second issue, this study proposes a mixed evaluation method that evaluates two performance measures through different performance characteristics. For a systematic analysis, the mixed evaluation method is implemented in two stages.
First, the hybrid measures are compared and analyzed against the accuracy measure based on their produced values across different classification problems with different class distributions. Second, the hybrid measures are compared empirically against the accuracy measure and other selected performance measures based on generalization ability, using three selected PS algorithms (MCS, LVQ21 and GA) and large benchmark datasets. In the first evaluation stage, the OAERP2 measure produced better values than the accuracy, OACRP and OAERP1 measures in terms of distinctiveness, discriminability, informativeness, favoring the minority class, and degree of consistency and discriminancy. In the second evaluation stage, almost all selected algorithms optimized by the OAERP2 measure achieved better generalization ability than with their original measure and the other selected performance measures. Moreover, the GA model optimized by the OAERP2 measure (GAoe2) differed statistically significantly from the other OAERP2-based models under the win-draw-loss evaluation method and two nonparametric tests. The GAoe2 model also differed statistically significantly from nine additional PS algorithms in terms of testing error and storage requirements. Taken together, the evaluations indicate that the OAERP2 measure is able to choose a better solution during classification training, which leads to a better-trained PS classifier with better generalization ability. In addition, the mixed evaluation method enabled this study to evaluate and compare the studied performance measures systematically and comprehensively via different performance characteristics.
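The description's point that accuracy yields less discriminable values on imbalanced data can be illustrated with a small sketch. This is not the thesis's OACRP or OAERP formulation, which this record does not give; it only computes the standard accuracy, precision and recall from a binary confusion matrix. The function name and the two confusion matrices are hypothetical, chosen so that two very different candidate solutions receive the same accuracy.

```python
def confusion_metrics(tp, fp, fn, tn):
    """Return (accuracy, precision, recall) for the positive class."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    # Guard against division by zero when no positives are predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


# Hypothetical imbalanced data: 95 negative and 5 positive (minority) instances.
# Solution A labels every instance negative.
solution_a = confusion_metrics(tp=0, fp=0, fn=5, tn=95)
# Solution B recovers 4 of the 5 minority instances at the cost of 4 false alarms.
solution_b = confusion_metrics(tp=4, fp=4, fn=1, tn=91)

for name, (acc, prec, rec) in (("A", solution_a), ("B", solution_b)):
    print(f"solution {name}: accuracy={acc:.2f} precision={prec:.2f} recall={rec:.2f}")
```

Both solutions score the same accuracy, so a search guided by accuracy alone has no reason to prefer B, even though only B detects the minority class. Precision and recall separate the two, which is the kind of information a hybrid measure can draw on.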
format Thesis
qualification_level Doctorate
author Hossin, Mohammad
title Hybrid performance measures and mixed evaluation method for data classification problems
granting_institution Universiti Putra Malaysia
publishDate 2012
url http://psasir.upm.edu.my/id/eprint/33140/1/FSKTM%202012%2022.pdf
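The description mentions a win-draw-loss evaluation for comparing models. A minimal sketch of such a tally is given below, assuming the comparison is made per dataset on testing error, with lower being better. The function name, the tolerance parameter and the error values are hypothetical and are not taken from the thesis.

```python
def win_draw_loss(errors_a, errors_b, tol=1e-9):
    """Count datasets where model A has lower, equal or higher error than model B."""
    wins = draws = losses = 0
    for a, b in zip(errors_a, errors_b):
        if abs(a - b) <= tol:
            draws += 1      # errors are equal within the tolerance
        elif a < b:
            wins += 1       # A has the lower testing error
        else:
            losses += 1     # B has the lower testing error
    return wins, draws, losses


# Hypothetical testing errors on four datasets.
errors_model_a = [0.10, 0.22, 0.05, 0.31]
errors_model_b = [0.12, 0.22, 0.09, 0.28]

record = win_draw_loss(errors_model_a, errors_model_b)
print("win-draw-loss for A against B:", record)
```

A tally like this only summarizes the direction of the differences. The description pairs it with two nonparametric tests, which it does not name, to assess statistical significance.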