Ensemble filters with harmonize algorithm for optimal solutions in medical datasets

The explosive growth in the number of features in high-dimensional datasets remains a challenge for data analysis in various research fields, especially medical diagnosis, where it may affect the treatment patients receive. Besides data dimensionality, classifiers such as Support Vector Machine (S...

Bibliographic Details
Main Author: Tengku Ab. Hamid, Tengku Mazlin
Format: Thesis
Language: English
Published: 2021
Subjects: QA76 Computer software
Online Access: http://eprints.utm.my/102978/1/TengkuMazlinTengkuAbHamidMSC2021.pdf.pdf
id my-utm-ep.102978
record_format uketd_dc
spelling my-utm-ep.102978 2023-10-12T08:34:10Z Ensemble filters with harmonize algorithm for optimal solutions in medical datasets 2021 Tengku Ab. Hamid, Tengku Mazlin QA76 Computer software 2021 Thesis http://eprints.utm.my/102978/ http://eprints.utm.my/102978/1/TengkuMazlinTengkuAbHamidMSC2021.pdf.pdf application/pdf en public http://dms.library.utm.my:8080/vital/access/manager/Repository/vital:150761 masters Universiti Teknologi Malaysia Faculty of Engineering - School of Computing
institution Universiti Teknologi Malaysia
collection UTM Institutional Repository
language English
topic QA76 Computer software
description The explosive growth in the number of features in high-dimensional datasets remains a challenge for data analysis in various research fields, especially medical diagnosis, where it may affect the treatment patients receive. Besides data dimensionality, classifiers such as the Support Vector Machine (SVM) still lack consistency in achieving optimal performance because of improper kernel parameter settings. Filter algorithms are frequently used to select relevant features because of their simple ranking strategies. However, most independent filter algorithms do not consider the intercorrelation between features, and weak dependency between features is the leading reason some features are rendered irrelevant. Consequently, an imbalanced number of features is produced, which can degrade classification accuracy. This problem can be alleviated with an ensemble feature selection approach that identifies the appropriate number of features by considering feature dependency. In this study, an ensemble filter feature selection method with a harmonize classification algorithm is proposed. The ensemble combines the Information Gain, Gain Ratio, Chi-squared and Relief-F filters with an occurrence rate evaluation to identify the initial top-ranked features relevant for classification. The harmonize classification method uses Particle Swarm Optimization (PSO) and SVM to determine the optimum kernel parameters and the significant features simultaneously as the optimal solution. The proposed method is evaluated on four medical datasets of different sizes in terms of accuracy, sensitivity, specificity, and Area Under the Curve (AUC). Experimental results show that the proposed method, with its optimal solution, achieves significantly higher accuracy than conventional SVM on each dataset, reaching 96.15%, 95.41%, 96.62% and 96.50%. Under 10-fold cross-validation, the proposed method also shows better classification performance than other existing methods. Therefore, the proposed method is suitable for handling high-dimensional medical datasets for accurate disease prediction.
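The abstract does not spell out how the occurrence rate evaluation combines the four filter rankings, so the following is only a minimal sketch under stated assumptions: each filter contributes its top-k features, and a feature is kept when it appears in at least a chosen fraction of those top-k lists. The function name `occurrence_rate_select`, the parameters `top_k` and `min_rate`, the input format, and the feature names are all hypothetical and are not taken from the thesis.

```python
from collections import Counter


def occurrence_rate_select(rankings, top_k, min_rate=0.5):
    """Keep features that appear in the top_k of at least min_rate of the filters.

    rankings: dict mapping a filter name to a list of feature names,
              best-ranked first (an assumed input format).
    Returns the selected features and every feature's occurrence rate.
    """
    if not rankings:
        raise ValueError("at least one filter ranking is required")

    counts = Counter()
    for ranked in rankings.values():
        # set() ensures a feature is counted at most once per filter.
        counts.update(set(ranked[:top_k]))

    n_filters = len(rankings)
    rates = {feature: count / n_filters for feature, count in counts.items()}

    # Order by descending occurrence rate, then by name for a stable result.
    selected = sorted(
        (f for f, rate in rates.items() if rate >= min_rate),
        key=lambda f: (-rates[f], f),
    )
    return selected, rates


# Hypothetical rankings, one per filter named in the abstract.
rankings = {
    "information_gain": ["age", "bmi", "glucose", "bp", "insulin"],
    "gain_ratio": ["glucose", "age", "insulin", "bmi", "bp"],
    "chi_squared": ["glucose", "bmi", "age", "bp", "insulin"],
    "relief_f": ["bp", "glucose", "age", "insulin", "bmi"],
}

selected, rates = occurrence_rate_select(rankings, top_k=3, min_rate=0.75)
print(selected)  # ['age', 'glucose']
```

In a pipeline like the one the abstract describes, the selected subset would then be passed to the PSO and SVM stage, which is not sketched here.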
format Thesis
qualification_level Master's degree
author Tengku Ab. Hamid, Tengku Mazlin
title Ensemble filters with harmonize algorithm for optimal solutions in medical datasets
granting_institution Universiti Teknologi Malaysia
granting_department Faculty of Engineering - School of Computing
publishDate 2021
url http://eprints.utm.my/102978/1/TengkuMazlinTengkuAbHamidMSC2021.pdf.pdf
_version_ 1783729231518236672