Multistage feature selection methods for data classification

In the data analysis process, good decisions depend on several sub-processes and methods, the most common being feature selection and classification. Various methods and processes have been proposed to solve many issues such as low classification accuracy, and...

Bibliographic Details
Main Author: Mohamad, Masurah
Format: Thesis
Language: English
Published: 2021
Subjects: QA75 Electronic computers. Computer science
Online Access: http://eprints.utm.my/id/eprint/101495/1/MasurahMohamadPSC2021.pdf
id my-utm-ep.101495
record_format uketd_dc
spelling my-utm-ep.101495 2023-06-21T10:20:49Z Multistage feature selection methods for data classification 2021 Mohamad, Masurah QA75 Electronic computers. Computer science 2021 Thesis http://eprints.utm.my/id/eprint/101495/ http://eprints.utm.my/id/eprint/101495/1/MasurahMohamadPSC2021.pdf application/pdf en public http://dms.library.utm.my:8080/vital/access/manager/Repository/vital:150561 phd doctoral Universiti Teknologi Malaysia Faculty of Engineering - School of Computing
institution Universiti Teknologi Malaysia
collection UTM Institutional Repository
language English
topic QA75 Electronic computers. Computer science
description In the data analysis process, good decisions depend on several sub-processes and methods, the most common being feature selection and classification. Various methods and processes have been proposed to address issues faced by decision-makers, such as low classification accuracy and long processing time. The analysis becomes more complicated when dealing with complex datasets, that is, datasets that are large and problematic. One solution is to employ an effective feature selection method to reduce data processing time, decrease memory usage, and increase the accuracy of decisions. However, not all existing methods are capable of dealing with these issues. The aim of this research was to help the classifier perform better on problematic datasets by generating an optimised attribute set. The proposed method comprises two stages of feature selection: a correlation-based feature selection method using a best-first search algorithm (CFS-BFS), and a soft set and rough set parameter selection method (SSRS). CFS-BFS was used to eliminate uncorrelated attributes in a dataset, while SSRS was used to manage problematic values, such as uncertainty, in a dataset. Several benchmark feature selection methods, such as classifier subset evaluation (CSE) and principal component analysis (PCA), and different classifiers, such as support vector machine (SVM) and neural network (NN), were used to validate the obtained results. ANOVA and t-tests were also conducted to verify the results. The averages obtained in two experimental works showed that the proposed method matched the performance of the other benchmark methods in helping the classifier achieve high classification performance on complex datasets. The average obtained in another experimental work showed that the proposed method outperformed the other benchmark methods. In conclusion, the proposed method is a useful alternative feature selection method that can help classifiers achieve better accuracy in the classification process, especially when dealing with problematic datasets.
format Thesis
qualification_name Doctor of Philosophy (PhD.)
qualification_level Doctorate
author Mohamad, Masurah
title Multistage feature selection methods for data classification
granting_institution Universiti Teknologi Malaysia
granting_department Faculty of Engineering - School of Computing
publishDate 2021
url http://eprints.utm.my/id/eprint/101495/1/MasurahMohamadPSC2021.pdf
_version_ 1776100711571914752
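
The abstract names correlation-based feature selection (CFS) as the first stage of the proposed method. As a rough illustration of that idea only, the sketch below scores a feature subset with the standard CFS merit, k * r_cf / sqrt(k + k(k-1) * r_ff), where r_cf is the mean feature-class correlation and r_ff the mean feature-feature correlation. Several things here are assumptions and are not taken from the thesis: the function names cfs_merit and cfs_forward_select are hypothetical, absolute Pearson correlation is used as the correlation measure, and a simple greedy forward search stands in for best-first search (which additionally keeps a queue of unexpanded subsets and can backtrack). The SSRS stage is not sketched because the record does not describe how it works.

```python
import numpy as np


def cfs_merit(X, y, subset):
    """CFS merit of a feature subset: k*r_cf / sqrt(k + k*(k-1)*r_ff)."""
    k = len(subset)
    # Mean absolute feature-class correlation (Pearson is an assumption here).
    r_cf = np.mean([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in subset])
    if k == 1:
        return r_cf
    # Mean absolute feature-feature correlation over all pairs in the subset.
    r_ff = np.mean([
        abs(np.corrcoef(X[:, a], X[:, b])[0, 1])
        for i, a in enumerate(subset)
        for b in subset[i + 1:]
    ])
    return k * r_cf / np.sqrt(k + k * (k - 1) * r_ff)


def cfs_forward_select(X, y):
    """Greedy forward search (a simplification of best-first search).

    Repeatedly adds the feature that gives the highest merit and stops
    as soon as no candidate improves on the current best merit.
    """
    selected, best = [], 0.0
    remaining = list(range(X.shape[1]))
    while remaining:
        merit, j = max((cfs_merit(X, y, selected + [j]), j) for j in remaining)
        if merit <= best:
            break
        selected.append(j)
        remaining.remove(j)
        best = merit
    return selected, best
```

Calling cfs_forward_select(X, y) on a numeric feature matrix X (samples in rows, features in columns) and a numeric class vector y returns the chosen column indices and their merit. Columns with zero variance would need to be dropped first, since their correlation is undefined.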