Tree-based contrast subspace mining method

Contrast subspace mining finds subsets of features, or subspaces, in which a query object is most likely to be similar to the target class rather than the other class in a multidimensional data set of two classes. These subspaces are termed contrast subspaces. All existing contrast subspace mining methods (i.e. CSMiner...

Bibliographic Details
Main Author: Florence Sia Fui Sze
Format: Thesis
Language: English
Published: 2020
Subjects:
Online Access: https://eprints.ums.edu.my/id/eprint/41108/1/24%20PAGES.pdf
https://eprints.ums.edu.my/id/eprint/41108/2/FULLTEXT.pdf
id my-ums-ep.41108
record_format uketd_dc
spelling my-ums-ep.41108 2024-10-10T04:09:45Z Tree-based contrast subspace mining method 2020 Florence Sia Fui Sze TN1-997 Mining engineering. Metallurgy 2020 Thesis https://eprints.ums.edu.my/id/eprint/41108/ https://eprints.ums.edu.my/id/eprint/41108/1/24%20PAGES.pdf text en public https://eprints.ums.edu.my/id/eprint/41108/2/FULLTEXT.pdf text en validuser dphil doctoral Universiti Malaysia Sabah Faculty Of Computing And Informatics
institution Universiti Malaysia Sabah
collection UMS Institutional Repository
language English
topic TN1-997 Mining engineering. Metallurgy
description Contrast subspace mining finds subsets of features, or subspaces, in which a query object is most likely to be similar to the target class rather than the other class in a multidimensional data set of two classes. These subspaces are termed contrast subspaces. All existing contrast subspace mining methods (i.e. CSMiner and CSMiner-BPR) use a density-based likelihood contrast scoring function to estimate the likelihood that a query object belongs to the target class rather than the other class in a subspace. In a contrast subspace, the query object resides in a region where the ratio of the probability density of the target class to the probability density of the other class, with respect to the query object, is high. However, the probability density estimation of a class must be adjusted for the dimensionality, or number of features, of each subspace, which may affect the performance of contrast subspace mining. In addition, the parameter settings and the subspace search strategy of the existing methods are not optimized for mining contrast subspaces, and the methods cannot be applied directly to categorical data. This thesis introduces a novel tree-based contrast subspace mining method that employs a tree-based likelihood contrast scoring function that is not affected by the dimensionality of subspaces. The tree-based likelihood contrast scoring function recursively partitions a subspace so that the query object falls into a group with a high ratio of the probability of the target class to the probability of the other class in a contrast subspace. The tree-based method begins with a feature selection phase, which finds relevant features, followed by a contrast subspace search phase, which searches for contrast subspaces among the relevant features according to the tree-based likelihood contrast scoring function. Genetic algorithms have been widely used to find global solutions to optimization and search problems. Hence, this thesis presents the optimization of the parameter values of the tree-based method by a genetic algorithm.
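The recursive partitioning idea behind a tree-based likelihood contrast score can be illustrated with a minimal sketch. This is not the thesis's algorithm: the function name `tree_likelihood_contrast`, the median split rule, the parameters `max_depth`, `min_size` and `eps`, and the smoothing are all assumptions made for illustration. The sketch repeatedly splits the data on one feature of the subspace, keeps the side containing the query object, and returns the ratio of the target-class probability to the other-class probability in the final group.

```python
from statistics import median


def tree_likelihood_contrast(X, y, query, subspace, target=1,
                             max_depth=3, min_size=4, eps=1e-6):
    """Hypothetical score: how strongly `query` leans to `target` in `subspace`.

    X is a list of numeric rows, y the class labels, subspace a list of
    feature indices. Both classes are assumed to be present in y.
    """
    n_target = sum(1 for label in y if label == target)
    n_other = len(y) - n_target
    if n_target == 0 or n_other == 0:
        raise ValueError("both classes must be present")

    def ratio(rows):
        # P(group | target) / P(group | other), smoothed to avoid division by 0
        t = sum(1 for i in rows if y[i] == target)
        o = len(rows) - t
        return (t / n_target + eps) / (o / n_other + eps)

    rows = list(range(len(X)))
    for _ in range(max_depth):
        best = None
        for f in subspace:
            # Assumed split rule: median of the feature over the current group
            cut = median(X[i][f] for i in rows)
            go_left = query[f] <= cut
            kept = [i for i in rows if (X[i][f] <= cut) == go_left]
            if len(kept) < min_size or len(kept) == len(rows):
                continue  # split is too small or does not partition the group
            score = ratio(kept)
            if best is None or score > best[0]:
                best = (score, kept)
        if best is None or best[0] <= ratio(rows):
            break  # no split improves the contrast for the query's group
        rows = best[1]
    return ratio(rows)
```

Because the score is a ratio of class proportions in a group rather than a ratio of estimated densities, it needs no bandwidth adjustment when the number of features in the subspace changes, which is the property the description attributes to the tree-based scoring function.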
This thesis also presents the optimization of the contrast subspace search of the tree-based method by a genetic algorithm. In addition, the tree-based method is extended to mine contrast subspaces of a query object in categorical data. The research work first prepares real-world numerical and categorical data sets. Then the tree-based method, the genetic-algorithm-based identification of parameter values for the tree-based method, and the genetic-algorithm-based tree-based method are developed and evaluated on the numerical data sets. Lastly, the extended tree-based method for categorical data sets is developed and evaluated. The effectiveness of the tree-based method in mining contrast subspaces is evaluated by the classification accuracy on the obtained contrast subspaces with respect to the query object. The empirical results demonstrate that the tree-based method is capable of finding relevant contrast subspaces of a given query object, and that the tree-based method with the optimized parameter setting performs best for mining contrast subspaces in numerical data. Furthermore, the results show that the extended tree-based method is capable of finding contrast subspaces of a query object in categorical data.
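A genetic-algorithm search over subspaces, as mentioned in the description, can be sketched as follows. This is a generic illustration under stated assumptions, not the thesis's procedure: the function name `ga_subspace_search`, the bit-mask encoding, tournament selection, one-point crossover, bit-flip mutation, elitism, and all parameter defaults are hypothetical choices. The `score` argument stands in for any likelihood contrast scoring function that maps a tuple of feature indices to a number.

```python
import random


def ga_subspace_search(n_features, score, pop_size=20, generations=30,
                       mutation_rate=0.1, seed=0):
    """Hypothetical GA: search feature subsets that maximise score(subspace)."""
    rng = random.Random(seed)  # seeded for reproducible runs

    def random_mask():
        mask = [rng.random() < 0.5 for _ in range(n_features)]
        if not any(mask):  # keep every subspace non-empty
            mask[rng.randrange(n_features)] = True
        return mask

    def fitness(mask):
        return score(tuple(i for i, bit in enumerate(mask) if bit))

    population = [random_mask() for _ in range(pop_size)]
    best = max(population, key=fitness)
    for _ in range(generations):
        children = [best[:]]  # elitism: carry the best subspace forward
        while len(children) < pop_size:
            # Tournament selection of two parents
            p1 = max(rng.sample(population, 2), key=fitness)
            p2 = max(rng.sample(population, 2), key=fitness)
            # One-point crossover
            cut = rng.randrange(1, n_features) if n_features > 1 else 0
            child = p1[:cut] + p2[cut:]
            # Bit-flip mutation
            child = [(not b) if rng.random() < mutation_rate else b
                     for b in child]
            if not any(child):
                child[rng.randrange(n_features)] = True
            children.append(child)
        population = children
        best = max(population, key=fitness)
    return tuple(i for i, bit in enumerate(best) if bit)
```

In use, `score` would wrap a contrast scoring function for a fixed data set and query object, so that the search returns the feature subset with the highest score found within the given number of generations.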
format Thesis
qualification_name Doctor of Philosophy (PhD.)
qualification_level Doctorate
author Florence Sia Fui Sze
title Tree-based contrast subspace mining method
granting_institution Universiti Malaysia Sabah
granting_department Faculty Of Computing And Informatics
publishDate 2020
url https://eprints.ums.edu.my/id/eprint/41108/1/24%20PAGES.pdf
https://eprints.ums.edu.my/id/eprint/41108/2/FULLTEXT.pdf
_version_ 1818611376321462272