A multi-objectives genetic algorithm clustering ensembles based approach to summarize relational data

The k-means algorithm is one of the best-known clustering algorithms and promises to converge to a local optimum in a few iterations. However, the traditional k-means algorithm is designed to cluster data in a single target table. Due to the nature of data collected in real-life applications, many data have been...

Bibliographic Details
Main Author:	Gabriel, Jong Chiye
Format:	Thesis
Language:	English
Published:	2015
Subjects:	QA Mathematics
Online Access:	https://eprints.ums.edu.my/id/eprint/12105/1/mt0000000678.pdf

id	my-ums-ep.12105
record_format	uketd_dc
spelling	my-ums-ep.12105 2017-11-07T07:31:34Z A multi-objectives genetic algorithm clustering ensembles based approach to summarize relational data 2015 Gabriel, Jong Chiye QA Mathematics 2015 Thesis https://eprints.ums.edu.my/id/eprint/12105/ https://eprints.ums.edu.my/id/eprint/12105/1/mt0000000678.pdf text en public masters Universiti Malaysia Sabah Faculty of Computing and Informatics
institution	Universiti Malaysia Sabah
collection	UMS Institutional Repository
language	English
topic	QA Mathematics
spellingShingle	QA Mathematics Gabriel, Jong Chiye A multi-objectives genetic algorithm clustering ensembles based approach to summarize relational data
description	The k-means algorithm is one of the best-known clustering algorithms and promises to converge to a local optimum in a few iterations. However, the traditional k-means algorithm is designed to cluster data in a single target table. In real-life applications, much of the data collected is stored in relational databases. Traditional clustering and classification learning algorithms cannot be applied directly to multi-relational databases. Several approaches have been proposed for learning from relational data, including Inductive Logic Programming-based approaches, graph-based approaches, multi-view approaches, and the Dynamic Aggregation of Relational Attributes approach. The Dynamic Aggregation of Relational Attributes approach is very effective for learning from relational datasets. It summarizes relational data by clustering the records that exist in non-target tables. However, the quality of the summarization depends highly on the positions of the initial centroids selected, which may in turn affect the overall classification task. This project therefore proposes Genetic Algorithm-based Clustering Ensembles for learning relational datasets, which combine the results of several k-means clustering runs with different numbers of clusters, in which the locations of the centroids are optimal for every set of clusters. The effects of using different similarity measures and of applying different fitness functions for the genetic algorithm on the predictive accuracies of the classifiers are also studied. Based on the results obtained, it can be concluded that using the consensus of several clustering results can increase the predictive accuracy of the classification task.
It can also be concluded that Euclidean distance performs better on the mutagenesis datasets and cosine similarity performs better on the hepatitis datasets when evaluated with the Weka C4.5 classifier, but the other way round when the Naïve Bayes classifier is used for evaluation.
format	Thesis
qualification_level	Master's degree
author	Gabriel, Jong Chiye
author_facet	Gabriel, Jong Chiye
author_sort	Gabriel, Jong Chiye
title	A multi-objectives genetic algorithm clustering ensembles based approach to summarize relational data
title_short	A multi-objectives genetic algorithm clustering ensembles based approach to summarize relational data
title_full	A multi-objectives genetic algorithm clustering ensembles based approach to summarize relational data
title_fullStr	A multi-objectives genetic algorithm clustering ensembles based approach to summarize relational data
title_full_unstemmed	A multi-objectives genetic algorithm clustering ensembles based approach to summarize relational data
title_sort	multi-objectives genetic algorithm clustering ensembles based approach to summarize relational data
granting_institution	Universiti Malaysia Sabah
granting_department	Faculty of Computing and Informatics
publishDate	2015
url	https://eprints.ums.edu.my/id/eprint/12105/1/mt0000000678.pdf
_version_	1747836440086052864
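
The abstract describes combining several k-means runs, with different numbers of clusters, into one consensus result, and comparing Euclidean distance with cosine similarity. The sketch below illustrates that general idea only, and it is not the thesis's method: it uses a co-association matrix in place of the genetic algorithm, random restarts in place of optimised centroids, and ordinary mean centroids even when cosine distance is chosen. All function names, parameters, the 0.5 threshold and the toy data are assumptions made for illustration.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a, b):
    """1 - cosine similarity; taken as 1.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm if norm else 1.0


def kmeans(points, k, dist, rng, max_iter=50):
    """Plain k-means with random initial centroids; returns one label per point."""
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda c: dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(col) / len(members)
                                     for col in zip(*members))
    return labels


def co_association(points, ks, dist=euclidean, runs_per_k=5, seed=0):
    """Fraction of runs in which each pair of points shares a cluster."""
    rng = random.Random(seed)
    n = len(points)
    counts = [[0] * n for _ in range(n)]
    runs = 0
    for k in ks:
        for _ in range(runs_per_k):
            labels = kmeans(points, k, dist, rng)
            runs += 1
            for i in range(n):
                for j in range(n):
                    counts[i][j] += labels[i] == labels[j]
    return [[c / runs for c in row] for row in counts]


def consensus_labels(matrix, threshold=0.5):
    """Connected components of the graph linking pairs above the threshold."""
    n = len(matrix)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and matrix[i][j] > threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


# Toy data (hypothetical): two small groups of 2-D points.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
matrix = co_association(points, ks=[2, 3])
print(consensus_labels(matrix))
```

Passing dist=cosine_distance to co_association swaps the similarity measure, which is one simple way to set up the kind of comparison the abstract reports.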
