A multi-objectives genetic algorithm clustering ensembles based approach to summarize relational data

The k-means algorithm is a well-known clustering algorithm that promises to converge to a local optimum within a few iterations. However, the traditional k-means algorithm is designed to cluster data in a single target table. Due to the nature of data collected in real-life applications, many data have been...

Bibliographic Details
Main Author: Gabriel, Jong Chiye
Format: Thesis
Language: English
Published: 2015
Subjects: QA Mathematics
Online Access: https://eprints.ums.edu.my/id/eprint/12105/1/mt0000000678.pdf
id my-ums-ep.12105
record_format uketd_dc
spelling my-ums-ep.12105 2017-11-07T07:31:34Z https://eprints.ums.edu.my/id/eprint/12105/ text en public masters
institution Universiti Malaysia Sabah
collection UMS Institutional Repository
language English
topic QA Mathematics
description The k-means algorithm is a well-known clustering algorithm that promises to converge to a local optimum within a few iterations. However, the traditional k-means algorithm is designed to cluster data in a single target table. In real-life applications, much of the data collected is stored in relational databases, and traditional clustering and classification algorithms cannot be applied directly to multi-relational databases. Several approaches have been proposed for learning from relational data, including Inductive Logic Programming based approaches, graph-based approaches, multi-view approaches, and the Dynamic Aggregation of Relational Attributes approach. The Dynamic Aggregation of Relational Attributes approach is very effective in learning from relational data sets. It summarizes relational data by clustering the records that exist in non-target tables. However, the quality of the summarization depends heavily on the positions of the initial centroids selected, which may in turn affect the overall classification task. This project therefore proposes Genetic Algorithm-based Clustering Ensembles for learning from relational datasets, combining the results obtained from several k-means clustering runs with different numbers of clusters, in which the centroid locations are optimized for every set of clusters. The effects of using different similarity measurements and of applying different fitness functions for the genetic algorithm on the predictive accuracies of the classifiers are also studied. Based on the results obtained, it can be concluded that using the consensus of several clustering results can increase the predictive accuracy of the classification task.
It can also be concluded that Euclidean distance performs better on the mutagenesis datasets and cosine similarity performs better on the hepatitis datasets when evaluated with the Weka C4.5 classifier, whereas the reverse holds when the Naïve Bayes classifier is used for evaluation.
format Thesis
qualification_level Master's degree
author Gabriel, Jong Chiye
title A multi-objectives genetic algorithm clustering ensembles based approach to summarize relational data
granting_institution Universiti Malaysia Sabah
granting_department Faculty of Computing and Informatics
publishDate 2015
url https://eprints.ums.edu.my/id/eprint/12105/1/mt0000000678.pdf
_version_ 1747836440086052864
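
The description above mentions combining several k-means runs, each with a different number of clusters, into one consensus result. The following is only a minimal sketch of that general idea, not the thesis's method: it replaces the genetic algorithm with a simple co-association (evidence accumulation) consensus, uses Euclidean distance only, and runs on made-up toy points. All function names (`euclidean`, `kmeans`, `co_association`, `consensus_labels`) and the 0.5 threshold are assumptions introduced for illustration.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points, k, rng, iters=50):
    """Plain k-means with randomly sampled initial centroids; returns labels.

    The result depends on the initial centroids, which is the sensitivity
    the description refers to.
    """
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(iters):
        new_labels = [
            min(range(k), key=lambda c: euclidean(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(
                    sum(col) / len(members) for col in zip(*members)
                )
    return labels


def co_association(labelings):
    """Fraction of runs in which each pair of points shares a cluster."""
    n = len(labelings[0])
    co = [[0.0] * n for _ in range(n)]
    for labels in labelings:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    co[i][j] += 1.0 / len(labelings)
    return co


def consensus_labels(co, threshold=0.5):
    """Connected components of the graph linking pairs with co > threshold."""
    n = len(co)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and co[i][j] > threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


# Toy data (assumed): two well-separated groups of three points each.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
rng = random.Random(0)
# Several runs with different numbers of clusters, as in the description.
labelings = [kmeans(points, k, rng) for k in (2, 2, 2, 3, 4)]
co = co_association(labelings)
consensus = consensus_labels(co)
print(consensus)
```

In this sketch the runs with three or four clusters may split a group, but the majority vote encoded in the co-association matrix keeps points together when most runs agree. A genetic algorithm, as proposed in the thesis, would instead search over centroid positions under one or more fitness functions; that part is not reproduced here.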