Designing cross-validation consensus clustering with reference point in determining the optimal number of clusters

Consensus clustering can overcome the instability in estimating the number of clusters, k, that traditional clustering approaches face. It offers a better estimate by consolidating clustering results into an optimal value. However, the consensus clustering approach faced with t...

Bibliographic Details
Main Author: Norin Rahayu, Shamsuddin
Format: Thesis
Language: eng
Published: 2022
Subjects: QA Mathematics
Online Access: https://etd.uum.edu.my/9744/1/permission%20to%20deposit-900985.pdf
https://etd.uum.edu.my/9744/2/s900985_01.pdf
https://etd.uum.edu.my/9744/3/s900985_02.pdf
id my-uum-etd.9744
record_format uketd_dc
spelling my-uum-etd.9744 2022-08-14T01:25:58Z Designing cross-validation consensus clustering with reference point in determining the optimal number of clusters 2022 Norin Rahayu, Shamsuddin Mahat, Nor Idayu Che Dom, Nazri Awang Had Salleh Graduate School of Arts & Sciences QA Mathematics 2022 Thesis https://etd.uum.edu.my/9744/ https://etd.uum.edu.my/9744/1/permission%20to%20deposit-900985.pdf text eng 2023-03-06 staffonly https://etd.uum.edu.my/9744/2/s900985_01.pdf text eng 2023-03-06 staffonly https://etd.uum.edu.my/9744/3/s900985_02.pdf text eng 2023-03-06 staffonly other doctoral Universiti Utara Malaysia
institution Universiti Utara Malaysia
collection UUM ETD
language eng
advisor Mahat, Nor Idayu
Che Dom, Nazri
topic QA Mathematics
spellingShingle QA Mathematics
Norin Rahayu, Shamsuddin
Designing cross-validation consensus clustering with reference point in determining the optimal number of clusters
description Consensus clustering can overcome the instability in estimating the number of clusters, k, that traditional clustering approaches face. It offers a better estimate by consolidating clustering results into an optimal value. However, consensus clustering has three weaknesses: a lack of clear rules for constructing the multiple base partitions, B; a lack of a specific procedure for combining the clustering outcomes from B into a single consolidated value; and excessive computational time and complexity in identifying k. Motivated by these weaknesses, this study designs a cross-validation consensus clustering that uses a reference point at every base partition to obtain the optimal number of clusters, ˘k*y, and so produce more robust and stable results. The proposed design creates base partitions using a 10-fold cross-validation approach. In each base partition, the reference point is imposed by extracting 30% of the objects from the dataset to identify ˘k*y. The ˘k*y is then used to cluster the objects and identify their clusters. The design was tested on both simulated and real datasets using a stability index, heatmap visualisation and clustering validations. The findings showed that the proposed design performs better in terms of computational time, clustering the objects in less than one minute once ˘k*y is obtained. The results also revealed that clustering throughout the base partitions in both simulated and real datasets is robust and stable. The proposed design works well on cases with non-overlapping clusters or unequal numbers of objects, with the least completion time for the clustering process. The design is also competitive with other clustering approaches on problems with highly overlapping clusters and unclear cluster structure.
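The procedure outlined in the description (10-fold base partitions, a 30% reference sample in each, one estimate of k per partition) can be illustrated with a minimal Python sketch. This is not the thesis's implementation. The abstract does not state the clustering algorithm, the criterion used to pick k, how folds map to base partitions, or how the per-partition estimates are consolidated. The sketch therefore assumes k-means with farthest-point initialisation, the mean silhouette width as the criterion, base partitions formed by leaving one fold out, and a majority vote across partitions. All function names (`kmeans`, `mean_silhouette`, `cv_consensus_k`) are hypothetical.

```python
import math
import random
from collections import Counter


def kmeans(points, k, iters=20):
    """Lloyd's k-means with deterministic farthest-point initialisation (assumed)."""
    centers = [points[0]]
    while len(centers) < k:
        # Next centre: the point farthest from its nearest existing centre.
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centre if a cluster becomes empty
                centers[j] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels


def mean_silhouette(points, labels):
    """Mean silhouette width; points in singleton clusters contribute 0."""
    total = 0.0
    for i, p in enumerate(points):
        dists = {}
        for j, q in enumerate(points):
            if i != j:
                dists.setdefault(labels[j], []).append(math.dist(p, q))
        own = dists.pop(labels[i], None)
        if not own or not dists:
            continue
        a = sum(own) / len(own)                             # cohesion
        b = min(sum(d) / len(d) for d in dists.values())    # separation
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


def cv_consensus_k(points, n_folds=10, ref_frac=0.30, k_values=(2, 3, 4, 5), seed=0):
    """Estimate k on a reference sample in each base partition, then vote."""
    rng = random.Random(seed)
    order = list(range(len(points)))
    rng.shuffle(order)
    folds = [order[i::n_folds] for i in range(n_folds)]
    per_fold_k = []
    for held_out in folds:
        skip = set(held_out)
        # Assumption: a base partition is the data with one fold left out.
        base = [points[i] for i in order if i not in skip]
        # Reference sample: 30% of the objects in this base partition.
        reference = rng.sample(base, max(2, round(ref_frac * len(base))))
        best_k = max(k_values,
                     key=lambda k: mean_silhouette(reference, kmeans(reference, k)))
        per_fold_k.append(best_k)
    # Assumption: consolidate by majority vote over the base partitions.
    k_star = Counter(per_fold_k).most_common(1)[0][0]
    return k_star, per_fold_k
```

Calling `cv_consensus_k(points)` on a list of coordinate tuples returns the voted estimate together with the ten per-partition estimates, so agreement across partitions can be inspected directly. Estimating k on the smaller reference sample rather than on the full base partition is what keeps the quadratic-cost silhouette computation cheap in this sketch.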
format Thesis
qualification_name other
qualification_level Doctorate
author Norin Rahayu, Shamsuddin
title Designing cross-validation consensus clustering with reference point in determining the optimal number of clusters
granting_institution Universiti Utara Malaysia
granting_department Awang Had Salleh Graduate School of Arts & Sciences
publishDate 2022
url https://etd.uum.edu.my/9744/1/permission%20to%20deposit-900985.pdf
https://etd.uum.edu.my/9744/2/s900985_01.pdf
https://etd.uum.edu.my/9744/3/s900985_02.pdf