An online density-based clustering algorithm for data stream based on local optimal radius and cluster pruning

Data stream clustering plays an important role in data stream mining for knowledge extraction. In recent years, numerous researchers have studied the online density-based clustering technique due to its capability to generate arbitrarily shaped clusters. The technique summarizes the data stream in m...

Full description

Saved in:
Bibliographic Details
Main Author: Islam, Md Kamrul
Format: Thesis
Language:English
Published: 2019
Subjects:
Online Access:http://umpir.ump.edu.my/id/eprint/31091/1/An%20online%20density-based%20clustering%20algorithm%20for%20data%20stream%20based%20on%20local%20optimal%20radius%20and%20cluster%20pruning.wm.pdf
Tags: Add Tag
No Tags, Be the first to tag this record!
id my-ump-ir.31091
record_format uketd_dc
spelling my-ump-ir.310912023-03-27T02:40:20Z An online density-based clustering algorithm for data stream based on local optimal radius and cluster pruning 2019-10 Islam, Md Kamrul QA75 Electronic computers. Computer science Data stream clustering plays an important role in data stream mining for knowledge extraction. In recent years, numerous researchers have studied the online density-based clustering technique due to its capability to generate arbitrarily shaped clusters. The technique summarizes the data stream in micro-clusters and the micro-clusters form the clusters. However, most of the clusters are either not fully online, or cannot handle the properties of data stream properly. Moreover, the algorithms require predefining the global optimal radius of micro-clusters, which is a difficult task, and an erroneous choice deteriorates the cluster quality. In addition, the algorithms ignore the presence of temporarily irrelevant micro-clusters, which may be relevant in the future. This ignorance causes the degradation of clustering quality and the increase of the processing time as micro-clusters are deleted and created frequently due to evolving nature of data stream. In this study, a fully online density-based clustering algorithm called Buffer-based Online Clustering for Evolving Data Stream (BOCEDS) is presented. BOCEDS clusters the data stream in a single stage. The algorithm summarizes the data from data stream in micro-clusters. This algorithm maintains the local optimal radius of micro-clusters rather than a global and constant radius. Moreover, it introduces a buffer for storing irrelevant micro-clusters and a fully online pruning process for extracting the temporarily irrelevant micro-cluster from the buffer. The pruning process improves processing time. In addition, BOCEDS proposes an online micro-cluster energy updating function based on the spatial information of the data stream. Then, clustering graphs are generated based on the connectivity among micro-clusters. The clusters are generated from the clustering graphs. To evaluate the performance, BOCEDS algorithm is executed on two syntactic and one practical data streams. The experimental result shows BOCEDS is able to generate new clusters and remove outdated clusters with time as data stream contents change. The experiment on noisy data stream shows that BOCEDS algorithm can detect noise with an accuracy of approximately 100%. The overall clustering accuracy and purity are more than 99%. Experimental results are compared with other alternative online/offline hybrid density-based clustering algorithms. The average processing time for data point in the data stream is about 2 milliseconds which is much lower than the aligned clustering algorithms in literature. The algorithm is also more scalable to high dimensional data stream than the existing algorithms. The sensitivity of clustering parameters in BOCEDS is also measured. The result shows that in case of changing the values of parameters the cluster quality deviates by a very small amount (<1%). These results prove the superiority of BOCEDS algorithm over the existing clustering algorithms. The BOCEDS algorithm is then applied to real-world weather data streams to demonstrate its capability to detect the drifts in the data stream and discover arbitrarily shaped clusters. 2019-10 Thesis http://umpir.ump.edu.my/id/eprint/31091/ http://umpir.ump.edu.my/id/eprint/31091/1/An%20online%20density-based%20clustering%20algorithm%20for%20data%20stream%20based%20on%20local%20optimal%20radius%20and%20cluster%20pruning.wm.pdf pdf en public masters Universiti Malaysia Pahang Faculty of Computing Zamli, Kamal Zuhairi
institution Universiti Malaysia Pahang Al-Sultan Abdullah
collection UMPSA Institutional Repository
language English
advisor Zamli, Kamal Zuhairi
topic QA75 Electronic computers
Computer science
spellingShingle QA75 Electronic computers
Computer science
Islam, Md Kamrul
An online density-based clustering algorithm for data stream based on local optimal radius and cluster pruning
description Data stream clustering plays an important role in data stream mining for knowledge extraction. In recent years, numerous researchers have studied the online density-based clustering technique due to its capability to generate arbitrarily shaped clusters. The technique summarizes the data stream in micro-clusters and the micro-clusters form the clusters. However, most of the clusters are either not fully online, or cannot handle the properties of data stream properly. Moreover, the algorithms require predefining the global optimal radius of micro-clusters, which is a difficult task, and an erroneous choice deteriorates the cluster quality. In addition, the algorithms ignore the presence of temporarily irrelevant micro-clusters, which may be relevant in the future. This ignorance causes the degradation of clustering quality and the increase of the processing time as micro-clusters are deleted and created frequently due to evolving nature of data stream. In this study, a fully online density-based clustering algorithm called Buffer-based Online Clustering for Evolving Data Stream (BOCEDS) is presented. BOCEDS clusters the data stream in a single stage. The algorithm summarizes the data from data stream in micro-clusters. This algorithm maintains the local optimal radius of micro-clusters rather than a global and constant radius. Moreover, it introduces a buffer for storing irrelevant micro-clusters and a fully online pruning process for extracting the temporarily irrelevant micro-cluster from the buffer. The pruning process improves processing time. In addition, BOCEDS proposes an online micro-cluster energy updating function based on the spatial information of the data stream. Then, clustering graphs are generated based on the connectivity among micro-clusters. The clusters are generated from the clustering graphs. To evaluate the performance, BOCEDS algorithm is executed on two syntactic and one practical data streams. The experimental result shows BOCEDS is able to generate new clusters and remove outdated clusters with time as data stream contents change. The experiment on noisy data stream shows that BOCEDS algorithm can detect noise with an accuracy of approximately 100%. The overall clustering accuracy and purity are more than 99%. Experimental results are compared with other alternative online/offline hybrid density-based clustering algorithms. The average processing time for data point in the data stream is about 2 milliseconds which is much lower than the aligned clustering algorithms in literature. The algorithm is also more scalable to high dimensional data stream than the existing algorithms. The sensitivity of clustering parameters in BOCEDS is also measured. The result shows that in case of changing the values of parameters the cluster quality deviates by a very small amount (<1%). These results prove the superiority of BOCEDS algorithm over the existing clustering algorithms. The BOCEDS algorithm is then applied to real-world weather data streams to demonstrate its capability to detect the drifts in the data stream and discover arbitrarily shaped clusters.
format Thesis
qualification_level Master's degree
author Islam, Md Kamrul
author_facet Islam, Md Kamrul
author_sort Islam, Md Kamrul
title An online density-based clustering algorithm for data stream based on local optimal radius and cluster pruning
title_short An online density-based clustering algorithm for data stream based on local optimal radius and cluster pruning
title_full An online density-based clustering algorithm for data stream based on local optimal radius and cluster pruning
title_fullStr An online density-based clustering algorithm for data stream based on local optimal radius and cluster pruning
title_full_unstemmed An online density-based clustering algorithm for data stream based on local optimal radius and cluster pruning
title_sort online density-based clustering algorithm for data stream based on local optimal radius and cluster pruning
granting_institution Universiti Malaysia Pahang
granting_department Faculty of Computing
publishDate 2019
url http://umpir.ump.edu.my/id/eprint/31091/1/An%20online%20density-based%20clustering%20algorithm%20for%20data%20stream%20based%20on%20local%20optimal%20radius%20and%20cluster%20pruning.wm.pdf
_version_ 1783732152230215680