Framework for stream clustering of trajectories based on temporal micro clustering technique
In recent years, spatio-temporal data has rapidly increased due to the tremendous prevalence of geolocation devices such as Global Positioning System, mobiles, and motion sensors. In some scenarios, spatio-temporal data are received in a streamed manner. Clustering of the stream data is a challen...
Saved in:
Main Author: | |
---|---|
Format: | Thesis |
Language: | English |
Published: |
2018
|
Subjects: | |
Online Access: | http://psasir.upm.edu.my/id/eprint/76982/1/FSKTM%202018%2069%20-%20IR.pdf |
Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
Summary: | In recent years, spatio-temporal data has rapidly increased due to the tremendous
prevalence of geolocation devices such as Global Positioning System, mobiles, and
motion sensors. In some scenarios, spatio-temporal data are received in a streamed
manner. Clustering of the stream data is a challenging task due to single pass over the
data, the unbounded size of data stream, and the limited processing time and memory
space. However, it is a vital process for many applications such as traffic management,
studying of animal behavior, and weather forecasting. Many existing algorithms for
stream clustering of trajectory stream data exploit the time window technique which
partition trajectory stream data into time-bins and cluster the segments in each timebin
separately. Initiating clustering from scratch in each time-bin and not considering
the relationships between the objects from two consecutive time-bins lead to creating
redundant micro clusters centralizes in the border area between two adjacent time bins.
This is because, the clustering process which exploits the time window technique
creates new micro clusters for some trajectory segments at the start of each time-bin
even though these segments are too close (within distance threshold) to the micro
clusters at the end of previous time-bin . It is true that most similar micro clusters at
two consecutive time-bins can be merged to reduce memory space but creating
redundant micro clusters will dramatically slow down the clustering task. On the other
hand, trajectories preprocessing such as segmentation and noise points filtering is a
vital step which precedes the mining task. It aims to reduce the size of the trajectory
to minimize clustering time. Most of the existing algorithms consider the noise
filtering step precedes trajectory segmentation step which mean that there is no
preprocessing method to partition trajectory into set of segments and remove noise
points in real time with low computational cost.
As a response towards the limitations of time window technique and the offline preprocessing
methods, a framework for stream clustering of trajectories based on
temporal micro clustering technique (SCT-TMC) is proposed. The framework consists
of two stages: the preprocessing stage and the stream clustering stage. In the
preprocessing stage, the On-Line Noise Filtering Algorithm for Trajectory
Segmentation Based on Minimum Description Length concept (ONF_TRS) is
proposed to achieve trajectory segmentation and remove noise points in real time with
low computational cost. The minimum description length is an important concept in
information theory and computational learning in which the best hypothesis for a given
data set is the one that leads to the best compression of the data.
The stream clustering stage consists of two phases: the online and the offline phases.
In the online phase, the stream clustering algorithm for trajectories based on the
lifespan of the cluster is proposed (CC_TRS) to overcome the limitations of the time
window technique. The clustering algorithm consists of two components: the temporal
micro-clusters generation and the temporal micro clusters merging. The temporal
micro cluster data structure is proposed in CC_TRS algorithm to store the summarized
information for each group of similar segments. The algorithm assigns a life time for
each newly created temporal micro clusters so that the new incoming batch of
trajectories segments will interact only with the non-expired temporal micro clusters.
This is because the expired temporal micro clusters become far from segments at
current time either spatially or temporally which in turn, minimizes the searching time
for the nearest temporal micro cluster. When the size of the temporal micro clusters is
exceeded a given memory space, the most similar temporal micro-clusters will be
merged to save memory. On the other hand, the offline phase is evoked when the user
requests to view the overall clustering results. The DBSCAN algorithm is used to
perform the macro clustering task by replacing the distance between trajectories
segments with the distance between the temporal micro-clusters. The DBSCAN is a
density based clustering approach which is convenient for stream clustering due to the
properties: firstly, it does not require to specify the number of clusters in advance.
Secondly, the algorithm can find arbitrarily shaped clusters. Finally, it is robust to
noise and outliers. The suggestion of the temporal micro cluster data structure in the
online phase leads to the expansion of the functionality of the macro clustering phase.
The proposed functions for the offline phase are the detection of spatial and spatiotemporal
outliers and the macro clustering. A comprehensive experimental analysis
was conducted to evaluate the efficiency and effectiveness of the proposed algorithms
(ONF-TRS, CC-TRS). The results shows that the algorithm ONF-TRS has low
computational cost and high compression rate as compared with the existing works.
Furthermore, the CC-TRS improves and speeds up the clustering task as compared
with the latest works. |
---|