Separator Database and SPM Tree Framework for Mining Sequential Patterns Using Prefixspan with Pseudoprojection

Bibliographic Details
Main Authors: Saputra, Dhany; Rambli, Dayang R.A.; Foong, Oi Mean
Format: Thesis
Published: 2008
Subjects: QA75 Electronic computers. Computer science
Online Access: http://eprints.utp.edu.my/2957/1/Dhany_2008.PDF
institution Universiti Teknologi PETRONAS
collection UTP Institutional Repository
topic QA75 Electronic computers. Computer science
description Sequential pattern mining is a new branch of data mining that addresses inter-transaction pattern mining problems. Mining the complete set of patterns efficiently and scalably is its central challenge. A comprehensive performance study has reported that PrefixSpan, one of the sequential pattern mining algorithms, outperforms GSP, SPADE and FreeSpan in most cases, and that PrefixSpan integrated with the pseudoprojection technique is the fastest of the algorithms tested. However, pseudoprojection requires the in-memory sequence database to be maintained and visited repeatedly until all patterns are found. It therefore consumes a considerable amount of memory and forces the algorithm to perform many redundant checks against this in-memory copy of the original database whenever candidate patterns are examined. Moreover, poor management of intermediate databases can degrade both execution time and memory utilization. In the present work, a Separator Database is proposed to improve PrefixSpan with pseudoprojection by removing the uneconomical in-memory sequence database early, and an SPM-Tree framework is proposed to build the intermediate databases. Because the index sets of longer patterns are built from the Separator Database, some procedures that depend on the in-memory sequence database can be removed: most of the memory it occupies can be released, and eliminating redundant checks against it reduces execution time. By storing the intermediate databases in the SPM-Tree framework, the sequence database can be held in memory and the index sets can be built. Using Java as a case study, a series of experiments was conducted to select suitable classes from the Java Collections API for this framework. The experimental results show that the Separator Database always improves PrefixSpan with pseudoprojection, exponentially in some cases.
The results also show that, in Java, ArrayList is the most suitable choice for storing objects and ArrayIntList is the most suitable choice for storing integer data. This novel approach, which integrates the Separator Database and the SPM-Tree framework using these Java collections, outperforms PrefixSpan with pseudoprojection in terms of CPU performance and memory usage. Future research includes applying the Separator Database in PrefixSpan with pseudoprojection to mining generalized sequential patterns, particularly constrained sequential patterns.
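To make the baseline the abstract improves on more concrete, here is a minimal Java sketch of PrefixSpan with pseudoprojection in its commonly described form. It is not the thesis's Separator Database or SPM-Tree framework, whose internals this record does not give. Assumptions: each sequence element is a single item (no itemsets), support is an absolute count, and the names `PrefixSpanPseudo`, `Pointer` and `grow` are hypothetical. A projected database is kept as (sequence index, offset) pointers into the in-memory database instead of copied suffixes.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/**
 * Hypothetical sketch of PrefixSpan with pseudoprojection, ASSUMING
 * single-item sequence elements and an absolute minimum support count.
 * Illustrates the baseline only, not the Separator Database / SPM-Tree.
 */
public class PrefixSpanPseudo {

    /** Pseudo-projection entry: the suffix of db[seq] that starts at pos. */
    record Pointer(int seq, int pos) {}

    private final int[][] db;          // in-memory sequence database, kept alive throughout mining
    private final int minSupport;      // absolute minimum support count
    private final List<String> patterns = new ArrayList<>();

    public PrefixSpanPseudo(int[][] db, int minSupport) {
        this.db = db;
        this.minSupport = minSupport;
    }

    /** Returns patterns as "[items]:support", in depth-first, item-ascending order. */
    public List<String> mine() {
        List<Pointer> all = new ArrayList<>();
        for (int i = 0; i < db.length; i++) all.add(new Pointer(i, 0));
        grow(new ArrayList<>(), all);
        return patterns;
    }

    private void grow(List<Integer> prefix, List<Pointer> projected) {
        // Count each item at most once per sequence, scanning only the pointed-to suffixes.
        Map<Integer, Integer> support = new TreeMap<>();
        for (Pointer p : projected) {
            int[] s = db[p.seq()];
            Set<Integer> seen = new HashSet<>();
            for (int k = p.pos(); k < s.length; k++) {
                if (seen.add(s[k])) support.merge(s[k], 1, Integer::sum);
            }
        }
        for (Map.Entry<Integer, Integer> e : support.entrySet()) {
            if (e.getValue() < minSupport) continue;
            int item = e.getKey();
            List<Integer> extended = new ArrayList<>(prefix);
            extended.add(item);
            patterns.add(extended + ":" + e.getValue());
            // Next projection: a new pointer just past the first occurrence of item
            // in each suffix. No suffix is copied; the original database is revisited.
            List<Pointer> next = new ArrayList<>();
            for (Pointer p : projected) {
                int[] s = db[p.seq()];
                for (int k = p.pos(); k < s.length; k++) {
                    if (s[k] == item) {
                        next.add(new Pointer(p.seq(), k + 1));
                        break;
                    }
                }
            }
            grow(extended, next);
        }
    }

    public static void main(String[] args) {
        int[][] db = { {1, 2, 3}, {1, 3}, {2, 3}, {1, 2, 3, 4} };
        new PrefixSpanPseudo(db, 2).mine().forEach(System.out::println);
    }
}
```

With the four toy sequences and a minimum support of 2, `main` prints seven patterns, from `[1]:3` to `[3]:4`. Every recursive call holds only pointers, so no suffixes are copied. The whole in-memory database, however, must stay resident and be rescanned until mining ends, which is the memory and redundant-check cost the abstract attributes to pseudoprojection.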
qualification_level Master's degree
granting_institution Universiti Teknologi PETRONAS
granting_department Computer and Information Sciences Department