Framework for mining XML format business process log data

With the advent of the Internet, there has been a dramatic increase in the volume of semi-structured and unstructured data. Consequently, many frequent subtree mining (FSM) algorithms and methods have been developed to extract information from semi-structured data, specifically data with a hierarchical nature. Howev...

Bibliographic Details
Main Author:	Ang, Jin Sheng
Format:	Thesis
Language:	eng
Published:	2024
Subjects:	T58.5-58.64 Information technology; QA299.6-433 Analysis
Online Access:	https://etd.uum.edu.my/11012/1/permission%20to%20deposit-allow%20embargo%2012%20months-s904045.pdf https://etd.uum.edu.my/11012/2/s904045_01.pdf https://etd.uum.edu.my/11012/3/s904045_02.pdf

id	my-uum-etd.11012
record_format	uketd_dc
institution	Universiti Utara Malaysia
collection	UUM ETD
language	eng
advisor	Mohd Jamil, Jastini; Mohd Shaharanee, Izwan Nizal
topic	T58.5-58.64 Information technology; QA299.6-433 Analysis
description	With the advent of the Internet, there has been a dramatic increase in the volume of semi-structured and unstructured data. Consequently, many frequent subtree mining (FSM) algorithms and methods have been developed to extract information from semi-structured data, specifically data with a hierarchical nature. However, many existing FSM algorithms and methods neglect or fail to preserve structural information, which hinders the extraction of meaningful insights from such data. In addition, statistical analysis and data mining techniques are difficult to apply to eXtensible Markup Language (XML) documents. This study introduces an alternative approach for mining XML documents that can be modelled as trees. The Flatten Sequential Structure Model (FSSM) was developed to transform tree-structured data into a structured format while preserving its structural integrity, thus facilitating comprehensive statistical analysis and data mining. FSSM is divided into two phases. The first phase converts tree-structured data into a flat structure that retains the structural information. The second phase converts the output of the first phase into a structured format, after which statistical analysis or classification can be conducted. The effectiveness of the methods and framework was assessed by applying them to both simulated datasets and real-life event logs from the Business Process Intelligence Challenge (BPIC). After the FSSM phases were applied to the simulated and real-life event log data, descriptive statistics, t-tests, and chi-square tests were successfully executed. The association rules obtained outnumbered those from existing FSM methods. The Random Forest model outperformed the others with a classification accuracy of 0.75 on the simulated data, while the decision tree achieved the highest accuracy (0.7474) on the BPIC 2017 dataset. On the BPIC 2018 dataset, all three models performed well, exceeding 0.99 in classification accuracy. The results indicate that transforming complex hierarchical data into a format suitable for statistical analysis simplifies the analysis process and makes it more accessible to researchers in various fields.
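The two-phase idea in the description (tree to flat sequence with structural information, then flat sequence to a structured record) can be illustrated with a minimal sketch. This record does not specify the thesis's actual FSSM encoding, so everything below is an assumption made for illustration: the function names `flatten` and `to_record`, the choice of pre-order position, depth and parent position as the retained structural information, and the column-naming scheme are all hypothetical.

```python
import xml.etree.ElementTree as ET


def flatten(root):
    """Phase 1 (assumed encoding): walk the tree in pre-order and emit one
    entry per node, keeping its position, depth and parent position so the
    original shape can still be recovered from the flat sequence."""
    rows = []

    def walk(node, depth, parent_pos):
        pos = len(rows)  # pre-order position of this node
        rows.append({
            "pos": pos,
            "depth": depth,
            "parent": parent_pos,  # -1 marks the root
            "label": node.tag,
            "value": (node.text or "").strip(),
        })
        for child in node:
            walk(child, depth + 1, pos)

    walk(root, 0, -1)
    return rows


def to_record(rows):
    """Phase 2 (assumed encoding): turn the flat sequence into a single
    table row whose column names carry the structural position, so that
    many trees can be stacked into one table for statistical analysis."""
    return {f"X{r['pos']}_d{r['depth']}_{r['label']}": r["value"] for r in rows}


if __name__ == "__main__":
    # Hypothetical miniature event log, not taken from any BPIC dataset.
    sample = (
        "<trace><event><name>A</name></event>"
        "<event><name>B</name></event></trace>"
    )
    flat = flatten(ET.fromstring(sample))
    for row in flat:
        print(row)
    print(to_record(flat))
```

Stacking one such record per document yields a conventional table, which is the kind of input that descriptive statistics, t-tests, chi-square tests and classifiers expect.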
format	Thesis
qualification_name	other
qualification_level	Doctorate
author	Ang, Jin Sheng
title	Framework for mining XML format business process log data
granting_institution	Universiti Utara Malaysia
granting_department	Awang Had Salleh Graduate School of Arts & Sciences
publishDate	2024
url	https://etd.uum.edu.my/11012/1/permission%20to%20deposit-allow%20embargo%2012%20months-s904045.pdf https://etd.uum.edu.my/11012/2/s904045_01.pdf https://etd.uum.edu.my/11012/3/s904045_02.pdf
_version_	1794023808057737216

Similar Items