Framework for mining XML format business process log data

The growth of the Internet has brought a dramatic increase in the volume of semi-structured and unstructured data. In response, many frequent subtree mining (FSM) algorithms and methods have been developed to extract information from semi-structured data, particularly data that is hierarchical in nature. Howev...


Bibliographic Details
Main Author: Ang, Jin Sheng
Format: Thesis
Language: eng
Published: 2024
Subjects:
Online Access: https://etd.uum.edu.my/11012/1/permission%20to%20deposit-allow%20embargo%2012%20months-s904045.pdf
https://etd.uum.edu.my/11012/2/s904045_01.pdf
https://etd.uum.edu.my/11012/3/s904045_02.pdf
id my-uum-etd.11012
record_format uketd_dc
spelling my-uum-etd.11012 2024-02-29T00:24:50Z Framework for mining XML format business process log data 2024 Ang, Jin Sheng Mohd Jamil, Jastini Mohd Shaharanee, Izwan Nizal Awang Had Salleh Graduate School of Arts & Sciences T58.5-58.64 Information technology QA299.6-433 Analysis 2024 Thesis https://etd.uum.edu.my/11012/ https://etd.uum.edu.my/11012/1/permission%20to%20deposit-allow%20embargo%2012%20months-s904045.pdf text eng staffonly https://etd.uum.edu.my/11012/2/s904045_01.pdf text eng 2025-01-10 staffonly https://etd.uum.edu.my/11012/3/s904045_02.pdf text eng staffonly other doctoral Universiti Utara Malaysia
institution Universiti Utara Malaysia
collection UUM ETD
language eng
advisor Mohd Jamil, Jastini
Mohd Shaharanee, Izwan Nizal
topic T58.5-58.64 Information technology
QA299.6-433 Analysis
description The growth of the Internet has brought a dramatic increase in the volume of semi-structured and unstructured data. In response, many frequent subtree mining (FSM) algorithms and methods have been developed to extract information from semi-structured data, particularly data that is hierarchical in nature. However, many existing FSM algorithms and methods neglect or fail to preserve structural information, which hinders the extraction of meaningful insights from such data. In addition, statistical analysis and data mining techniques are difficult to apply to eXtensible Markup Language (XML) documents. This study introduces an alternative approach for mining XML documents that can be modelled as trees. The Flatten Sequential Structure Model (FSSM) was developed to transform tree-structured data into a structured format while preserving its structural integrity, thereby facilitating comprehensive statistical analysis and data mining. FSSM consists of two phases. The first phase converts tree-structured data into a flat structure that retains the structural information. The second phase converts the output of the first phase into a structured format, on which statistical analysis or classification can then be conducted. The effectiveness of the methods and framework was assessed by applying them to both simulation datasets and real-life event logs from the Business Process Intelligence Challenge (BPIC). After the FSSM phases were applied to the simulation and real-life event log data, descriptive statistics, t-tests, and chi-square tests were executed successfully. Association rule mining on the transformed data yielded more rules than existing FSM methods. The Random Forest model outperformed the other models on the simulation data with a classification accuracy of 0.75, while the decision tree achieved the highest accuracy (0.7474) on the BPIC 2017 dataset. On the BPIC 2018 dataset, all three models exceeded 0.99 in classification accuracy. The results indicate that transforming complex hierarchical data into a format suitable for statistical analysis simplifies the analysis process and makes it more accessible to researchers in various fields.
format Thesis
qualification_name other
qualification_level Doctorate
author Ang, Jin Sheng
title Framework for mining XML format business process log data
granting_institution Universiti Utara Malaysia
granting_department Awang Had Salleh Graduate School of Arts & Sciences
publishDate 2024
url https://etd.uum.edu.my/11012/1/permission%20to%20deposit-allow%20embargo%2012%20months-s904045.pdf
https://etd.uum.edu.my/11012/2/s904045_01.pdf
https://etd.uum.edu.my/11012/3/s904045_02.pdf
_version_ 1794023808057737216
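
The abstract describes a two-phase idea: flatten tree-structured XML while keeping its structural information, then reshape the result into a structured table. The sketch below is only an illustration of that general idea and is not the thesis's FSSM. The encoding (pre-order position, depth, and path per node), the function names `flatten` and `to_table`, and the tiny `LOG` document are all assumptions made for this example.

```python
import xml.etree.ElementTree as ET


def flatten(root):
    """Illustrative phase 1: pre-order walk of one tree.

    Returns a list of (position, depth, path, text) tuples, so the flat
    rows still record where each node sat in the original tree.
    """
    rows = []

    def walk(node, depth, path):
        here = f"{path}/{node.tag}"
        text = (node.text or "").strip()
        rows.append((len(rows), depth, here, text))
        for child in node:
            walk(child, depth + 1, here)

    walk(root, 0, "")
    return rows


def to_table(trees):
    """Illustrative phase 2: one row per tree, one column per (position, path).

    Cells hold the node text; a column that a tree lacks is filled with None.
    """
    flat = [flatten(tree) for tree in trees]
    columns = sorted({(pos, path) for rows in flat for pos, _, path, _ in rows})
    table = []
    for rows in flat:
        cells = {(pos, path): text for pos, _, path, text in rows}
        table.append([cells.get(col) for col in columns])
    return columns, table


# Hypothetical two-trace log, invented for this example only.
LOG = """<log>
  <trace><event><name>submit</name><cost>10</cost></event></trace>
  <trace><event><name>review</name></event></trace>
</log>"""

traces = ET.fromstring(LOG).findall("trace")
columns, table = to_table(traces)

for row in table:
    print(row)
```

Each row of `table` corresponds to one trace and each column to one structural position, which is the kind of rectangular layout that descriptive statistics, hypothesis tests, or classifiers typically expect as input.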