Data Transformation Model For Addressing Incomplete And Inconsistent Quality Issues Of Big Data

Data Quality (DQ) assessment remains one of the major challenges for Big Data (BD) due to the complexity of handling large volumes of data. Traditional data transformation methods such as Extract-Transform-Load (ETL) use data sources from a diverse range of devices and locations resulting in incompl...

Bibliographic Details
Main Author: Onyeabor, Grace Amina
Format: Thesis
Language: eng
Published: 2024
Subjects:
Online Access: https://etd.uum.edu.my/11184/1/depositpermission-900601.pdf
https://etd.uum.edu.my/11184/2/s900601_01.pdf
id my-uum-etd.11184
record_format uketd_dc
spelling my-uum-etd.11184 2024-06-23T02:59:22Z Data Transformation Model For Addressing Incomplete And Inconsistent Quality Issues Of Big Data 2024 Onyeabor, Grace Amina Ta'a, Azman Awang Had Salleh Graduate School of Arts & Sciences Awang Had Salleh Graduate School of Arts And Sciences T58.5-58.64 Information technology Data Quality (DQ) assessment remains one of the major challenges for Big Data (BD) because of the complexity of handling large volumes of data. Traditional data transformation methods such as Extract-Transform-Load (ETL) draw on data sources from a diverse range of devices and locations, resulting in incomplete and inconsistent data that may lead to wrong insights and decisions. DQ is therefore vital for the effective operation and management of BD. Understanding DQ, from its definition to its various dimensions, is essential for developing techniques and procedures to improve it. This research focuses on two aspects of DQ: completeness and consistency. Firstly, an enhanced data transformation model (2CsDQT) is proposed to assess and improve big data quality. A new algorithm using ontology and clustering methods identifies and corrects incomplete and inconsistent data, addressing the availability and comprehensiveness of data, the similarity between data items, and missing specific attributes of data. Secondly, a clustering technique is used to analyse DQ and to improve it using results from the 2CsDQT model. The complete and consistent data are put into clusters, and the designed algorithm predicts the position of any incomplete or inconsistent data item, based on its value, so that it can be added to the appropriate cluster. The developed model was evaluated and benchmarked against existing data transformation techniques in the literature. This research shows that the 2CsDQT model successfully improves BD quality and outperforms previously proposed methods. 
Its data completeness and consistency results surpass those of related articles and benchmark studies in the literature on the datasets of two different test cases. The theoretical contribution of this research is insight into the importance of DQ issues in BD and the effect of inconsistency and incompleteness on BD applications. The practical contribution is the provision of enhanced data transformation models for DQ, leading to better data analysis and strategic planning. 2024 Thesis https://etd.uum.edu.my/11184/ https://etd.uum.edu.my/11184/1/depositpermission-900601.pdf text eng staffonly https://etd.uum.edu.my/11184/2/s900601_01.pdf text eng public other doctoral Universiti Utara Malaysia
institution Universiti Utara Malaysia
collection UUM ETD
language eng
advisor Ta'a, Azman
topic T58.5-58.64 Information technology
description Data Quality (DQ) assessment remains one of the major challenges for Big Data (BD) because of the complexity of handling large volumes of data. Traditional data transformation methods such as Extract-Transform-Load (ETL) draw on data sources from a diverse range of devices and locations, resulting in incomplete and inconsistent data that may lead to wrong insights and decisions. DQ is therefore vital for the effective operation and management of BD. Understanding DQ, from its definition to its various dimensions, is essential for developing techniques and procedures to improve it. This research focuses on two aspects of DQ: completeness and consistency. Firstly, an enhanced data transformation model (2CsDQT) is proposed to assess and improve big data quality. A new algorithm using ontology and clustering methods identifies and corrects incomplete and inconsistent data, addressing the availability and comprehensiveness of data, the similarity between data items, and missing specific attributes of data. Secondly, a clustering technique is used to analyse DQ and to improve it using results from the 2CsDQT model. The complete and consistent data are put into clusters, and the designed algorithm predicts the position of any incomplete or inconsistent data item, based on its value, so that it can be added to the appropriate cluster. The developed model was evaluated and benchmarked against existing data transformation techniques in the literature. This research shows that the 2CsDQT model successfully improves BD quality and outperforms previously proposed methods. Its data completeness and consistency results surpass those of related articles and benchmark studies in the literature on the datasets of two different test cases. The theoretical contribution of this research is insight into the importance of DQ issues in BD and the effect of inconsistency and incompleteness on BD applications. 
The practical contribution is the provision of enhanced data transformation models for DQ, leading to better data analysis and strategic planning.
format Thesis
qualification_name other
qualification_level Doctorate
author Onyeabor, Grace Amina
title Data Transformation Model For Addressing Incomplete And Inconsistent Quality Issues Of Big Data
granting_institution Universiti Utara Malaysia
granting_department Awang Had Salleh Graduate School of Arts & Sciences
publishDate 2024
url https://etd.uum.edu.my/11184/1/depositpermission-900601.pdf
https://etd.uum.edu.my/11184/2/s900601_01.pdf
_version_ 1804888213784887296
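
Illustrative note on the description: the record says that complete and consistent data are clustered and that an incomplete item is placed in a cluster based on its value, but it gives no algorithmic detail. The Python sketch below is therefore not the 2CsDQT algorithm. It substitutes a plain nearest-centroid assignment with mean imputation, and every name in it (centroid, partial_distance, assign_and_fill, completeness) and the sample data are hypothetical. It only shows, under those assumptions, how a record with a missing attribute could be matched to an existing cluster and how a simple completeness ratio changes afterwards.

```python
from math import sqrt
from statistics import mean


def centroid(rows):
    """Column-wise mean of a cluster of fully observed numeric records."""
    return [mean(col) for col in zip(*rows)]


def partial_distance(row, center):
    """Euclidean distance using only the attributes present in row.

    Missing attributes are represented by None (an assumption of this sketch).
    """
    pairs = [(v, c) for v, c in zip(row, center) if v is not None]
    if not pairs:
        raise ValueError("record has no observed attributes")
    return sqrt(sum((v - c) ** 2 for v, c in pairs))


def assign_and_fill(row, centroids):
    """Pick the nearest centroid and fill missing attributes from it."""
    best = min(range(len(centroids)),
               key=lambda i: partial_distance(row, centroids[i]))
    filled = [c if v is None else v for v, c in zip(row, centroids[best])]
    return best, filled


def completeness(rows):
    """Share of non-missing cells: a simple completeness ratio."""
    cells = [v for row in rows for v in row]
    return sum(v is not None for v in cells) / len(cells)


# Hypothetical clusters built from complete, consistent records.
clusters = [
    [[1.0, 2.0], [3.0, 2.0]],
    [[8.0, 9.0], [8.0, 10.0]],
]
centroids = [centroid(c) for c in clusters]

incomplete = [7.0, None]  # second attribute is missing
cluster_index, repaired = assign_and_fill(incomplete, centroids)
print(cluster_index, repaired)
print(completeness([incomplete]), completeness([repaired]))
```

Measuring distance only over observed attributes lets a partially filled record still be compared with every cluster. A real pipeline would also need the ontology-based consistency checks mentioned in the description, which this sketch does not attempt.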