Framework To Enhance Veracity And Quality Of Big Data

Bibliographic Details
Main Author: Ridzuan, Fakhitah
Format: Thesis
Language: English
Published: 2021
Online Access: http://eprints.usm.my/52691/1/FAKHITAH%20BINTI%20RIDZUAN%20-%20TESIS24.pdf
Description
Summary: Massive amounts of data are available for organisations to drive their business ahead of competitors. Data collected from a variety of sources are often dirty, which affects business decisions. Various data cleansing tools are available to address the issue of dirty data. They offer better data quality, which helps organisations make sure their data are ready for analysis. However, concerns have been raised about the trustworthiness of the results even when the quality of the data is high. Veracity is one of the characteristics of Big Data and refers to the trustworthiness of the data. It is closely related to data quality, but there has been little work on a standard that defines data quality specifically for Big Data. In addition, most studies show the need for data quality rules to address the variety of errors present in the data. However, defining these rules requires a domain expert, who is expensive to employ. Consequently, this research proposes a method to automate data quality rules and an enhanced veracity assessment framework. The proposed method automates the extraction of data quality rules from the data source, which reduces the interaction needed with the domain expert while still verifying and validating the rules correctly. The proposed method is evaluated using the Veracity Enhancement Framework (VEF) to make sure the data meet the data quality dimensions and are able to deliver trustworthy results. The experimental results show that the proposed automatic technique for extracting data quality rules is able to correctly classify 9487 data items with an error percentage of 4.6%.