Framework To Enhance Veracity And Quality Of Big Data
Massive amounts of data are available to organisations to drive their business ahead of competitors. Data collected from a variety of sources are dirty, and this affects business decisions. Various data cleansing tools are available to address the issue of dirty data. They offer bet...
Main Author: | Ridzuan, Fakhitah |
---|---|
Format: | Thesis |
Language: | English |
Published: | 2021 |
Subjects: | QA75.5-76.95 Electronic computers. Computer science |
Online Access: | http://eprints.usm.my/52691/1/FAKHITAH%20BINTI%20RIDZUAN%20-%20TESIS24.pdf |
id |
my-usm-ep.52691 |
---|---|
record_format |
uketd_dc |
spelling |
my-usm-ep.52691 2022-05-31T16:58:40Z Framework To Enhance Veracity And Quality Of Big Data 2021-10 Ridzuan, Fakhitah QA75.5-76.95 Electronic computers. Computer science
2021-10 Thesis http://eprints.usm.my/52691/ http://eprints.usm.my/52691/1/FAKHITAH%20BINTI%20RIDZUAN%20-%20TESIS24.pdf application/pdf en public phd doctoral Universiti Sains Malaysia Pusat Pengajian Sains Komputer |
institution |
Universiti Sains Malaysia |
collection |
USM Institutional Repository |
language |
English |
topic |
QA75.5-76.95 Electronic computers. Computer science |
description |
Massive amounts of data are available to organisations to drive their business ahead of competitors. Data collected from a variety of sources are dirty, and this affects business decisions. Various data cleansing tools are available to address the issue of dirty data. They offer better data quality, which helps organisations make sure their data is ready for analysis. However, concerns have been raised about the trustworthiness of the results, even when the quality of the data is high. Veracity, one of the characteristics of Big Data, refers to the trustworthiness of the data. It is closely related to data quality, but little work has been done on a standard that defines data quality specifically for Big Data. In addition, most studies show the need for data quality rules that cover the variety of errors present in the data. However, defining these rules requires a domain expert, who is expensive to employ. Consequently, this research proposes a method to automate data quality rules and an enhanced veracity assessment framework. The proposed method automates the process of extracting data quality rules from the data source, which reduces interaction with the domain expert while still correctly verifying and validating the rules. The proposed method is evaluated using the Veracity Enhancement Framework (VEF) to make sure the data meets the data quality dimensions and is able to deliver trustworthy results. The experimental results show that the proposed automatic technique for extracting data quality rules correctly classifies 9487 data items with a 4.6% error percentage. |
format |
Thesis |
qualification_name |
Doctor of Philosophy (PhD.) |
qualification_level |
Doctorate |
author |
Ridzuan, Fakhitah |
title |
Framework To Enhance Veracity And Quality Of Big Data |
granting_institution |
Universiti Sains Malaysia |
granting_department |
Pusat Pengajian Sains Komputer |
publishDate |
2021 |
url |
http://eprints.usm.my/52691/1/FAKHITAH%20BINTI%20RIDZUAN%20-%20TESIS24.pdf |