Cold deck missing value imputation with a trust-based selection method of multiple web donors

Missing values are a common problem in any dataset, and their occurrence decreases data completeness. Moreover, the problem reduces data quality and negatively affects the results of data analysis. Existing cold deck imputation copes with this problem by selecting a replacem...

Bibliographic Details
Main Author: Mohd Jaya, Mohd Izham
Format: Thesis
Language: English
Published: 2018
Subjects: Mathematical statistics; Missing observations (Statistics)
Online Access: http://psasir.upm.edu.my/id/eprint/83236/1/FSKTM%202018%2079%20-ir.pdf
id my-upm-ir.83236
record_format uketd_dc
spelling my-upm-ir.83236 2022-01-07T08:33:00Z Cold deck missing value imputation with a trust-based selection method of multiple web donors 2018-12 Mohd Jaya, Mohd Izham Mathematical statistics Missing observations (Statistics) Thesis http://psasir.upm.edu.my/id/eprint/83236/ http://psasir.upm.edu.my/id/eprint/83236/1/FSKTM%202018%2079%20-ir.pdf text en public doctoral Universiti Putra Malaysia Sidi, Fatimah
institution Universiti Putra Malaysia
collection PSAS Institutional Repository
language English
advisor Sidi, Fatimah
topic Mathematical statistics
Missing observations (Statistics)

description Missing values are a common problem in any dataset, and their occurrence decreases data completeness. Moreover, the problem reduces data quality and negatively affects the results of data analysis. Existing cold deck imputation copes with this problem by selecting a replacement value from a pool of donors identified in other data sources during the imputation process. Compared with other imputation methods, existing cold deck imputation carries less risk of model misspecification and preserves the data distribution in the dataset. Nevertheless, its limitation is that the chance of finding a trusted, plausible donor is narrow because only a single data source is used in each imputation process. The availability of various web data sources today alleviates this limitation. However, because values from multiple web data sources commonly conflict with each other, adopting existing cold deck imputation with multiple web donors is not a practical solution, as a trust score for each of the conflicting values is not measured. Thus, it is difficult to select the most plausible value during the imputation process. This research concentrates on improving data completeness by imputing missing values using trust-based cold deck imputation. Trust Based Cold Deck Missing Values Imputation with Multiple Web Donor is presented in this research. The proposed method takes advantage of multiple web donors from web data sources to increase the chance of finding the most plausible values to impute missing values. The plausible values are selected based on a novel trust score computation, which is measured by the accuracy score and reliability score of the web donor. The performance of the proposed method is evaluated by running a prediction model on the imputed dataset. A number of experiments are carried out to quantify the accuracy of the prediction model, the Root Mean Squared Error (RMSE), and the F-Measure. The results demonstrate that the proposed method improves on the performance of existing cold deck imputation. Additionally, the results are compared with other imputation methods: K-Nearest Neighbor (KNN), Mean Imputation (AVG), Case Deletion (IGN), Predictive Mean Matching (PMM) and MissForest. The results show that the RMSE, prediction accuracy and F-Measure improve when the prediction model is trained with datasets imputed using the proposed method. This research contributes to the improvement of data quality, especially in the information system (IS) and database fields, where good data quality benefits data analysis performance.
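The selection step described in the abstract can be illustrated with a minimal Python sketch. The abstract says only that the trust score is measured by the accuracy score and reliability score of the web donor; the weighted-average formula, the weight `w_accuracy`, the class `DonorValue`, the functions `trust_score`, `select_donor` and `impute`, and the source names below are all assumptions made for illustration, not the thesis's actual computation.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class DonorValue:
    """One candidate replacement value offered by a web data source."""
    source: str         # hypothetical source name
    value: float        # candidate value for the missing field
    accuracy: float     # assumed accuracy score in [0, 1]
    reliability: float  # assumed reliability score in [0, 1]


def trust_score(donor: DonorValue, w_accuracy: float = 0.5) -> float:
    # Assumption: trust is a weighted average of the two scores.
    # The thesis may combine them differently.
    return w_accuracy * donor.accuracy + (1.0 - w_accuracy) * donor.reliability


def select_donor(
    candidates: Sequence[DonorValue], w_accuracy: float = 0.5
) -> Optional[DonorValue]:
    # Among conflicting candidates, keep the one with the highest trust score.
    if not candidates:
        return None
    return max(candidates, key=lambda d: trust_score(d, w_accuracy))


def impute(record: dict, field: str, candidates: Sequence[DonorValue]) -> dict:
    # Return a copy of the record; fill the field only when it is missing
    # and at least one donor value is available.
    filled = dict(record)
    if filled.get(field) is None:
        best = select_donor(candidates)
        if best is not None:
            filled[field] = best.value
    return filled


if __name__ == "__main__":
    donors = [
        DonorValue("site_a", 10.0, accuracy=0.9, reliability=0.6),
        DonorValue("site_b", 12.0, accuracy=0.7, reliability=0.9),
        DonorValue("site_c", 11.0, accuracy=0.5, reliability=0.5),
    ]
    print(impute({"id": 1, "price": None}, "price", donors))
```

With equal weights, the hypothetical donor `site_b` has the highest trust score in this example, so its value fills the missing field. Records whose field is already present are returned unchanged.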
format Thesis
qualification_level Doctorate
author Mohd Jaya, Mohd Izham
title Cold deck missing value imputation with a trust-based selection method of multiple web donors
granting_institution Universiti Putra Malaysia
publishDate 2018
url http://psasir.upm.edu.my/id/eprint/83236/1/FSKTM%202018%2079%20-ir.pdf