Missing data imputation framework for early childhood longitudinal data: a study case on NCDRC data

Bibliographic Details
Main Author:	Al-Amoodi, Abdullah Hussien Abdullah
Format:	thesis
Language:	eng
Published:	2019
Online Access:	https://ir.upsi.edu.my/detailsg.php?det=6762

Description
Summary:	This research aims to develop an imputation framework for the National Childhood Development Research Centre (NCDRC)'s missing data. Missing data and other associated issues, such as outliers, time points, noise, and continuity, were the main challenges in this research. The nature of the NCDRC dataset was not consistent with those reported in the literature, with the latter being more randomly scattered and copious and having no patterns, making it difficult to find and select relevant experimental data. The VIseKriterijumska Optimizacija Kompromisno Resenje (VIKOR) method was utilized to select the best continuous portion of Body Mass Index (BMI) data over 182 different portions, which accounted for 911 participants (i.e. children with complete records) over seven (7) continuous time points. Three different machine learning algorithms to impute the missing data were tested and evaluated, namely K-nearest Neighbour (KNN), Naive Bayes (NB), and Decision Tree (DT). Three evaluation performance indicators, namely t-test, Coefficient of Determination, and Root Mean Square Error, were used in the experiment using three configurations based on 5%, 10%, and 15% missing data. The results of the experiment showed that KNN's performance scores were significantly higher than those of the other algorithms. Out of all scores, KNN achieved 95.23% of the scores, followed by NB with 94.04% and DT with 83.33%, clearly indicating that KNN outperformed DT and NB in the imputation of missing data. In conclusion, the main finding suggests that the KNN algorithm is the most effective algorithm for imputing missing data. The implication of this study is that practitioners, especially NCDRC's personnel, can use the proposed missing data imputation framework to help impute missing data of similar datasets.
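
The evaluation protocol described in the summary (hide a known share of values, impute them with KNN, then score the imputations) can be illustrated roughly as follows. This is only a minimal sketch under stated assumptions, not the thesis's implementation: the BMI values are synthetic, the helper names (`knn_impute`, `make_synthetic_bmi`, `evaluate`), the choice of k = 5, and the distance measure are all hypothetical, and the t-test indicator is left out for brevity.

```python
import math
import random


def knn_impute(rows, k=5):
    """Fill None cells with the mean of that column over the k nearest rows."""
    filled = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            candidates = []
            for m, other in enumerate(rows):
                if m == i or other[j] is None:
                    continue
                # Distance uses only the columns observed in both rows.
                shared = [(a - b) ** 2 for a, b in zip(row, other)
                          if a is not None and b is not None]
                if shared:
                    dist = math.sqrt(sum(shared) / len(shared))
                    candidates.append((dist, other[j]))
            candidates.sort(key=lambda c: c[0])
            nearest = [v for _, v in candidates[:k]]
            if not nearest:
                # Fallback (assumption): use the observed column mean.
                nearest = [r[j] for r in rows if r[j] is not None]
            filled[i][j] = sum(nearest) / len(nearest)
    return filled


def rmse(true, pred):
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(true, pred)) / len(true))


def r_squared(true, pred):
    mean_t = sum(true) / len(true)
    ss_res = sum((t - p) ** 2 for t, p in zip(true, pred))
    ss_tot = sum((t - mean_t) ** 2 for t in true)
    return 1 - ss_res / ss_tot


def make_synthetic_bmi(n_children=200, n_points=7, seed=0):
    """Synthetic stand-in for complete BMI records over seven time points."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_children):
        base = rng.uniform(13.0, 19.0)
        data.append([base + 0.2 * t + rng.gauss(0, 0.3) for t in range(n_points)])
    return data


def evaluate(complete, rate, k=5, seed=1):
    """Hide a share of cells, impute them, and score against the true values."""
    rng = random.Random(seed)
    cells = [(i, j) for i in range(len(complete)) for j in range(len(complete[0]))]
    hidden = rng.sample(cells, int(rate * len(cells)))
    masked = [list(r) for r in complete]
    for i, j in hidden:
        masked[i][j] = None
    imputed = knn_impute(masked, k)
    true = [complete[i][j] for i, j in hidden]
    pred = [imputed[i][j] for i, j in hidden]
    return rmse(true, pred), r_squared(true, pred)


# The three missing-data configurations named in the summary.
results = {rate: evaluate(make_synthetic_bmi(), rate) for rate in (0.05, 0.10, 0.15)}
for rate, (err, r2) in results.items():
    print(f"{rate:.0%} missing: RMSE={err:.3f}, R2={r2:.3f}")
```

Because the hidden values are known, RMSE and the coefficient of determination can be computed directly on the imputed cells; the same harness could be reused to compare other imputers on identical masks.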
