An enhanced robust association rules method for missing values imputation in Arabic language data set

Bibliographic Details
Main Author: Salem, Awsan Thabet
Format: Thesis
Language: English
Published: 2023
Online Access:http://eprints.utem.edu.my/id/eprint/27718/1/An%20enhanced%20robust%20association%20rules%20method%20for%20missing%20values%20imputation%20in%20Arabic%20language%20data%20set.pdf
http://eprints.utem.edu.my/id/eprint/27718/2/An%20enhanced%20robust%20association%20rules%20method%20for%20missing%20values%20imputation%20in%20Arabic%20language%20data%20set.pdf
Description
Summary: In data quality, missing values are one form of the data completeness problem faced by people who work with data. Failing to handle missing values usually has unwanted consequences, such as misleading analysis and decision-making. Data imputation methods were therefore proposed to improve the completeness of the data sets concerned. Imputation accuracy is a common indicator of a data imputation method's efficiency. However, the efficiency of data imputation in nominal data sets can be affected by the nature of the language in which the data set is written, so there is a pressing need to address this problem, especially for non-Latin languages such as Arabic. This thesis proposes the Enhanced Robust Association Rules (ERAR) method for missing values imputation. ERAR improves the handling of the Arabic language's complexity, in terms of morphology and misspellings, by adding an Arabic preparation step that consists of Normalization, Error Detection, and Error Correction processes. ERAR is an extension of the Iterative method that adds filtering of frequent items. This method deals with high missing value (MV) rates by adjusting the support threshold in every iteration of the algorithm. The research tests the hypothesis that the Arabic preparation and filtering steps improve the imputation process in terms of accuracy, speed, and memory use. The findings show that, across different MV rates, ERAR offered the highest accuracy (reaching 99% in the Arabic poetry data set) and higher speed than the Iterative method in the English and Arabic data sets at most MV rates, although not against the DT method. However, ERAR consumed the most memory of all the methods during imputation.
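As an illustration of what an Arabic preparation step of this kind might involve, the following minimal Python sketch normalizes nominal values and then corrects likely misspellings against a known vocabulary. The specific normalization rules (stripping diacritics and tatweel, unifying alef variants, mapping alef maqsura to yeh), the function names `normalize` and `correct`, and the similarity cutoff are assumptions chosen for illustration; the record does not state which rules the thesis uses.

```python
import difflib
import re

# Assumed rules, common in Arabic text processing (not taken from the thesis).
_DIACRITICS = re.compile("[\u064B-\u0652\u0670]")  # tashkeel marks and dagger alef
_TATWEEL = "\u0640"                                 # elongation character
_CHAR_MAP = str.maketrans({
    "\u0623": "\u0627",  # alef with hamza above -> bare alef
    "\u0625": "\u0627",  # alef with hamza below -> bare alef
    "\u0622": "\u0627",  # alef with madda -> bare alef
    "\u0649": "\u064A",  # alef maqsura -> yeh
})


def normalize(value: str) -> str:
    """Normalization: reduce spelling variants of one value to a single form."""
    value = _DIACRITICS.sub("", value)
    value = value.replace(_TATWEEL, "")
    value = value.translate(_CHAR_MAP)
    return " ".join(value.split())  # collapse repeated whitespace


def correct(value: str, vocabulary: list[str], cutoff: float = 0.8) -> str:
    """Error detection and correction against a list of known values.

    A value missing from the vocabulary is treated as a suspected error and
    replaced by its closest known value, if one is similar enough.
    """
    if value in vocabulary:
        return value
    matches = difflib.get_close_matches(value, vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else value
```

Applying `normalize` before `correct` means the vocabulary only needs to hold one canonical spelling per value, which also keeps item counts from being split across variants when rules are mined later.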
Both ERAR and the Iterative method are affected by the choice of support threshold: accuracy decreases as the support value is reduced, and so does elapsed time. There is no clear effect on memory usage. In future work, the research can be extended to cover numerical data and other Arabic language issues. There is also room to improve ERAR in terms of memory use and speed.
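The idea of imputing with association rules while adjusting the support threshold in every iteration can be sketched as follows. This is a deliberately simplified illustration, not the ERAR algorithm itself: it uses only single-antecedent rules of the form (attribute = value) -> (target = value), it assumes the threshold is lowered by a fixed step, and the function name `impute` and its parameters `start_support`, `step`, and `floor` are hypothetical.

```python
from collections import Counter


def impute(rows, start_support=0.5, step=0.1, floor=0.1):
    """Fill None cells using rules (a = x) -> (target = y).

    Each pass keeps only rules whose support reaches the current threshold,
    fills what it can with the highest-confidence rule, then lowers the
    threshold for the cells that are still missing.
    """
    rows = [dict(r) for r in rows]  # work on a copy of the input
    n = len(rows)
    support = start_support
    while support >= floor - 1e-9:  # small tolerance for float rounding
        # Count single items and co-occurring item pairs in the current data.
        item_counts, pair_counts = Counter(), Counter()
        for r in rows:
            items = [(k, v) for k, v in r.items() if v is not None]
            item_counts.update(items)
            for a in items:
                for b in items:
                    if a[0] != b[0]:
                        pair_counts[(a, b)] += 1

        for r in rows:
            for target in [k for k, v in r.items() if v is None]:
                best = None  # (confidence, imputed value)
                known = {(k, v) for k, v in r.items() if v is not None}
                for (a, b), count in pair_counts.items():
                    # Filter: antecedent present, right target, enough support.
                    if a in known and b[0] == target and count / n >= support:
                        confidence = count / item_counts[a]
                        if best is None or confidence > best[0]:
                            best = (confidence, b[1])
                if best is not None:
                    r[target] = best[1]

        if all(v is not None for r in rows for v in r.values()):
            break
        support -= step  # relax the threshold for the next iteration
    return rows


data = [
    {"poet": "A", "era": "X"},
    {"poet": "A", "era": "X"},
    {"poet": "A", "era": "X"},
    {"poet": "B", "era": "Y"},
    {"poet": "A", "era": None},
    {"poet": "B", "era": None},
]
filled = impute(data)
```

In this toy data the rule for poet "A" already qualifies at the starting threshold, while the rarer rule for poet "B" qualifies only after the threshold has been lowered several times. That shows the trade-off the summary describes: lower support values allow more cells to be filled, but from weaker evidence.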