An enhanced robust association rules method for missing values imputation in Arabic language data set

In data quality, missing values is one form of data completeness problem faced by people who deal with data. The failure to handle missing values usually causes unwanted consequences such as misleading analysis and decision-making. Thus, to deal with missing values, data imputation methods were prop...

Bibliographic Details
Main Author: Salem, Awsan Thabet
Format: Thesis
Language: English
Published: 2023
Subjects:
Online Access: http://eprints.utem.edu.my/id/eprint/27718/1/An%20enhanced%20robust%20association%20rules%20method%20for%20missing%20values%20imputation%20in%20Arabic%20language%20data%20set.pdf
http://eprints.utem.edu.my/id/eprint/27718/2/An%20enhanced%20robust%20association%20rules%20method%20for%20missing%20values%20imputation%20in%20Arabic%20language%20data%20set.pdf
id my-utem-ep.27718
record_format uketd_dc
spelling my-utem-ep.27718 2024-09-19T16:42:25Z https://plh.utem.edu.my/cgi-bin/koha/opac-detail.pl?biblionumber=123584
institution Universiti Teknikal Malaysia Melaka
collection UTeM Repository
language English
advisor Emran, Nurul Akmar
topic Q Science (General)
QA Mathematics
description In data quality, missing values are one form of the data completeness problem faced by people who work with data. Failure to handle missing values often leads to unwanted consequences such as misleading analysis and decision-making. To deal with missing values, data imputation methods have been proposed with the aim of improving the completeness of the data sets concerned. Imputation accuracy is a common indicator of a data imputation method's efficiency. However, the efficiency of data imputation in nominal data sets can be affected by the nature of the language in which the data set is written. There is therefore a pressing need to address this problem, especially for non-Latin languages such as Arabic. In this thesis, the Enhanced Robust Association Rules (ERAR) method for missing values imputation is proposed. ERAR improves the handling of the Arabic language's complexity, in terms of morphology and misspellings, by adding an Arabic preparation step. The preparation step consists of Normalization, Error Detection, and Error Correction processes. ERAR is an extension of the Iterative method that adds filtering of frequent items. The method deals with high missing value (MV) rates by adjusting the support threshold in every iteration of the algorithm. This research tests the hypothesis that the Arabic preparation and filtering steps improve the imputation process in terms of accuracy, speed, and memory used. The findings show that, across different MV rates, ERAR offered the highest accuracy, reaching 99% in the Arabic poetry data set, and higher speed than the Iterative method in the English and Arabic data sets at most MV rates, though not when compared with the DT method. However, ERAR consumed more memory than the other methods during imputation. With regard to threshold values, both the ERAR and Iterative methods are affected by the support threshold: accuracy decreases as the support value is reduced, and the same goes for elapsed time. In terms of memory usage, there is no clear effect. In the future, the research can be extended to cover numerical data and other Arabic language issues. There is also room to improve ERAR in terms of memory use and speed.
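The abstract describes the method only at a high level, so the following is a minimal sketch of the general idea rather than the thesis's ERAR implementation. It assumes that "adjusting the support threshold" means lowering it on each pass, that rules have a single antecedent, and that normalization means stripping diacritics and unifying alef forms; Error Detection and Error Correction are not sketched. All function names, parameters, and default thresholds are hypothetical.

```python
import re
from collections import Counter
from itertools import permutations

# Assumed normalization rules (not taken from the thesis):
_DIACRITICS = re.compile("[\u064B-\u0652\u0640]")  # tashkeel marks and tatweel
_ALEF_FORMS = re.compile("[\u0622\u0623\u0625]")   # alef with madda or hamza

def normalize_arabic(text):
    """Strip diacritics/tatweel and map alef variants to bare alef."""
    return _ALEF_FORMS.sub("\u0627", _DIACRITICS.sub("", text))

def mine_rules(rows, min_support, min_confidence):
    """Mine single-antecedent rules (attr=value) -> (attr=value).

    Returns {(antecedent_item, target_attr): (value, confidence)},
    keeping the highest-confidence consequent per key.
    """
    n = len(rows)
    item_counts = Counter()
    for row in rows:
        item_counts.update((a, v) for a, v in row.items() if v is not None)
    # Filtering step: only frequent items are allowed to form pairs.
    frequent = {i for i, c in item_counts.items() if c / n >= min_support}
    pair_counts = Counter()
    for row in rows:
        items = [i for i in row.items() if i[1] is not None and i in frequent]
        pair_counts.update(permutations(items, 2))
    best = {}
    for (ante, (c_attr, c_val)), count in pair_counts.items():
        confidence = count / item_counts[ante]
        if count / n >= min_support and confidence >= min_confidence:
            key = (ante, c_attr)
            if key not in best or confidence > best[key][1]:
                best[key] = (c_val, confidence)
    return best

def impute(rows, thresholds=(0.5, 0.4, 0.3, 0.2, 0.1), min_confidence=0.6):
    """Fill None values, trying a lower support threshold on each pass."""
    # Work on normalized copies so the caller's rows are left untouched.
    rows = [{a: normalize_arabic(v) if isinstance(v, str) else v
             for a, v in r.items()} for r in rows]
    for support in thresholds:
        rules = mine_rules(rows, support, min_confidence)
        for row in rows:
            for attr in [a for a, v in row.items() if v is None]:
                candidates = [rules[(item, attr)] for item in row.items()
                              if item[1] is not None and (item, attr) in rules]
                if candidates:
                    # Use the consequent of the most confident matching rule.
                    row[attr] = max(candidates, key=lambda c: c[1])[0]
        if all(v is not None for r in rows for v in r.values()):
            break
    return rows
```

Calling `impute` on a list of dictionaries, with `None` marking a missing value, returns filled copies. Rows for which no rule qualifies at any threshold keep their `None` entries.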
format Thesis
qualification_name Doctor of Philosophy (PhD.)
qualification_level Master's degree
author Salem, Awsan Thabet
title An enhanced robust association rules method for missing values imputation in Arabic language data set
granting_institution Universiti Teknikal Malaysia Melaka
granting_department Faculty of Information and Communication Technology
publishDate 2023
_version_ 1811771878635732992