An enhanced robust association rules method for missing values imputation in Arabic language data set

In data quality, missing values are one form of the data completeness problem faced by people who work with data. Failure to handle missing values often leads to unwanted consequences such as misleading analysis and decision-making. Thus, to deal with missing values, data imputation methods were prop...

Bibliographic Details
Main Author:	Salem, Awsan Thabet
Format:	Thesis
Language:	English
Published:	2023
Subjects:	Q Science (General); QA Mathematics
Online Access:	http://eprints.utem.edu.my/id/eprint/27718/1/An%20enhanced%20robust%20association%20rules%20method%20for%20missing%20values%20imputation%20in%20Arabic%20language%20data%20set.pdf http://eprints.utem.edu.my/id/eprint/27718/2/An%20enhanced%20robust%20association%20rules%20method%20for%20missing%20values%20imputation%20in%20Arabic%20language%20data%20set.pdf

id	my-utem-ep.27718
record_format	uketd_dc
spelling	my-utem-ep.27718 2024-09-19T16:42:25Z An enhanced robust association rules method for missing values imputation in Arabic language data set 2023 Salem, Awsan Thabet Q Science (General) QA Mathematics 2023 Thesis http://eprints.utem.edu.my/id/eprint/27718/ http://eprints.utem.edu.my/id/eprint/27718/1/An%20enhanced%20robust%20association%20rules%20method%20for%20missing%20values%20imputation%20in%20Arabic%20language%20data%20set.pdf text en public http://eprints.utem.edu.my/id/eprint/27718/2/An%20enhanced%20robust%20association%20rules%20method%20for%20missing%20values%20imputation%20in%20Arabic%20language%20data%20set.pdf text en validuser https://plh.utem.edu.my/cgi-bin/koha/opac-detail.pl?biblionumber=123584 phd masters Universiti Teknikal Malaysia Melaka Faculty of Information and Communication Technology Emran, Nurul Akmar
institution	Universiti Teknikal Malaysia Melaka
collection	UTeM Repository
language	English English
advisor	Emran, Nurul Akmar
topic	Q Science (General) QA Mathematics
spellingShingle	Q Science (General) QA Mathematics Salem, Awsan Thabet An enhanced robust association rules method for missing values imputation in Arabic language data set
description	In data quality, missing values are one form of the data completeness problem faced by people who work with data. Failure to handle missing values often leads to unwanted consequences such as misleading analysis and decision-making. To deal with missing values, data imputation methods have been proposed with the aim of improving the completeness of the data sets concerned. Imputation accuracy is a common indicator of a data imputation method's efficiency. However, the efficiency of data imputation in nominal data sets can be affected by the nature of the language in which the data set is written. Thus, there is a pressing need to address this problem, especially for non-Latin languages such as Arabic. In this thesis, the Enhanced Robust Association Rules (ERAR) method for missing values imputation is proposed. ERAR improves the handling of the Arabic language's complexity, in terms of morphology and misspellings, by adding an Arabic preparation step. The preparation step consists of Normalization, Error Detection, and Error Correction processes. ERAR is an extension of the Iterative method that adds filtering of frequent items. The method deals with high missing value rates by adjusting the support threshold in every iteration of the algorithm. This research tests the hypothesis that the Arabic preparation and filtering steps improve the imputation process in terms of accuracy, speed, and memory used. The findings show that, across different missing value (MV) rates, ERAR offered the highest accuracy, reaching 99% on the Arabic poetry data set, and was faster than the Iterative method on English and Arabic data sets at most MV rates, although not when compared with the DT method. However, ERAR had the highest memory usage of the methods compared during the imputation process. Regarding threshold values, both ERAR and the Iterative method are affected by the support threshold: accuracy decreases as the support value is reduced, and the same goes for elapsed time. In terms of memory usage, there is no clear effect. In the future, the research can be extended to cover numerical data and other Arabic language issues. There is also room to improve ERAR in terms of memory use and speed.
format	Thesis
qualification_name	Doctor of Philosophy (PhD.)
qualification_level	Master's degree
author	Salem, Awsan Thabet
author_facet	Salem, Awsan Thabet
author_sort	Salem, Awsan Thabet
title	An enhanced robust association rules method for missing values imputation in Arabic language data set
title_short	An enhanced robust association rules method for missing values imputation in Arabic language data set
title_full	An enhanced robust association rules method for missing values imputation in Arabic language data set
title_fullStr	An enhanced robust association rules method for missing values imputation in Arabic language data set
title_full_unstemmed	An enhanced robust association rules method for missing values imputation in Arabic language data set
title_sort	enhanced robust association rules method for missing values imputation in arabic language data set
granting_institution	Universiti Teknikal Malaysia Melaka
granting_department	Faculty of Information and Communication Technology
publishDate	2023
url	http://eprints.utem.edu.my/id/eprint/27718/1/An%20enhanced%20robust%20association%20rules%20method%20for%20missing%20values%20imputation%20in%20Arabic%20language%20data%20set.pdf http://eprints.utem.edu.my/id/eprint/27718/2/An%20enhanced%20robust%20association%20rules%20method%20for%20missing%20values%20imputation%20in%20Arabic%20language%20data%20set.pdf
_version_	1811771878635732992
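
Illustrative note on the description field: the abstract outlines an Arabic preparation step followed by iterative association-rule imputation in which the support threshold is adjusted on every iteration. The Python sketch below is a minimal illustration of that general idea only, not the thesis's ERAR implementation. Everything in it is an assumption made for the example: the function names `normalize_arabic` and `impute`, the parameters `min_support`, `decay`, and `floor`, the particular normalization rules (commonly used conventions for Arabic text), the restriction to single-item rule antecedents, and the form of the frequent-item filter. Error Detection and Error Correction are not covered.

```python
import re
from collections import Counter
from itertools import permutations

# Assumed normalization rules (common conventions, not taken from the thesis).
_MARKS = re.compile("[\u064B-\u0652\u0640]")   # short-vowel marks and tatweel
_ALEFS = re.compile("[\u0622\u0623\u0625]")    # alef with madda / hamza


def normalize_arabic(text):
    """Strip diacritics and unify common letter variants."""
    text = _MARKS.sub("", text)
    text = _ALEFS.sub("\u0627", text)                 # -> bare alef
    return text.replace("\u0649", "\u064A").strip()   # alef maqsura -> ya


def impute(rows, min_support=0.5, decay=0.5, floor=0.01):
    """Fill None cells using one-item rules, relaxing support each pass."""
    rows = [dict(r) for r in rows]                    # work on a copy
    n = len(rows)
    while min_support >= floor:
        missing = [(i, a) for i, r in enumerate(rows)
                   for a, v in r.items() if v is None]
        if not missing:
            break
        # Count known (attribute, value) items and ordered item pairs per row.
        item_count, pair_count = Counter(), Counter()
        for r in rows:
            items = [(a, v) for a, v in r.items() if v is not None]
            item_count.update(items)
            pair_count.update(permutations(items, 2))
        # Filtering step (assumed form): keep only frequent antecedent items.
        frequent = {it for it, c in item_count.items() if c / n >= min_support}
        # Best rule per (antecedent item, target attribute), by confidence.
        best = {}
        for (ante, (attr, value)), c in pair_count.items():
            if ante not in frequent or c / n < min_support:
                continue
            conf = c / item_count[ante]
            if conf > best.get((ante, attr), (0.0, None))[0]:
                best[(ante, attr)] = (conf, value)
        # Collect fills first so every cell in this pass sees the same data.
        fills = []
        for i, attr in missing:
            known = [(a, v) for a, v in rows[i].items() if v is not None]
            cands = [best[(k, attr)] for k in known if (k, attr) in best]
            if cands:
                fills.append((i, attr, max(cands, key=lambda t: t[0])[1]))
        for i, attr, value in fills:
            rows[i][attr] = value
        min_support *= decay                          # adjust threshold every pass
    return rows
```

In this sketch, `impute` takes a list of dictionaries whose missing cells are `None` and returns a filled copy; cells for which no rule qualifies before the threshold drops below `floor` stay `None`. Lowering the threshold on every pass lets rarer co-occurrences produce rules once the stricter passes have filled what they can.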
