An enhanced feature selection technique for classification of group based holy Quran verses

This thesis proposes an enhanced feature selection technique for text classification applications. Text classification is primarily applied in document labeling. However, the major setbacks with the existing feature selection techniques are high computational runtime associated wit...

Bibliographic Details
Main Author: Abdullahi Oyekunle, Adeleke
Format: Thesis
Language: English
Published: 2018
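Subjects: HB135-147 Mathematical economics. Quantitative methods. Including econometrics, input-output analysis, game theory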
Online Access: http://eprints.uthm.edu.my/516/1/24p%20ADELEKE%20ABDULLAHI%20OYEKUNLE.pdf
http://eprints.uthm.edu.my/516/2/ADELEKE%20ABDULLAHI%20OYEKUNLE%20COPYRIGHT%20DECLARATION.pdf
http://eprints.uthm.edu.my/516/3/ADELEKE%20ABDULLAHI%20OYEKUNLE%20WATERMARK.pdf
id my-uthm-ep.516
record_format uketd_dc
institution Universiti Tun Hussein Onn Malaysia
collection UTHM Institutional Repository
language English
topic HB135-147 Mathematical economics
Quantitative methods
Including econometrics, input-output analysis, game theory
description This thesis proposes an enhanced feature selection (FS) technique for text classification applications. Text classification is primarily applied in document labeling. The major setbacks of existing feature selection techniques are the high computational runtime of wrapper-based FS techniques and the low classification accuracy of filter-based FS techniques. Therefore, this study proposes a hybrid feature selection technique that combines the filter-based information gain (IG) and wrapper-based CFS algorithms. The purpose of combining these two FS algorithms is to achieve high classification accuracy (wrapper) at lower computational runtime (filter). The study also developed a group-based Quran dataset to improve the understanding and analysis of the textual data (Quranic verses). The group-based dataset combines Holy Quran translation and commentary (tafsir). The Quranic verses were selected from two chapters, Surah Al-Baqarah and Surah Al-Anaam, and are classified into three categories: Faith, Worship, and Etiquette. In the experiment, six feature selection algorithms were applied: Information Gain (IG), Chi-square (CH), Pearson Correlation Coefficient (PCC), ReliefF, Correlation-based Feature Selection (CFS), and the proposed IG-CFS algorithm. The textual data were preprocessed using StringToWordVector with Term Frequency-Inverse Document Frequency (TF-IDF) weighting. The classification phase involved four algorithms: Naïve Bayes (NB), k-Nearest Neighbor (k-NN), Support Vector Machine (LibSVM), and Decision Trees (J48). The experimental results were evaluated using two established performance metrics in text classification: Accuracy and Area Under the Receiver Operating Characteristic (ROC) Curve (AUC). The proposed hybrid feature selection technique showed promising results on both metrics, achieving an Accuracy of 94.5% and an AUC of 0.944 at a lower computational runtime (3.89 s) with the group-based Quran dataset.
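The two-stage idea in the description (a cheap IG ranking to prune the vocabulary, then a CFS search over the survivors) can be illustrated with a minimal Python sketch. It is not the thesis implementation: the abstract does not say how the two stages are chained, so the `top_n` cut-off, the greedy forward search, the binary term-presence features, and the toy data are all assumptions made here for illustration. The merit score follows the commonly cited CFS formula with symmetrical uncertainty as the correlation measure.

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())

def info_gain(x, y):
    """Information gain of y from knowing x: H(y) - H(y | x)."""
    n = len(x)
    cond = 0.0
    for v in set(x):
        subset = [yi for xi, yi in zip(x, y) if xi == v]
        cond += len(subset) / n * entropy(subset)
    return entropy(y) - cond

def sym_uncertainty(x, y):
    """Symmetrical uncertainty in [0, 1], used here as the CFS correlation."""
    denom = entropy(x) + entropy(y)
    return 2.0 * info_gain(x, y) / denom if denom > 0 else 0.0

def cfs_merit(subset, columns, labels):
    """CFS merit k*r_cf / sqrt(k + k*(k-1)*r_ff) for a non-empty subset."""
    k = len(subset)
    r_cf = sum(sym_uncertainty(columns[f], labels) for f in subset) / k
    pairs = [(a, b) for i, a in enumerate(subset) for b in subset[i + 1:]]
    r_ff = (sum(sym_uncertainty(columns[a], columns[b]) for a, b in pairs)
            / len(pairs)) if pairs else 0.0
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)

def ig_cfs_select(columns, labels, top_n):
    """Stage 1: keep the top_n features by IG. Stage 2: greedy CFS search."""
    ranked = sorted(columns, key=lambda f: info_gain(columns[f], labels),
                    reverse=True)
    candidates = ranked[:top_n]  # assumed cut-off, not taken from the thesis
    selected, best = [], 0.0
    while True:
        scored = [(cfs_merit(selected + [f], columns, labels), f)
                  for f in candidates if f not in selected]
        if not scored:
            break
        merit, feat = max(scored)
        if merit <= best:  # stop when no candidate improves the merit
            break
        selected.append(feat)
        best = merit
    return selected

# Hypothetical toy data: term presence (1/0) in six verses, two per category.
columns = {
    "believe": [1, 1, 0, 0, 0, 0],
    "faith":   [1, 1, 0, 0, 0, 0],  # redundant with "believe"
    "prayer":  [0, 0, 1, 1, 0, 0],
    "the":     [1, 0, 1, 0, 1, 0],  # uninformative about the category
    "kind":    [0, 0, 0, 0, 1, 1],
}
labels = ["Faith", "Faith", "Worship", "Worship", "Etiquette", "Etiquette"]

selected = ig_cfs_select(columns, labels, top_n=4)
print(selected)
```

On this toy data the IG stage discards the uninformative term, and the CFS stage declines to keep both redundant terms because their high intercorrelation lowers the merit. With real TF-IDF weights the features would need discretizing first, or a different correlation measure.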
format Thesis
qualification_name Master of Philosophy (M.Phil.)
qualification_level Master's degree
author Abdullahi Oyekunle, Adeleke
title An enhanced feature selection technique for classification of group based holy Quran verses
granting_institution Universiti Tun Hussein Onn Malaysia
granting_department Faculty of Computer Science and Information Technology
publishDate 2018
url http://eprints.uthm.edu.my/516/1/24p%20ADELEKE%20ABDULLAHI%20OYEKUNLE.pdf
http://eprints.uthm.edu.my/516/2/ADELEKE%20ABDULLAHI%20OYEKUNLE%20COPYRIGHT%20DECLARATION.pdf
http://eprints.uthm.edu.my/516/3/ADELEKE%20ABDULLAHI%20OYEKUNLE%20WATERMARK.pdf
_version_ 1747830629245911040