Arabic text classification based on artificial bee colony algorithm and semantic relations

Bibliographic Details
Main Author: Hijazi, Musab Mustafa (Author)
Format: Thesis Book
Language: English
Published: Kuala Lumpur : Kulliyyah of Information and Communication, International Islamic University Malaysia, 2022
Online Access: http://studentrepo.iium.edu.my/handle/123456789/11438
Description
Summary: Documents contain a tremendous quantity of important human information. The substantial increase in the volume of machine-readable documents available for public or private access makes automatic text classification necessary. Text classification is the process of organizing documents into a predetermined set of classes. Western languages, particularly English, have received a great deal of attention in this area, whereas Arabic has received far less. Arabic text categorization methods emerged naturally as a result of the vast volume of diverse Arabic textual material on the internet. Feature selection is an essential step in text categorization: it is a preprocessing approach in which only a subset of the original features is kept after noisy, unnecessary, or duplicated features are eliminated. The Bag of Words (BoW) representation is considered the simplest representation of text. Most research on Arabic text classification has relied on the traditional BoW representation, which does not consider semantic relationships between words, such as synonymy and hypernymy. This research aims to build a model for Arabic text classification that uses the Artificial Bee Colony algorithm as a feature selection method and Arabic WordNet (AWN) as a lexical and semantic resource to exploit the semantic relationships between words. The results showed that the proposed Chi-square Binary Artificial Bee Colony (chi-BABC) feature selection method reduced the dimensionality of the feature set while improving text classification performance. It reduced the original feature list by approximately 89% when the Naïve Bayes classifier was used as the fitness function.
With Support Vector Machines as the fitness function, the proposed method reduced the original feature list by around 94%. The proposed feature selection (FS) method was evaluated using Support Vector Machine (SVM), C4.5 decision tree, and Naïve Bayes (NB) classifiers. Experiments showed that it improved the performance of Arabic text classification, with the best result for SVM at 86.9%, compared with 84.5% for NB and 77.3% for C4.5. The proposed FS method was also compared with PSO, ACO, and GA, and outperformed them with 86.9% compared with 84.7%, 83.4%, and 82.7%, respectively. Finally, utilizing concepts and the semantic relations between them enriches the text representation with additional semantic meaning, improving text classification performance. Classification performance based on grouping methods improved by 2% for the category term relation, and by 2% and 3% for the related-to and has holo member relations, respectively. The best classification performance was obtained when the has holo member relation was part of the combined relations: the best result was 81.2 for the combination of the related-to and has holo member relations, while the lowest was 78.6 for the combination of the has hyponym and category term relations.
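The summary names the two ingredients of the feature selection method, a chi-square score and a binary artificial bee colony search, but this record does not give the thesis's operators or how the two are combined. The following is only a minimal sketch of each ingredient under stated assumptions: the function names (`chi_square`, `binary_abc`, `toy_fitness`) are hypothetical, the neighbour move is a single bit flip, the scout phase is simplified, and the toy fitness stands in for the classifier accuracy (Naïve Bayes or SVM) that the thesis uses as its fitness function.

```python
import random

def chi_square(n11, n10, n01, n00):
    """Chi-square statistic for a 2x2 term/class table.
    n11: docs in the class with the term, n10: docs outside the class with it,
    n01: docs in the class without it,  n00: docs outside the class without it."""
    n = n11 + n10 + n01 + n00
    denom = (n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00)
    return 0.0 if denom == 0 else n * (n11 * n00 - n10 * n01) ** 2 / denom

def binary_abc(n_features, fitness, n_sources=10, limit=5, n_iter=30, seed=0):
    """Simplified binary bee-colony search over feature masks (lists of bools)."""
    rng = random.Random(seed)

    def random_mask():
        return [rng.random() < 0.5 for _ in range(n_features)]

    sources = [random_mask() for _ in range(n_sources)]
    scores = [fitness(m) for m in sources]
    trials = [0] * n_sources
    top = max(range(n_sources), key=scores.__getitem__)
    best, best_score = sources[top][:], scores[top]

    def try_improve(i):
        # Assumed neighbour move: flip one randomly chosen feature bit.
        nonlocal best, best_score
        cand = sources[i][:]
        j = rng.randrange(n_features)
        cand[j] = not cand[j]
        s = fitness(cand)
        if s > scores[i]:
            sources[i], scores[i], trials[i] = cand, s, 0
            if s > best_score:
                best, best_score = cand[:], s
        else:
            trials[i] += 1

    for _ in range(n_iter):
        for i in range(n_sources):                 # employed bees
            try_improve(i)
        low = min(scores)
        weights = [s - low + 1e-9 for s in scores]
        for _ in range(n_sources):                 # onlooker bees
            try_improve(rng.choices(range(n_sources), weights=weights)[0])
        for i in range(n_sources):                 # scouts replace stale sources
            if trials[i] > limit:
                sources[i] = random_mask()
                scores[i], trials[i] = fitness(sources[i]), 0
                if scores[i] > best_score:
                    best, best_score = sources[i][:], scores[i]
    return best, best_score

# Toy stand-in for classifier accuracy: reward three "informative" features
# (an invented set) and penalise every other selected feature.
INFORMATIVE = {0, 2, 5}

def toy_fitness(mask):
    hits = sum(1 for j, on in enumerate(mask) if on and j in INFORMATIVE)
    extras = sum(1 for j, on in enumerate(mask) if on and j not in INFORMATIVE)
    return hits - 0.5 * extras

mask, score = binary_abc(8, toy_fitness)
```

In a real pipeline the fitness would train and score a classifier on the features selected by the mask, and chi-square scores could, for example, be used to shortlist candidate terms before the search; that pairing is an assumption here, not something the record confirms.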
Item Description: Abstracts in English and Arabic.
"A thesis submitted in fulfilment of the requirement for the degree of Doctor of Philosophy in Computer Science." --On title page.
Physical Description: xxii, 158 leaves : illustrations ; 30 cm.
Bibliography: Includes bibliographical references (leaves 131-157).