An enhanced term weighting scheme method of identifying and extracting terms for ontology learning and development

Bibliographic Details
Main Author: Muhammad, Mahmud
Format: Thesis
Language: eng
Published: 2023
Online Access: https://etd.uum.edu.my/10738/1/kebenaran%20mendeposit-membenarkan-s95772.pdf
https://etd.uum.edu.my/10738/2/s95772_01.pdf
Description
Summary: Social media is crucial in facilitating the Disaster Management (DM) communication process. However, the knowledge representation of DM Social Media (DMSM) is inadequate, especially in ontology representation. Given the huge volume of unstructured DMSM text, information extraction for ontology development is achieved through text mining. However, existing work on text mining-based ontology development uses a well-known unsupervised scheme, TF-IDF, which ignores document distribution and leads to high feature dimensionality. The main objectives of the study are to improve ontology development by enhancing supervised term weighting schemes (TWS) and to develop a DMSM ontology. The enhancement is achieved by identifying the existing supervised TWS and giving higher weight to the positive category than to the negative one, which results in the removal of irrelevant terms. The study is conducted by gathering DMSM scientific publications, performing pre-processing, and calculating the eight selected supervised TWS. All eight schemes assigned high weight to the negative category rather than the positive category. An enhancement is then performed by introducing a positive term frequency ratio and a positive category ratio, so that the enhanced schemes extract terms relevant to the positive category. The DMSM ontology is generated and evaluated using a gold-standard-based evaluation method comprising syntactic comparison, ontology design, and evaluation of the learned ontology. The results show that good scores are achieved by TF.IDFEC-based.Enhanced and TF.RF.Enhanced, with 93.33% and 91.03% precision, 80.8% and 78.02% recall, and F-measures of 0.87 and 0.84, respectively. Theoretically, this study contributes an enhanced supervised TWS that emphasizes the classification information of a corpus, so that feature dimensionality can be reduced and the importance of words distributed between the positive and negative classes is boosted. Practically, the enhanced scheme provides an improved technique for ontology developers to extract relevant terms from unstructured scientific publication text, especially for the DMSM domain.
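The reported F-measures can be cross-checked against the precision and recall figures in the summary. The sketch below assumes the thesis uses the balanced F-measure (the harmonic mean of precision and recall); the summary does not state which variant was used. The function name `f_measure` and the `reported` dictionary are illustrative and are not taken from the thesis.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure: harmonic mean of precision and recall (both as fractions)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall from the summary, converted from percentages to fractions.
# Assumption: the reported F-measure is the balanced (F1) variant.
reported = {
    "TF.IDFEC-based.Enhanced": (0.9333, 0.8080),
    "TF.RF.Enhanced": (0.9103, 0.7802),
}

for scheme, (p, r) in reported.items():
    print(f"{scheme}: F = {f_measure(p, r):.2f}")
```

Under that assumption the computed values round to 0.87 and 0.84, which agrees with the figures given in the summary.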