Hybrid concept-based lattice mining model using formal concept analysis (FCA) and adjacency matrix for Al-Qur'an text retrieval
Main Author: | |
---|---|
Other Authors: | |
Format: | Thesis Book |
Language: | English |
Subjects: | |
Summary: | Introduction: In Information Retrieval (IR), the search process matches a query to relevant documents using various techniques. Information retrieval for the Al-Qur'an involves retrieving verses related to specific concepts of interest, but contributions on query matching are relatively limited because of the nature of the Qur'an itself. Extracting information from Al-Qur'an text is complicated, and the challenges take many forms: the same concept may be mentioned in different verses, a verse may allude to many themes, a concept may be expressed using different words, and a term may refer to different things and may have different names. However, semantic query matching for Al-Qur'an text can be improved by emphasizing the processes of text extraction and similarity analysis. Therefore, this study aims to contribute to semantic query matching, focusing on the domain of pilgrimage, by proposing a model called Concept-Based Lattice Mining (CBLM). Methodology: The research methodology involves four main stages: key term extraction, preparation of two datasets, Formal Concept Analysis (FCA) with the concept-based lattice mining process, and finally measurement of lattice similarity between FCA concept lattices. Prior to proposing the similarity algorithm, a comparison with a base model was conducted; the base model's similarity formula was found to give answers similar to those of this research, but it measures only first-level similarity between graphs. This research therefore proposes a further step in the algorithm to refine the degree of similarity within a dataset up to the second level. The dataset under study comprised 53 verses related to Hajj and Umrah from the Al-Qur'an (taken from the Al-Hilali English extended Qur'an translation) and related hadiths. The reference dataset was obtained from questions and answers related to Hajj and Umrah on the website of 'Jabatan Agama dan Kemajuan Islam Malaysia' (JAKIM). Categorization of the datasets and the results were validated by domain experts, and the implementation of the CBLM model on both datasets was evaluated by comparing accuracy and Kappa values. Results: After several experiments, the results showed that accuracy ranged from 70% to 83%, in line with the improvement in Kappa values. Overall, the performance of the JAKIM dataset is consistent with the judgment of the domain experts, supporting its validity as the reference dataset for testing the proposed technique of the CBLM model. A similar justification applies to the Al-Qur'an and hadith dataset, where superior performance in terms of average precision, F-measure, and accuracy was observed, indicating its potential use in conjunction with the CBLM model. Since, to date, there is no published standard for the range of acceptable accuracy for nonstandard datasets such as those in this study, the accuracy obtained, supported by the improved Kappa statistic, is deemed satisfactory for this study. Conclusion: Overall, this research not only contributed to keyword extraction from Qur'anic text by proposing a hybrid text extraction model but also highlighted the importance of FCA theory in determining the underlying concepts in Qur'anic text. It also indicates that the CBLM model is a useful technique for similarity analysis using Formal Concept Analysis and graph theory. |
---|---|
Item Description: | x |
Physical Description: | xvii, 294 leaves; 31 cm. |
Bibliography: | Includes bibliographical references (leaves 233-253) |
ISBN: | x |
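
As a rough illustration of the Formal Concept Analysis step named in the summary, the sketch below enumerates the formal concepts (extent and intent pairs) of a tiny object-attribute context. The verse labels, key terms, and function names are hypothetical and are not taken from the thesis. The CBLM lattice-similarity measure itself is not reproduced here.

```python
from itertools import combinations

# Hypothetical toy context: verse label -> key terms (invented for illustration only).
context = {
    "v1": {"hajj", "kaaba"},
    "v2": {"hajj", "umrah"},
    "v3": {"hajj", "umrah", "kaaba"},
}


def common_attributes(objects, context, all_attrs):
    """Return the attributes shared by every object in `objects`.

    For an empty set of objects this is the full attribute set.
    """
    shared = set(all_attrs)
    for obj in objects:
        shared &= context[obj]
    return shared


def objects_having(attrs, context):
    """Return the objects whose attribute sets contain all of `attrs`."""
    return {obj for obj, obj_attrs in context.items() if attrs <= obj_attrs}


def formal_concepts(context):
    """Enumerate all formal concepts by brute force over object subsets.

    A formal concept is a pair (extent, intent) in which the intent is exactly
    the set of attributes shared by the extent, and the extent is exactly the
    set of objects that have every attribute in the intent.
    This is exponential in the number of objects, so it only suits tiny examples.
    """
    all_attrs = set().union(*context.values())
    objs = sorted(context)
    concepts = set()
    for size in range(len(objs) + 1):
        for subset in combinations(objs, size):
            intent = common_attributes(subset, context, all_attrs)
            extent = objects_having(intent, context)
            concepts.add((frozenset(extent), frozenset(intent)))
    return concepts


if __name__ == "__main__":
    # Print concepts from the largest extent (top of the lattice) downwards.
    for extent, intent in sorted(formal_concepts(context), key=lambda c: -len(c[0])):
        print(sorted(extent), sorted(intent))
```

Ordering the resulting concepts by extent inclusion gives the concept lattice. Comparing two such lattices, for example one built from the verses and one from the reference questions and answers, would be a separate step.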