Monolingual and Cross-language Information Retrieval Approaches for Malay and English Language Document

This thesis concerns a Malay-English monolingual and cross-language information retrieval system. It presents pioneering work on the aspects that are important for developing a Malay-English information retrieval system. An improved Malay stemming algorithm has been developed to stem the var...

Bibliographic Details
Main Author: Abdullah, Muhamad Taufik
Format: Thesis
Language: English
Published: 2006
Subjects: Bilingualism - Malay; Bilingualism - English
Online Access: http://psasir.upm.edu.my/id/eprint/5869/1/FSKTM_2006_1%20IR.pdf
id my-upm-ir.5869
record_format uketd_dc
spelling my-upm-ir.5869 2022-01-13T02:54:28Z 2006-02 Thesis http://psasir.upm.edu.my/id/eprint/5869/ text en public doctoral
institution Universiti Putra Malaysia
collection PSAS Institutional Repository
language English
advisor Ahmad, Fatimah
topic Bilingualism - Malay
Bilingualism - English

description This thesis concerns a Malay-English monolingual and cross-language information retrieval system. It presents pioneering work on the aspects that are important for developing a Malay-English information retrieval system. An improved Malay stemming algorithm has been developed to reduce the various word forms to their common root for the purpose of indexing and retrieving Malay documents. Four new stemming approaches have been introduced for the Malay language, namely Rules-Frequency-Order (RFO), Minimum-Rules-Frequency-Order (MRFO), Rules-Frequency-Application-Order (RFAO), and Rules-Application-Frequency-Order (RAFO). The performance of the new Malay stemming algorithm and approaches is tested using the first two chapters of the Malay translation of the Quranic documents. The results show that the new stemming algorithm and approaches are superior to the previous stemming algorithm and approach. The retrieval effectiveness of the stemming algorithm and approaches is then tested on the actual Quranic collection using the vector space model and latent semantic indexing. The results show an improvement in performance from non-stemmed Malay to stemmed Malay, and also from the previous stemming algorithm to the new stemming algorithm. Since the new stemming algorithm and approaches achieved good performance in Malay monolingual information retrieval, a Malay-English cross-language information retrieval experiment was also performed. The results again show an improvement in performance from non-stemmed Malay to stemmed Malay, and from the previous stemming algorithm to the new stemming algorithm. In addition, the results reveal that the new Malay stemming performed better than the English stemming in retrieving relevant documents. The results can serve as a reference for future experiments and research on cross-language document retrieval.
format Thesis
qualification_level Doctorate
author Abdullah, Muhamad Taufik
title Monolingual and Cross-language Information Retrieval Approaches for Malay and English Language Document
granting_institution Universiti Putra Malaysia
granting_department Computer Science and Information Technology
publishDate 2006
url http://psasir.upm.edu.my/id/eprint/5869/1/FSKTM_2006_1%20IR.pdf
_version_ 1747810498637725696
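
The description field above mentions stemming Malay word forms to a common root and comparing stemmed with non-stemmed retrieval under the vector space model. The following is a minimal sketch of that general idea only, not the thesis's algorithm: the record does not give the rules behind RFO, MRFO, RFAO, or RAFO, so the affix lists, the root dictionary, the sample documents, and the function names `stem`, `tokenize`, `tfidf_vectors`, `cosine`, and `rank` are all hypothetical, and TF-IDF with cosine similarity is assumed as the weighting scheme.

```python
import math
from collections import Counter

# Hypothetical toy data (not from the thesis): a tiny root dictionary,
# a few Malay affixes, and three short sample documents.
ROOTS = {"makan", "baca", "minum"}
PREFIXES = ["mem", "di", "ber"]
SUFFIXES = ["kan", "an", "nya"]
DOCS = ["dia membaca buku", "makanan itu sedap", "bacaan pagi"]


def stem(word):
    """Strip at most one prefix and one suffix; accept a result only if it
    is a known root, otherwise return the word unchanged."""
    if word in ROOTS:
        return word
    candidates = [word]
    candidates += [word[len(p):] for p in PREFIXES if word.startswith(p)]
    for cand in list(candidates):
        candidates += [cand[:-len(s)] for s in SUFFIXES if cand.endswith(s)]
    for cand in candidates:
        if cand in ROOTS:
            return cand
    return word


def tokenize(text, use_stem):
    """Lowercase, split on whitespace, and optionally stem each token."""
    return [stem(w) if use_stem else w for w in text.lower().split()]


def tfidf_vectors(docs_tokens):
    """Build sparse TF-IDF vectors (dicts) and return them with the IDF table."""
    n = len(docs_tokens)
    df = Counter(t for toks in docs_tokens for t in set(toks))
    idf = {t: math.log(n / df[t]) + 1.0 for t in df}
    vectors = [{t: c * idf[t] for t, c in Counter(toks).items()}
               for toks in docs_tokens]
    return vectors, idf


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def rank(query, docs, use_stem=True):
    """Return document indices sorted by similarity to the query, plus scores."""
    vectors, idf = tfidf_vectors([tokenize(d, use_stem) for d in docs])
    q_counts = Counter(tokenize(query, use_stem))
    # Query terms absent from the collection get weight 0.0.
    q = {t: c * idf.get(t, 0.0) for t, c in q_counts.items()}
    scores = [cosine(q, v) for v in vectors]
    order = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
    return order, scores


if __name__ == "__main__":
    print(rank("baca", DOCS, use_stem=True))
    print(rank("baca", DOCS, use_stem=False))
```

In this toy collection, the query "baca" matches no unstemmed token, so every score is zero without stemming. With stemming, "membaca" and "bacaan" both reduce to "baca" and their documents receive nonzero scores. This illustrates in miniature the kind of stemmed versus non-stemmed comparison the description reports, but it says nothing about the size of the effect on a real collection.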