The study of stemming algorithm on Malay words that begin with alphabets P, Q, Y, and Z from the translated Al-Quran / Suriani Mat

Bibliographic Details
Main Author: Mat, Suriani
Format: Thesis
Language: English
Published: 2001
Online Access: https://ir.uitm.edu.my/id/eprint/98195/1/98195.pdf
Description
Summary: This thesis concerns a retrieval system for Malay-language documents. The study uses a stemming algorithm, Malay translations of the Quran, and root-word dictionaries. The performance of a Malay stemming algorithm is tested on words beginning with the letters 'p', 'q', 'y' and 'z' in five experiments. The first experiment uses the original data collection. In the second experiment, new words are added to the dictionary and the total values for 'i', 'm', 'p', 'q', 'y' and 'z' are modified in the header file "dcvarnew.h"; in addition, affix rule formats are added to the file "rule.txt" and misspelled words are corrected. In the third experiment, the positions of the rules in "rule.txt" are changed. In the fourth experiment, words that have more than one root, words in old spelling and spoken-language words are deleted from the dictionary; after this modification, the total values for 'k', 'm', 'n' and 'p' in "dcvarnew.h" are corrected again, and new code is added to the module 'ubahejaan'. In the fifth experiment, the spoken word is deleted from the dictionary and the total value for 'p' in "dcvarnew.h" is corrected; an alternative rule is then applied to resolve the words pengawal, pengawalan and perangan. The objective of the project is achieved when the best rule order for stemming words that begin with 'p', 'q', 'y' and 'z' is found. This order uses two combinations together: 1234 as the primary combination and 3124 as the secondary. Every word is first stemmed with the 1234 combination; if the program cannot stem a word correctly, it switches to the secondary 3124 combination. These experiments can serve as a benchmark for future research on the Malay language.
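
The primary/secondary fallback described in the summary can be illustrated with a minimal Python sketch. It is not the thesis code: the mapping of the digits 1 to 4 to affix rule groups, the toy affixes ('peng', 'an', 'em'), the small root dictionary and all function names are assumptions made for illustration only. The record does not say what each digit stands for, nor how "rule.txt" or "dcvarnew.h" are laid out.

```python
from typing import Callable, Dict, Optional, Set

# Hypothetical root dictionary; the thesis uses its own root dictionaries.
ROOTS: Set[str] = {"ajar", "makan", "pukul"}


def strip_prefix(word: str) -> Optional[str]:
    """Group 1 (assumed meaning): remove the prefix 'peng' only."""
    if word.startswith("peng") and len(word) > 4:
        return word[4:]
    return None


def strip_suffix(word: str) -> Optional[str]:
    """Group 2 (assumed meaning): remove the suffix 'an' only."""
    if word.endswith("an") and len(word) > 2:
        return word[:-2]
    return None


def strip_prefix_suffix(word: str) -> Optional[str]:
    """Group 3 (assumed meaning): remove 'peng' and 'an' together."""
    if word.startswith("peng") and word.endswith("an") and len(word) > 6:
        return word[4:-2]
    return None


def strip_infix(word: str) -> Optional[str]:
    """Group 4 (assumed meaning): remove the infix 'em' after the first letter."""
    if len(word) > 3 and word[1:3] == "em":
        return word[0] + word[3:]
    return None


RULE_GROUPS: Dict[str, Callable[[str], Optional[str]]] = {
    "1": strip_prefix,
    "2": strip_suffix,
    "3": strip_prefix_suffix,
    "4": strip_infix,
}


def stem_with_order(word: str, order: str) -> Optional[str]:
    """Apply the first rule group in `order` that fires.

    The candidate is accepted only if it is a known root; otherwise
    this order is treated as having failed for the word.
    """
    for group_id in order:
        candidate = RULE_GROUPS[group_id](word)
        if candidate is not None:
            return candidate if candidate in ROOTS else None
    return None


def stem(word: str, primary: str = "1234", secondary: str = "3124") -> str:
    """Try the primary rule order, then fall back to the secondary one."""
    if word in ROOTS:
        return word
    for order in (primary, secondary):
        root = stem_with_order(word, order)
        if root is not None:
            return root
    return word  # unresolved: return the word unchanged
```

In this toy setup, `stem("pengajar")` is resolved by the primary order, whereas `stem("pengajaran")` fails under 1234 (the prefix rule fires first and leaves "ajaran", which is not a root) and is resolved only after switching to 3124, where the combined prefix and suffix rule is tried first.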