Incorporating stemming algorithm in the Malay information retrieval that employs Thesaurus approach / Mohd Rosmadi Mokhtar

Bibliographic Details
Main Author: Mokhtar, Mohd Rosmadi
Format: Thesis
Language: English
Published: 2001
Online Access: https://ir.uitm.edu.my/id/eprint/98015/1/98015.pdf
Description
Summary: This project incorporates the ROA stemming algorithm with the thesaurus approach by Rapizal. It is an opportunity to find out whether combining stemming with a thesaurus improves retrieval effectiveness and efficiency. Advances in information technology have made it possible for a wide range of text-based information to be searched and retrieved online, locally or from remote hosts, from anywhere in the world. This popularity is due to technology that advances rapidly from day to day. Malay has many word variants that share the same meaning. To overcome this word-variant problem, computational techniques that transform both the user's search terms and the database words into a single canonical form have been introduced. These are known as conflation methods. One well-known class of conflation methods is stemming algorithms, which are used to identify morphological variants. Stemming algorithms are language dependent. They have proven successful in reducing words with the same stem to a common form, as evidenced by the work of many researchers. Unfortunately, this conflation method is unable to conflate different words that possess the same meaning. Such words can only be conflated by a thesaurus, which can handle hierarchic, synonymic, and also morphological relationships. To solve this problem, another language-dependent conflation method, the thesaurus, is used, although creating a thesaurus for a given subject requires extensive and highly skilled manual effort. It can represent all types of relationships that exist between words. An information retrieval thesaurus typically contains a list of terms, where a term is either a single word or a phrase. The relationships between the terms are also included to assist in coordinating indexing and retrieval.
The study found that incorporating the stemming algorithm with the thesaurus successfully increases the number of retrieved and relevant documents for Malay query words, but reduces retrieval efficiency.
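
The two conflation steps described in the summary can be illustrated with a minimal Python sketch. It is not the ROA algorithm and not Rapizal's thesaurus: the affix lists, the root-word list (ROOTS), the thesaurus entries (THESAURUS), the sample documents (DOCS), and the function names are all hypothetical and chosen only to show how stemming handles morphological variants while a thesaurus lookup handles synonyms.

```python
# Toy illustration only: NOT the ROA algorithm and NOT Rapizal's thesaurus.
# All word lists below are hypothetical examples.
PREFIXES = ("meng", "mem", "men", "me", "ber", "di", "ter")
SUFFIXES = ("kan", "an", "i")

# Hypothetical root-word list used to validate candidate stems.
ROOTS = {"ajar", "baca", "kereta", "motokar", "guna"}

# Hypothetical thesaurus: maps a root to its preferred (canonical) term.
THESAURUS = {"motokar": "kereta"}


def stem(word):
    """Strip at most one prefix and one suffix, accepting only known roots."""
    word = word.lower()
    if word in ROOTS:
        return word
    # Candidates: the word itself plus every prefix-stripped form.
    candidates = [word] + [word[len(p):] for p in PREFIXES if word.startswith(p)]
    for cand in candidates:
        if cand in ROOTS:
            return cand
        for suffix in SUFFIXES:
            if cand.endswith(suffix) and cand[:-len(suffix)] in ROOTS:
                return cand[:-len(suffix)]
    return word  # no known root found; leave the word unchanged


def conflate(word):
    """Stem first (morphological variants), then map synonyms via the thesaurus."""
    root = stem(word)
    return THESAURUS.get(root, root)


def index_terms(text):
    """Return the set of conflated terms for a whitespace-tokenised text."""
    return {conflate(token) for token in text.split()}


# Hypothetical three-document collection.
DOCS = {1: "dia membaca buku", 2: "motokar itu digunakan", 3: "kereta baru"}


def search(query):
    """Return ids of documents sharing at least one conflated term with the query."""
    query_terms = index_terms(query)
    return sorted(d for d, text in DOCS.items() if query_terms & index_terms(text))
```

In this sketch, a query for "kereta" also matches the document that only mentions "motokar", which stemming alone would miss. The extra lookup and the larger result set hint at why combining the two methods can retrieve more documents at some cost in efficiency, as the summary reports.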