Stemming Algorithm in Searching Malay Text

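Stemming is one of the processes that can be used to improve the performance of a search engine. It reduces variant word forms to a common form. This project evaluates the retrieval effectiveness of a stemming algorithm in searching for and retrieving relevant Malay Web pages based on the user's natural-language query words. The retrieved Web pages are weighted and ranked using an Inverse Document Frequency function. Retrieval effectiveness is measured using standard recall and precision. The experiments show that searching with stemming improves retrieval effectiveness compared with searching without stemming.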

Bibliographic Details
Main Author: Rizauddin, Saian
Format: Thesis
Language: eng
Published: 2004
Subjects: QA76 Computer software
Online Access: https://etd.uum.edu.my/1409/1/RIZAUDDIN_B._SAIAN.pdf
https://etd.uum.edu.my/1409/2/1.RIZAUDDIN_B._SAIAN.pdf
id my-uum-etd.1409
record_format uketd_dc
institution Universiti Utara Malaysia
collection UUM ETD
language eng
topic QA76 Computer software
description Stemming is one of the processes that can be used to improve the performance of a search engine. It reduces variant word forms to a common form. This project evaluates the retrieval effectiveness of a stemming algorithm in searching for and retrieving relevant Malay Web pages based on the user's natural-language query words. The retrieved Web pages are weighted and ranked using an Inverse Document Frequency function. Retrieval effectiveness is measured using standard recall and precision. The experiments show that searching with stemming improves retrieval effectiveness compared with searching without stemming.
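The pipeline named in the description (stem the words, rank matching pages by Inverse Document Frequency, then measure recall and precision) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the affix lists, the minimum stem length, the smoothed IDF formula log(1 + N/df), the three toy documents, and the function names are hypothetical and are not taken from the thesis, whose actual stemming rules and test collection are not given in this record.

```python
import math
from collections import Counter

# Hypothetical affix lists for illustration only; NOT the rule set used in the thesis.
PREFIXES = ("meng", "mem", "men", "me", "ber", "di", "ter", "pe")
SUFFIXES = ("kan", "an", "i")
MIN_STEM = 4  # assumed guard so short roots such as "makan" are not over-stripped

# Toy collection (assumed): "he/she reads a book", "students' reading material",
# "food prices rise".
DOCS = {
    "d1": "dia membaca buku",
    "d2": "bahan bacaan murid",
    "d3": "harga makanan naik",
}


def stem(word):
    """Strip at most one prefix and one suffix, keeping at least MIN_STEM letters."""
    w = word.lower()
    for p in PREFIXES:
        if w.startswith(p) and len(w) - len(p) >= MIN_STEM:
            w = w[len(p):]
            break
    for s in SUFFIXES:
        if w.endswith(s) and len(w) - len(s) >= MIN_STEM:
            w = w[:-len(s)]
            break
    return w


def tokenize(text, use_stemming):
    """Split on whitespace and optionally stem every word."""
    words = text.lower().split()
    return [stem(w) for w in words] if use_stemming else words


def rank(docs, query, use_stemming):
    """Return ids of documents matching the query, highest summed IDF first."""
    doc_terms = {doc_id: set(tokenize(text, use_stemming))
                 for doc_id, text in docs.items()}
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for terms in doc_terms.values() for term in terms)
    n = len(docs)
    query_terms = set(tokenize(query, use_stemming))
    scores = {}
    for doc_id, terms in doc_terms.items():
        matched = query_terms & terms
        if matched:
            # Smoothed IDF (assumed variant) so a term found in every
            # document still contributes a positive weight.
            scores[doc_id] = sum(math.log(1 + n / df[t]) for t in matched)
    # Ties are broken by document id to keep the order deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


def precision_recall(retrieved, relevant):
    """Precision = hits / retrieved, recall = hits / relevant (0.0 when undefined)."""
    hits = len(set(retrieved) & set(relevant))
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


if __name__ == "__main__":
    relevant = {"d1", "d2"}  # assumed relevance judgments for the query "baca"
    for use_stemming in (False, True):
        retrieved = rank(DOCS, "baca", use_stemming)
        print(use_stemming, retrieved, precision_recall(retrieved, relevant))
```

On this toy data the unstemmed query "baca" matches no document, while the stemmed query retrieves d1 and d2 because "membaca" and "bacaan" both reduce to "baca". This shows only the mechanism of term conflation and says nothing about the size of the effect reported in the thesis.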
format Thesis
qualification_name masters
qualification_level Master's degree
author Rizauddin, Saian
title Stemming Algorithm in Searching Malay Text
granting_institution Universiti Utara Malaysia
granting_department Faculty of Information Technology
publishDate 2004
url https://etd.uum.edu.my/1409/1/RIZAUDDIN_B._SAIAN.pdf
https://etd.uum.edu.my/1409/2/1.RIZAUDDIN_B._SAIAN.pdf
spelling my-uum-etd.1409 2013-07-24T12:11:49Z https://etd.uum.edu.my/1409/ https://etd.uum.edu.my/1409/1/RIZAUDDIN_B._SAIAN.pdf application/pdf eng validuser https://etd.uum.edu.my/1409/2/1.RIZAUDDIN_B._SAIAN.pdf application/pdf eng public