Enhancing the natural language processing for Malay language stemming, identifying and correcting misspelled words identifying neologism

Bibliographic Details
Main Author: Surayaini Basri
Format: Thesis
Language: English
Published: 2015
Online Access: https://eprints.ums.edu.my/id/eprint/39472/1/24%20PAGES.pdf
https://eprints.ums.edu.my/id/eprint/39472/2/FULLTEXT.pdf
Description
Summary: Natural Language Processing (NLP) is a field of computer science, artificial intelligence, and linguistics concerned with the interactions between computers and human (natural) languages. A good NLP approach is needed because NLP applications are used across a wide variety of industries to solve critical knowledge problems, such as providing new insights gleaned from massive collections of unstructured content (social media, news, patent filings, financial disclosures, etc.). Weak NLP support for a language can result in irrelevant information being retrieved. The lack of work on more effective algorithms for stemming, identifying misspelled words, and identifying neologisms has reduced the efficiency of retrieving relevant information or articles in the Malay language. This is because Malay has a complex morphological structure that differs from that of other languages, so the standard NLP approaches used for other languages cannot easily be applied to processing and retrieving relevant information or articles in Malay. This work focuses on improving the Malay stemming process, introducing a new approach to identifying and correcting typos or misspelled words, and proposing a solution for identifying neologisms. Improving the Malay stemming process enables information retrieval to be performed more effectively by identifying more affixed words, because not all affixed words are stored in the standard Malay dictionary. Identifying and correcting typos or misspelled words prevents the information retrieval system from ignoring important words merely because they are misspelled. Finally, identifying neologisms may assist lexicographers in identifying new words that can be considered for inclusion in the lexicon. Based on the experiments conducted, the proposed approaches are shown to be useful in improving NLP for the Malay language.
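
As a rough illustration of how the three tasks named in the summary relate to one another, the following Python sketch combines dictionary-backed affix stripping, edit-distance spelling correction, and flagging of neologism candidates. It is not the method proposed in the thesis: the lexicon `ROOTS`, the affix lists, the function names, and the distance threshold are all hypothetical placeholders, and the sketch ignores Malay nasal assimilation (for example, it does not reduce "menulis" to "tulis").

```python
# Illustrative sketch only. ROOTS, PREFIXES and SUFFIXES are tiny hypothetical
# samples, not the dictionary or affix rules used in the thesis.
ROOTS = {"makan", "ajar", "main", "baca", "tulis"}
PREFIXES = ("meng", "mem", "men", "me", "ber", "di", "ter", "pe")
SUFFIXES = ("kan", "an", "i")


def stem(word):
    """Return a root in ROOTS reachable by stripping at most one prefix
    and at most one suffix, or None if no such root exists."""
    word = word.lower()
    # Candidates: the word itself, plus the word with one prefix removed.
    heads = [word] + [word[len(p):] for p in PREFIXES if word.startswith(p)]
    for head in heads:
        if head in ROOTS:
            return head
        for suffix in SUFFIXES:
            if head.endswith(suffix) and head[:-len(suffix)] in ROOTS:
                return head[:-len(suffix)]
    return None


def edit_distance(a, b):
    """Levenshtein distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # delete ca
                           cur[j - 1] + 1,              # insert cb
                           prev[j - 1] + (ca != cb)))   # substitute
        prev = cur
    return prev[-1]


def classify(word, max_dist=1):
    """Label a word as known, misspelled, or a neologism candidate."""
    root = stem(word)
    if root is not None:
        return ("known", root)
    # Simplification: compare against roots only, not affixed forms.
    lowered = word.lower()
    best = min(ROOTS, key=lambda r: (edit_distance(lowered, r), r))
    if edit_distance(lowered, best) <= max_dist:
        return ("misspelled", best)
    return ("neologism_candidate", None)


for token in ["makanan", "diajarkan", "makab", "selfie"]:
    print(token, classify(token))
```

In this sketch a word that cannot be stemmed to a known root and has no close dictionary neighbour is only flagged as a candidate; deciding whether it is a genuine new word would still fall to a lexicographer, as the summary suggests.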