Enhancing the natural language processing for Malay language stemming, identifying and correcting misspelled words identifying neologism

Natural Language Processing (NLP) is a field of computer science, artificial intelligence, and linguistics concerned with the interactions between computers and human (natural) languages. A good NLP approach is needed because NLP applications are used across a wide variety of industries in or...

Bibliographic Details
Main Author: Surayaini Basri
Format: Thesis
Language: English
Published: 2015
Subjects:
Online Access: https://eprints.ums.edu.my/id/eprint/39472/1/24%20PAGES.pdf
https://eprints.ums.edu.my/id/eprint/39472/2/FULLTEXT.pdf
id my-ums-ep.39472
record_format uketd_dc
institution Universiti Malaysia Sabah
collection UMS Institutional Repository
language English
topic QA76.75-76.765 Computer software
description Natural Language Processing (NLP) is a field of computer science, artificial intelligence, and linguistics concerned with the interactions between computers and human (natural) languages. A good NLP approach is needed because NLP applications are used across a wide variety of industries to solve critical knowledge problems, such as providing new insights gleaned from massive collections of unstructured content (social media, news, patent filings, financial disclosures, etc.). Weak NLP support for a language can cause irrelevant information to be retrieved. The lack of work on more effective algorithms for stemming, identifying misspelled words, and identifying neologisms has reduced the efficiency of retrieving relevant information or articles in the Malay language. This is because Malay has a complex morphological structure that differs from that of other languages, so the standard NLP approaches used for other languages cannot easily be applied to processing and retrieving relevant information or articles in Malay. This work focuses on improving the Malay stemming process, introducing a new approach to identifying and correcting typos or misspelled words, and proposing a solution for identifying neologisms. Improving the Malay stemming process enables information retrieval to be performed more effectively by identifying more affixed words, because not all affixed words are stored in the standard Malay dictionary. Identifying and correcting typos or misspelled words prevents the information retrieval system from ignoring important words merely because they are misspelled. Finally, identifying neologisms may help lexicographers identify new words that can be considered for inclusion in the lexicon. Based on the experiments conducted, the proposed approaches are shown to be useful in improving NLP for the Malay language.
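The three tasks named in the abstract (stemming affixed words back to dictionary roots, correcting misspelled words, and flagging possible neologisms) can be illustrated with a minimal Python sketch. This is not the method of the thesis, which this record does not describe. The tiny lexicon, the affix lists, the distance threshold, and the function names `stem`, `correct`, and `classify` are all hypothetical, and the sketch ignores Malay nasal assimilation (for example, `menulis` from `tulis`).

```python
# Illustrative sketch only; NOT the algorithm from the thesis.
# Hypothetical toy lexicon; a real system would load a full Malay dictionary.
ROOTS = {"makan", "ajar", "baca", "tulis", "main"}

# Small, illustrative subsets of Malay affixes (assumption, not exhaustive).
PREFIXES = ("meng", "mem", "men", "me", "ber", "di", "ter", "pe")
SUFFIXES = ("kan", "an", "i")


def stem(word, roots=ROOTS):
    """Strip at most one prefix and one suffix; return a dictionary root or None."""
    word = word.lower()
    for prefix in ("",) + PREFIXES:
        if not word.startswith(prefix):
            continue
        rest = word[len(prefix):]
        for suffix in ("",) + SUFFIXES:
            if suffix and not rest.endswith(suffix):
                continue
            candidate = rest[:len(rest) - len(suffix)] if suffix else rest
            if candidate in roots:
                return candidate
    return None


def edit_distance(a, b):
    """Levenshtein distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute or match
        prev = cur
    return prev[-1]


def correct(word, roots=ROOTS, max_dist=1):
    """Return the closest root within max_dist edits, or None if none is close."""
    best = min(sorted(roots), key=lambda r: edit_distance(word.lower(), r))
    return best if edit_distance(word.lower(), best) <= max_dist else None


def classify(word):
    """Label a token as known, misspelled, or a neologism candidate."""
    root = stem(word)
    if root is not None:
        return ("known", root)
    suggestion = correct(word)
    if suggestion is not None:
        return ("misspelled", suggestion)
    # Assumption: a word far from every lexicon entry may be a new word.
    return ("neologism_candidate", None)


if __name__ == "__main__":
    for token in ("membaca", "dimakan", "bacca", "selfie"):
        print(token, classify(token))
```

The order of checks reflects the abstract's reasoning: a word missing from the dictionary is first tested as an affixed form of a known root, then as a near miss of a known root, and only then treated as a candidate new word for a lexicographer to review.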
format Thesis
qualification_level Master's degree
author Surayaini Basri
title Enhancing the natural language processing for Malay language stemming, identifying and correcting misspelled words identifying neologism
granting_institution Universiti Malaysia Sabah
granting_department Fakulti Komputeran dan Informatik
publishDate 2015
url https://eprints.ums.edu.my/id/eprint/39472/1/24%20PAGES.pdf
https://eprints.ums.edu.my/id/eprint/39472/2/FULLTEXT.pdf
_version_ 1811770516139147264