Proper noun detection using regex algorithm and rules for Malay named entity recognition

Bibliographic Details
Main Author: Farid Morsidi
Format: thesis
Language: English
Published: 2018
Online Access: https://ir.upsi.edu.my/detailsg.php?det=5380
Description
Summary: This study aimed to develop a Malay proper noun detection method to cluster and classify named entity categories, particularly the major classes of person, location, organization, and miscellaneous, in a Malay newspaper corpus. A regular expression (regex) pattern identification algorithm and rules were introduced to overcome the limitations of dictionaries and gazetteers. Two visualization techniques, namely a decision tree and a term-document matrix, were used to evaluate the efficiency of the method. The method obtained 74% accuracy during the generation of the decision tree. Visualization of the term-document matrix achieved maximum values of 9.8007403, 9.8718517, and 9.9890683 for the Astro Awani, Berita Harian, and Bernama datasets, respectively. In conclusion, the regex algorithm could indicate the presence of Malay proper nouns, making it an appropriate method for an extraction tool to cluster and classify Malay proper nouns. The study implies that the Malay proper noun detection method can increase the effectiveness of named entity recognition and help improve document retrieval for the Malay language.
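
To illustrate the general idea of regex-plus-rules proper noun detection described in the summary, the following is a minimal Python sketch. It is not the thesis's actual pattern set: the regular expression, the cue-word lists, and the function names `classify` and `detect` are all assumptions chosen for illustration. The sketch treats a run of capitalised words (optionally linked by the name particles "bin" or "binti") as a candidate proper noun, then assigns a category from the first word of the run.

```python
import re

# Hypothetical pattern (an assumption, not taken from the thesis):
# a capitalised word followed by zero or more capitalised words,
# each optionally preceded by the name particle "bin" or "binti".
PROPER_NOUN = re.compile(
    r"\b[A-Z][a-z]+(?:\s+(?:(?:bin|binti)\s+)?[A-Z][a-z]+)*"
)

# Illustrative cue words only; a real rule set would be far larger.
CUES = {
    "PERSON": {"Datuk", "Encik", "Puan", "Tun"},
    "LOCATION": {"Kampung", "Jalan", "Bandar"},
    "ORGANIZATION": {"Universiti", "Syarikat", "Bank", "Kementerian"},
}


def classify(span):
    """Label a candidate span by its first word; fall back to MISC."""
    first = span.split()[0]
    for label, cues in CUES.items():
        if first in cues:
            return label
    return "MISC"


def detect(text):
    """Return (span, label) pairs for every candidate proper noun."""
    return [(m.group(0), classify(m.group(0)))
            for m in PROPER_NOUN.finditer(text)]


sentence = "Datuk Ahmad bin Ali melawat Universiti Malaya dan Kampung Baru."
for span, label in detect(sentence):
    print(f"{label:13s} {span}")
```

A pattern this simple also matches any capitalised sentence-initial word and skips all-uppercase acronyms, so a practical rule set would need additional filters for those cases.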