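Resolusi anafora artikel Bahasa Melayu berasaskan pengetahuan terhad dan kelas semantik (Anaphora resolution for Malay-language articles based on a knowledge-poor approach and semantic classes)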

Bibliographic Details
Main Author: Noorhuzaimi@Karimah, Mohd Noor
Format: Thesis
Language: English
Published: 2016
Subjects: PL Languages and literatures of Eastern Asia, Africa, Oceania
Online Access: http://umpir.ump.edu.my/id/eprint/25341/1/Resolusi%20anafora%20artikel%20Bahasa%20Melayu%20berasaskan%20pengetahuan%20terhad.pdf
Record page: http://umpir.ump.edu.my/id/eprint/25341/
Record ID: my-ump-ir.25341
Record format: uketd_dc
Last modified: 2021-07-28T03:18:07Z
File: PDF, public access
Institution: Universiti Malaysia Pahang Al-Sultan Abdullah
Collection: UMPSA Institutional Repository
Abstract: Anaphora resolution (AR) is the process of identifying the entity that an anaphoric pronoun refers to. Anaphora occurs in every language, and resolving it requires human experts or specific rules. AR can improve language-processing applications such as question answering, text mining, document summarization and information extraction. Although AR has been studied widely, most of this research targets languages such as English, Japanese and Norwegian, and almost no research effort has focused on AR for Malay. The aim of this research is therefore to resolve anaphora in Malay text using a knowledge-poor approach and a semantic class labelling model. To achieve this aim, a framework for Malay AR was developed as a guide to resolving the phenomenon in Malay. The type of usage of the pronoun nya is determined using a set of rules, a set of similar words, and word filtering generated from the semantic class labelling model. This step is important because nya is the most frequently used pronoun in Malay text, at 68% compared with the other pronouns, whose use mostly depends on the sociological status of the referring entity, or antecedent.

Determining antecedent candidates is another important step. Candidates can be proper nouns or other nouns. To determine which proper nouns are suitable candidates, two main processes are needed: (1) entity recognition for proper nouns that contain the word 'dan' ("and") and the comma (,); and (2) assignment of a semantic label to each retrieved candidate in order to determine its sociological status. The research used parts of name gazetteers for people, organizations, locations and positions. Testing was conducted on 60 Malay articles with different classes of proper nouns. The results were compared with benchmark data tagged by a Malay linguist and show average precision and recall of 85% and 90%, respectively. The proposed knowledge-poor AR framework for Malay text increases the success rate by 18.79% compared with the generic approach proposed by Mitkov and Lappin.
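The candidate-determination, semantic-labelling and evaluation steps summarized in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation. The gazetteer entries, the filter list of non-pronominal words ending in "nya", the pronoun-to-label compatibility table and the rule of picking the nearest preceding compatible candidate are all hypothetical simplifications. Knowledge-poor resolvers in the Mitkov tradition combine several preference indicators rather than recency alone, and the thesis's own rules for classifying nya usage are not reproduced here.

```python
import re

# HYPOTHETICAL mini-gazetteers for illustration only; the thesis used parts of
# name gazetteers for people, organizations, locations and positions.
GAZETTEER = {
    "PERSON": {"Ahmad", "Siti Aminah"},
    "ORGANIZATION": {"Kementerian Sains, Teknologi dan Inovasi"},
    "LOCATION": {"Kuantan"},
    "POSITION": {"Menteri"},
}

# HYPOTHETICAL filter: words ending in "nya" that are not the pronoun
# (e.g. "hanya" = "only", "akhirnya" = "finally").
NON_PRONOMINAL_NYA = {"hanya", "punya", "tanya", "bertanya", "akhirnya", "sebenarnya"}

# HYPOTHETICAL compatibility table: "beliau" is the respectful form used for
# people of higher status, "dia" for people, and "-nya" for any known entity.
PRONOUN_LABELS = {
    "beliau": {"PERSON", "POSITION"},
    "dia": {"PERSON"},
    "nya": {"PERSON", "POSITION", "ORGANIZATION", "LOCATION"},
}

# A run of capitalized words that may be joined by commas or the word "dan".
CAP_RUN = re.compile(r"\b[A-Z]\w*(?:(?:,\s*|\s+dan\s+|\s+)[A-Z]\w*)*")


def semantic_label(name):
    """Return the gazetteer class of a name, or UNKNOWN."""
    for label, names in GAZETTEER.items():
        if name in names:
            return label
    return "UNKNOWN"


def candidates(text):
    """Return (position, name, label) for each proper-noun candidate, in order."""
    found = []
    for match in CAP_RUN.finditer(text):
        run = match.group()
        # Keep the run whole if the gazetteer knows it (an organization name
        # may contain "dan" or a comma); otherwise split the coordination.
        if semantic_label(run) != "UNKNOWN":
            parts = [run]
        else:
            parts = re.split(r",\s*|\s+dan\s+", run)
        for part in parts:
            found.append((match.start() + run.index(part), part, semantic_label(part)))
    return found


def pronouns(text):
    """Yield (position, word, pronoun type), skipping non-pronominal -nya words."""
    for match in re.finditer(r"\w+", text):
        word = match.group().lower()
        if word in ("beliau", "dia"):
            yield match.start(), word, word
        elif word.endswith("nya") and word not in NON_PRONOMINAL_NYA:
            yield match.start(), word, "nya"


def resolve(text):
    """Link each pronoun to the nearest preceding label-compatible candidate."""
    cands = candidates(text)
    links = []
    for pos, word, kind in pronouns(text):
        compatible = [c for c in cands if c[0] < pos and c[2] in PRONOUN_LABELS[kind]]
        links.append((pos, word, compatible[-1][1] if compatible else None))
    return links


def precision_recall(links, gold):
    """Precision = correct / resolved; recall = correct / anaphors in gold.

    `gold` maps a pronoun's character position to its hand-tagged antecedent.
    """
    resolved = [link for link in links if link[2] is not None]
    correct = sum(1 for pos, _, ant in resolved if gold.get(pos) == ant)
    precision = correct / len(resolved) if resolved else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


if __name__ == "__main__":
    text = ("Ahmad dan Siti Aminah melawat Kuantan. "
            "Kementerian Sains, Teknologi dan Inovasi mengumumkan dasar barunya. "
            "Menteri berkata beliau hanya menyokong cadangan itu.")
    links = resolve(text)
    print([(word, ant) for _, word, ant in links])
    # [('barunya', 'Kementerian Sains, Teknologi dan Inovasi'), ('beliau', 'Menteri')]
    gold = {text.index("barunya"): "Kementerian Sains, Teknologi dan Inovasi",
            text.index("beliau"): "Menteri"}
    print(precision_recall(links, gold))  # (1.0, 1.0)
```

Splitting a capitalized run on "dan" and commas only when the whole run is absent from the gazetteer is one simple way to handle the ambiguity the abstract points to: "Ahmad dan Siti Aminah" names two people, whereas an organization name may itself contain "dan" or a comma. The abstract reports averaged precision and recall over 60 articles; how those averages were computed is not stated in this record.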
Qualification: Doctor of Philosophy (PhD), doctoral level
Granting institution: Universiti Kebangsaan Malaysia
Granting department: Fakulti Teknologi dan Sains Maklumat (Faculty of Information Science and Technology)