Enhancement of compound word extraction in Malay sentences using modified linguistics approaches / Zamri Abu Bakar

A Malay compound word is a word form created when two or more words are combined into a single syntactic unit that carries a specific meaning. Extracting compound words is therefore significant for downstream research such as text summarization, grammar checking, sentiments analysi...

Bibliographic Details
Main Author: Abu Bakar, Zamri
Format: Thesis
Language: English
Published: 2023
Subjects:
Online Access: https://ir.uitm.edu.my/id/eprint/88705/1/88705.pdf
id my-uitm-ir.88705
record_format uketd_dc
institution Universiti Teknologi MARA
collection UiTM Institutional Repository
language English
advisor Ismail, Normaly Kamal
topic PL Languages and literatures of Eastern Asia, Africa, Oceania
description A Malay compound word is a word form created when two or more words are combined into a single syntactic unit that carries a specific meaning. Extracting compound words is therefore significant for downstream research such as text summarization, grammar checking, sentiment analysis and machine translation. The aim of this study is to propose a new extraction technique based on linguistic approaches that combines many features and rules. Many earlier studies have proposed linguistic approaches to compound word extraction, but their results remain problematic and leave room for improvement. The study has three objectives: to identify new rules for detecting Malay compound words; to construct an improved compound word extraction technique (algorithm) for Malay sentences that combines many rules using linguistic approaches; and to evaluate the accuracy of the proposed technique using the standard measures of Recall, Precision and F-Measure. To achieve these objectives, the research explores a linguistic method for extracting compound words from a standard Malay corpus, using a Malay news dataset. Because extraction results can be compromised, the effectiveness of compound word extraction needs to be improved, so the study proposes a modified linguistic approach to enhance it. Preprocessing involved several steps, including normalization, tokenization, stemming and tagging. Finally, the study describes several rule-based methods and modifies the rules to capture the most relevant relation between the first word and the second word, which helps address the problems identified.
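The pipeline outlined in the description (preprocessing, rule-based matching on adjacent word pairs, then evaluation with Recall, Precision and F-Measure) can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration: the toy lexicon, the two tag patterns and the helper functions are hypothetical and are not the rules or algorithm from the thesis. Stemming is omitted for brevity, and a dictionary lookup stands in for a real Malay part-of-speech tagger.

```python
import re
from typing import Dict, List, Set, Tuple

# Hypothetical toy lexicon; a real system would use a trained Malay POS tagger.
TOY_LEXICON: Dict[str, str] = {
    "guru": "NOUN", "besar": "ADJ", "itu": "DET", "menaiki": "VERB",
    "kereta": "NOUN", "api": "NOUN", "ke": "PREP", "bandar": "NOUN",
}

# Hypothetical tag patterns for two-word candidates; NOT the thesis's rules.
CANDIDATE_PATTERNS: Set[Tuple[str, str]] = {("NOUN", "NOUN"), ("NOUN", "ADJ")}


def normalize(text: str) -> str:
    """Lowercase the text and collapse runs of whitespace."""
    return " ".join(text.lower().split())


def tokenize(text: str) -> List[str]:
    """Split normalized text into alphabetic tokens, dropping punctuation."""
    return re.findall(r"[a-z]+", text)


def tag(tokens: List[str]) -> List[Tuple[str, str]]:
    """Attach a tag to each token; unknown words get the tag 'X'."""
    return [(tok, TOY_LEXICON.get(tok, "X")) for tok in tokens]


def extract_candidates(sentence: str) -> List[str]:
    """Return adjacent word pairs whose tag pair matches a candidate pattern."""
    tagged = tag(tokenize(normalize(sentence)))
    candidates = []
    for (w1, t1), (w2, t2) in zip(tagged, tagged[1:]):
        if (t1, t2) in CANDIDATE_PATTERNS:
            candidates.append(f"{w1} {w2}")
    return candidates


def evaluate(predicted: Set[str], gold: Set[str]) -> Tuple[float, float, float]:
    """Compute precision, recall and F-measure (F1) over sets of compounds."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


if __name__ == "__main__":
    found = extract_candidates("Guru besar itu menaiki kereta api ke bandar.")
    print(found)
    # Hypothetical gold standard, used only to exercise the metrics.
    print(evaluate(set(found), {"guru besar", "kereta api", "kapal terbang"}))
```

Pure tag-pattern matching of this kind over-generates on real text, since many noun-noun pairs are not compounds; that is presumably why the study combines many features and rules instead of relying on a single pattern.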
format Thesis
qualification_name Doctor of Philosophy (PhD.)
qualification_level Doctorate
author Abu Bakar, Zamri
title Enhancement of compound word extraction in Malay sentences using modified linguistics approaches / Zamri Abu Bakar
granting_institution Universiti Teknologi MARA (UiTM)
granting_department College of Computing, Informatics and Mathematics
publishDate 2023
url https://ir.uitm.edu.my/id/eprint/88705/1/88705.pdf
_version_ 1794192144125132800