Morphological System For Under-Resourced Languages Using Hybrid Approach

Bibliographic Details
Main Author: Saee, Suhaila
Format: Thesis
Published: 2016
Description
Summary: Computational morphology covers the automatic analysis (recognition of a word's internal structure) and generation (formation of a word) of words. As such, it is an essential step in many Natural Language Processing (NLP) applications. Over the last thirty years, computational morphology has been dominated by the finite-state approach, which uses finite-state transducers as its internal representation to describe morphological information at the lexical and surface levels. Finite-state morphology has been claimed to be language-independent and capable of handling all morphologically complex languages such as Finnish, Turkish, and Arabic, as well as under-resourced languages (U-RL). However, it requires a large amount of linguistic resources, namely morphological rules, a lexicon, and language experts, and this requirement limits its applicability. These limitations give rise to three issues for U-RL in computational morphology: morphological data acquisition, language morphology representation, and general rule formalism. Hence, a morphological system that can overcome these issues is needed, especially when dealing with U-RL. This research highlights two main issues: i) a workflow for a morphological system that can be used with U-RL and ii) an internal representation of morphological information that complies with the selected framework, the Structured String Tree Correspondence (SSTC). The aim of this research is to propose a new Structured String Tree Correspondence+Morphology (SSTC+M) framework for constructing a morphological system for U-RL. The proposed framework has three primary levels: morphological data acquisition (level 1), morphology theory adaptation (level 2), and computational morphology adaptation (level 3). The output of each level is the input to the next level.
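
As an illustration of the finite-state approach mentioned in the summary, the following is a minimal Python sketch of a toy transducer that relates a lexical level (stem symbols plus tags) to a surface level (the written word). It is not taken from the thesis and does not represent the SSTC+M framework: the transition table, the tag names (`+N`, `+SG`, `+PL`), and the functions `generate` and `analyze` are all hypothetical and cover a single English noun.

```python
from typing import Dict, List, Optional, Tuple

# Toy finite-state transducer (hypothetical, for illustration only).
# Each transition maps (state, lexical symbol) -> (surface output, next state).
TRANSITIONS: Dict[Tuple[int, str], Tuple[str, int]] = {
    (0, "c"): ("c", 1),
    (1, "a"): ("a", 2),
    (2, "t"): ("t", 3),
    (3, "+N"): ("", 4),    # the noun tag has no surface realisation
    (4, "+SG"): ("", 5),   # singular is unmarked on the surface
    (4, "+PL"): ("s", 5),  # plural surfaces as the suffix "s"
}
START_STATE = 0
FINAL_STATES = {5}


def generate(lexical: List[str]) -> Optional[str]:
    """Generation: map a lexical symbol sequence to a surface word."""
    state = START_STATE
    surface: List[str] = []
    for symbol in lexical:
        key = (state, symbol)
        if key not in TRANSITIONS:
            return None  # the transducer does not accept this input
        output, state = TRANSITIONS[key]
        surface.append(output)
    return "".join(surface) if state in FINAL_STATES else None


def analyze(surface: str) -> List[List[str]]:
    """Analysis: find every lexical sequence whose surface form matches."""
    results: List[List[str]] = []

    def walk(state: int, pos: int, path: List[str]) -> None:
        if state in FINAL_STATES and pos == len(surface):
            results.append(path)
        for (src, symbol), (output, dst) in TRANSITIONS.items():
            # Follow a transition only if its output matches the surface
            # string at the current position (empty outputs always match).
            if src == state and surface.startswith(output, pos):
                walk(dst, pos + len(output), path + [symbol])

    walk(START_STATE, 0, [])
    return results


if __name__ == "__main__":
    print(generate(["c", "a", "t", "+N", "+PL"]))
    print(analyze("cats"))
```

The same transition table serves both directions, which is the property that makes transducers attractive for morphology. Even this toy example, however, needs a hand-written lexicon and rules, which is the resource requirement the summary identifies as a limitation for U-RL.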