Semantic event extraction in unstructured text based on prominence and discourse-level dependencies

Semantic event extraction has been applied in many natural language processing (NLP) tasks, such as summarization and text mining. However, little research has been carried out on automating the extraction and representation of multiple events. This has resulted in the limitation of semantically annotated...

Bibliographic Details
Main Author: Siaw, Nyuk Hiong
Format: Thesis
Language: English
Published: 2015
Subjects: T Technology (General)
Online Access: http://ir.unimas.my/id/eprint/10767/1/Siaw%2C%20NH.pdf
id my-unimas-ir.10767
record_format uketd_dc
spelling my-unimas-ir.10767 2023-08-01T06:40:14Z Semantic event extraction in unstructured text based on prominence and discourse-level dependencies 2015 Siaw, Nyuk Hiong T Technology (General) Universiti Malaysia Sarawak, (UNIMAS) 2015 Thesis http://ir.unimas.my/id/eprint/10767/ http://ir.unimas.my/id/eprint/10767/1/Siaw%2C%20NH.pdf text en validuser phd doctoral Universiti Malaysia Sarawak, (UNIMAS) Faculty of Computer Science and Information Technology.
institution Universiti Malaysia Sarawak
collection UNIMAS Institutional Repository
language English
topic T Technology (General)
spellingShingle T Technology (General)
Siaw, Nyuk Hiong
Semantic event extraction in unstructured text based on prominence and discourse-level dependencies
description Semantic event extraction has been applied in many natural language processing (NLP) tasks, such as summarization and text mining. However, little research has been carried out on automating the extraction and representation of multiple events. As a result, the semantically annotated corpora available for event extraction are limited to PropBank, FrameNet and VerbNet. These collections can be expanded by adding other semantically annotated event corpora. Many event extraction models, such as EVENT, SEM and LODE, have been proposed, but this research stopped at the collection of events. Research beyond event collection, into the interpretation and abstraction of event-based knowledge, has not been explored much. Furthermore, there is a lack of research on key event indexing to identify the relative importance of multiple events in a complex sentence. Such indexing can augment successfully extracted event-based knowledge with weights. The main objective of this research is to propose a framework that automates the extraction of semantically relevant key events, based on thematic hierarchy and discourse-level dependencies, to determine their relationships and relative importance. This has led to the exploration and formulation of designs to: i) capture and annotate multiple semantic events in a semantic representation format; ii) define a linguistically injected model (the Linguistic Window Model) to interpret multiple events in a complex sentence; iii) define new weights for graph-based text (based on the Linguistic Window Model) for key event indexing. This research proposes a new method, EveSem, an NLP tools pipeline that automates the extraction and annotation of semantic events. This tool performed marginally better than TIPSemB-1.0. EveSem is then extended to create a Linguistic Window Model whose linguistic structure is found to enhance the F1-score when compared to ACE data for event extraction. The thematic hierarchy and discourse-level dependency properties of the linguistic structure are also found to greatly improve recall over ACE data for "trigger" identification. Based on the thematic hierarchy, new weights are defined to construct a weighted graph-based text, which has been shown to improve the indexing of the relative importance of key events in complex sentences. The results show that the NLP tools pipeline successfully extracted and represented multiple events in XML tags. The small XML-annotated corpus of semantic events can be added to the collection of event lexical databases. Furthermore, this approach is domain-generic and can be ported to other languages, provided the necessary NLP tools are available for the language. The Linguistic Window Model is able to extract events with an improved F1-score on the ACE task. This model has an advantage over the bag-of-words (BOW) model for key event indexing, since it takes into consideration the context of word co-occurrence and the semantic association between words, based on the linguistic structure of the model. In conclusion, the objectives of this research have been successfully achieved. The research has addressed the gaps identified in this thesis by: (a) automatically generating a collection of multiple semantic events using a generic approach through NLP tools as a pipeline; (b) identifying the relative importance of key semantic events based on linguistic properties of the sentence.
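The description reports results as recall and F1-score for event "trigger" identification. As a reading aid, the sketch below shows how these standard metrics are commonly computed from a set of predicted triggers and a set of gold-standard triggers. It is not the thesis's evaluation code: the function name `precision_recall_f1` and the sample trigger pairs are hypothetical, invented only for illustration.

```python
def precision_recall_f1(predicted, gold):
    """Score predicted event triggers against gold-standard triggers.

    Both arguments are iterables of hashable items, here assumed to be
    (sentence_id, trigger_word) pairs. Returns (precision, recall, f1).
    """
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    # Guard against empty sets to avoid division by zero.
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical triggers, invented for illustration (not data from the thesis).
gold = {(1, "attacked"), (1, "fled"), (2, "signed")}
predicted = {(1, "attacked"), (2, "signed"), (2, "said")}

p, r, f1 = precision_recall_f1(predicted, gold)
print(f"precision={p:.2f} recall={r:.2f} F1={f1:.2f}")
```

Under this definition, a model that finds more of the gold triggers raises recall, while F1 rises only if precision does not fall by a comparable amount.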
format Thesis
qualification_name Doctor of Philosophy (PhD.)
qualification_level Doctorate
author Siaw, Nyuk Hiong
title Semantic event extraction in unstructured text based on prominence and discourse-level dependencies
title_sort semantic event extraction in unstructured text based on prominence and discourse-level dependencies
granting_institution Universiti Malaysia Sarawak, (UNIMAS)
granting_department Faculty of Computer Science and Information Technology.
publishDate 2015
url http://ir.unimas.my/id/eprint/10767/1/Siaw%2C%20NH.pdf
_version_ 1783728072290205696