Semantic event extraction in unstructured text based on prominence and discourse-level dependencies

Semantic event extraction has been applied in many natural language processing (NLP) tasks, such as summarization and text mining. However, little research has been carried out on automating multiple event extraction and representation. This has resulted in the limitation of semantically annotated...

Bibliographic Details
Main Author:	Siaw, Nyuk Hiong
Format:	Thesis
Language:	English
Published:	2015
Subjects:	T Technology (General)
Online Access:	http://ir.unimas.my/id/eprint/10767/1/Siaw%2C%20NH.pdf

id	my-unimas-ir.10767
record_format	uketd_dc
institution	Universiti Malaysia Sarawak
collection	UNIMAS Institutional Repository
language	English
topic	T Technology (General)
description	Semantic event extraction has been applied in many natural language processing (NLP) tasks, such as summarization and text mining. However, little research has been carried out on automating multiple event extraction and representation. As a result, the semantically annotated corpora available for event extraction are limited to PropBank, FrameNet and VerbNet. These collections can be expanded by adding other semantically annotated event corpora. Many event extraction models, such as EVENT, SEM and LODE, have been proposed, but this work stopped at the collection of events. Research that extends beyond event collection to investigate the interpretation and abstraction of event-based knowledge remains largely unexplored. Furthermore, there is a lack of research on key event indexing to identify the relative importance of multiple events in a complex sentence. Such indexing can augment successfully extracted event-based knowledge with weights. The main objective of this research is to propose a framework that automates the extraction of semantically relevant key events, based on thematic hierarchy and discourse-level dependencies, to determine their relationships and relative importance. This has led to the exploration and formulation of designs to: i) capture and annotate multiple semantic events in a semantic representation format; ii) define a linguistically injected model (Linguistic Window Model) to interpret multiple events in a complex sentence; iii) define new weights for graph-based text (based on the Linguistic Window Model) for key event indexing. This research proposes a new method, EveSem, an NLP tools pipeline that automates the extraction and annotation of semantic events. This tool performed marginally better than TIPSemB-1.0. EveSem is then extended into a Linguistic Window Model, whose linguistic structure is found to enhance the F1-score for event extraction when compared to ACE data.
The thematic hierarchy and discourse-level dependency properties of the linguistic structure are also found to greatly improve recall over ACE data for "trigger" identification. Based on the thematic hierarchy, new weights are defined to construct weighted graph-based text, which is shown to improve the indexing of the relative importance of key events in complex sentences. The results show that the NLP tools pipeline successfully extracted and represented multiple events in XML tags. The small XML-annotated corpus of semantic events can be added to the collection of event lexical databases. Furthermore, the approach is domain-generic and can be ported to other languages, provided the necessary NLP tools are available for the language. The Linguistic Window Model is able to extract events with an improved F1-score over the ACE task. This model has an advantage over the bag-of-words (BOW) model for key event indexing because it takes into account the context of word co-occurrence and the semantic association between words, based on the linguistic structure of the model. In conclusion, the objectives of this research have been achieved. The research has addressed the gaps identified in this thesis by: (a) automatically generating a collection of multiple semantic events using a generic approach through NLP tools as a pipeline, and (b) identifying the relative importance of key semantic events based on the linguistic properties of the sentence.
format	Thesis
qualification_name	Doctor of Philosophy (PhD)
qualification_level	Doctorate
author	Siaw, Nyuk Hiong
title	Semantic event extraction in unstructured text based on prominence and discourse-level dependencies
granting_institution	Universiti Malaysia Sarawak (UNIMAS)
granting_department	Faculty of Computer Science and Information Technology
publishDate	2015
url	http://ir.unimas.my/id/eprint/10767/1/Siaw%2C%20NH.pdf
_version_	1783728072290205696
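
The description above reports results as recall and F1-score for event "trigger" identification. As a reading aid, the following is a minimal sketch of how those standard metrics are computed from a gold set and a predicted set of triggers. It is not taken from the thesis: the function name trigger_prf, the (sentence index, trigger word) representation and the toy data are assumptions made purely for illustration, and they say nothing about how EveSem or the Linguistic Window Model were actually evaluated.

```python
def trigger_prf(gold, predicted):
    """Return (precision, recall, f1) for two collections of trigger annotations.

    Each annotation is any hashable item; here a hypothetical
    (sentence_index, trigger_word) pair is used for illustration.
    """
    gold_set, pred_set = set(gold), set(predicted)
    true_pos = len(gold_set & pred_set)  # triggers found in both sets

    # Guard against empty sets to avoid division by zero.
    precision = true_pos / len(pred_set) if pred_set else 0.0
    recall = true_pos / len(gold_set) if gold_set else 0.0

    # F1 is the harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy data (invented for this sketch, not drawn from ACE or the thesis).
gold = {(0, "attacked"), (0, "fled"), (1, "arrested")}
predicted = {(0, "attacked"), (1, "arrested"), (1, "said")}

precision, recall, f1 = trigger_prf(gold, predicted)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

Because F1 is a harmonic mean, a gain in recall raises F1 only to the extent that precision does not fall by a comparable amount, which is why the record reports the two figures separately.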

Similar Items