Enhanced lexicon based models for extracting question-answer pairs from web forum

A Web forum is an online community that brings people in different geographical locations together. Members of the forum exchange ideas and expertise. As a result, a huge amount of contents on different topics are generated on a daily basis. The huge human generated contents of web forum can be mine...

Full description

Saved in:
Bibliographic Details
Main Author: Obasa, Adekunle Isiaka
Format: Thesis
Language:English
Published: 2016
Subjects:
Online Access:http://eprints.utm.my/id/eprint/79067/1/AdekunleIsiakaObasaPFC2016.pdf
Tags: Add Tag
No Tags, Be the first to tag this record!
id my-utm-ep.79067
record_format uketd_dc
spelling my-utm-ep.790672018-09-27T06:07:14Z Enhanced lexicon based models for extracting question-answer pairs from web forum 2016 Obasa, Adekunle Isiaka QA75 Electronic computers. Computer science A Web forum is an online community that brings people in different geographical locations together. Members of the forum exchange ideas and expertise. As a result, a huge amount of contents on different topics are generated on a daily basis. The huge human generated contents of web forum can be mined as questionanswer pairs (Q&A). One of the major challenges in mining Q&A from web forum is to establish a good relationship between the question and the candidate answers. This problem is compounded by the noisy nature of web forum's human generated contents. Unfortunately, the existing methods that are used to mine knowledge from web forums ignore the effect of noise on the mining tools, making the lexical contents less effective. This study proposes lexicon based models that can automatically mine question-answer pairs with higher accuracy scores from web forum. The first phase of the research produces question mining model. It was implemented using features generated from unigram, bigram, forum metadata and simple rules. These features were screened using both chi-square and wrapper techniques. Wrapper generated features were used by Multinomial Naïve Bayes to finally build the model. The second phase produced a normalized lexical model for answer mining. It was implemented using 13 lexical features that cut across four quality dimensions. The performance of the features was enhanced by noise normalization, a process that fixed orthographic, phonetic and acronyms noises. The third phase of the research produced a hybridized model of lexical and non-lexical features. The average performances of the question mining model, normalized lexical model and hybridized model for answer mining were 90.3%, 97.5%, and 99.5% respectively on three data sets used. They outperformed all previous works in the domain. The first major contribution of the study is the development of an improved question mining model that is characterized by higher accuracy, better specificity, less complex and ability to generate good accuracy across different forum genres. The second contribution is the development of normalized lexical based model that has capability to establish good relationship between a question and its corresponding answer. The third contribution is the development of a hybridized model that integrates lexical features that guarantee relevance with non-lexical that guarantee quality to mine web forum answers. The fourth contribution is a novel integration of question and answer mining models to automatically generate question-answer pairs from web forum. 2016 Thesis http://eprints.utm.my/id/eprint/79067/ http://eprints.utm.my/id/eprint/79067/1/AdekunleIsiakaObasaPFC2016.pdf application/pdf en public phd doctoral Universiti Teknologi Malaysia, Faculty of Computing Faculty of Computing
institution Universiti Teknologi Malaysia
collection UTM Institutional Repository
language English
topic QA75 Electronic computers
Computer science
spellingShingle QA75 Electronic computers
Computer science
Obasa, Adekunle Isiaka
Enhanced lexicon based models for extracting question-answer pairs from web forum
description A Web forum is an online community that brings people in different geographical locations together. Members of the forum exchange ideas and expertise. As a result, a huge amount of contents on different topics are generated on a daily basis. The huge human generated contents of web forum can be mined as questionanswer pairs (Q&A). One of the major challenges in mining Q&A from web forum is to establish a good relationship between the question and the candidate answers. This problem is compounded by the noisy nature of web forum's human generated contents. Unfortunately, the existing methods that are used to mine knowledge from web forums ignore the effect of noise on the mining tools, making the lexical contents less effective. This study proposes lexicon based models that can automatically mine question-answer pairs with higher accuracy scores from web forum. The first phase of the research produces question mining model. It was implemented using features generated from unigram, bigram, forum metadata and simple rules. These features were screened using both chi-square and wrapper techniques. Wrapper generated features were used by Multinomial Naïve Bayes to finally build the model. The second phase produced a normalized lexical model for answer mining. It was implemented using 13 lexical features that cut across four quality dimensions. The performance of the features was enhanced by noise normalization, a process that fixed orthographic, phonetic and acronyms noises. The third phase of the research produced a hybridized model of lexical and non-lexical features. The average performances of the question mining model, normalized lexical model and hybridized model for answer mining were 90.3%, 97.5%, and 99.5% respectively on three data sets used. They outperformed all previous works in the domain. The first major contribution of the study is the development of an improved question mining model that is characterized by higher accuracy, better specificity, less complex and ability to generate good accuracy across different forum genres. The second contribution is the development of normalized lexical based model that has capability to establish good relationship between a question and its corresponding answer. The third contribution is the development of a hybridized model that integrates lexical features that guarantee relevance with non-lexical that guarantee quality to mine web forum answers. The fourth contribution is a novel integration of question and answer mining models to automatically generate question-answer pairs from web forum.
format Thesis
qualification_name Doctor of Philosophy (PhD.)
qualification_level Doctorate
author Obasa, Adekunle Isiaka
author_facet Obasa, Adekunle Isiaka
author_sort Obasa, Adekunle Isiaka
title Enhanced lexicon based models for extracting question-answer pairs from web forum
title_short Enhanced lexicon based models for extracting question-answer pairs from web forum
title_full Enhanced lexicon based models for extracting question-answer pairs from web forum
title_fullStr Enhanced lexicon based models for extracting question-answer pairs from web forum
title_full_unstemmed Enhanced lexicon based models for extracting question-answer pairs from web forum
title_sort enhanced lexicon based models for extracting question-answer pairs from web forum
granting_institution Universiti Teknologi Malaysia, Faculty of Computing
granting_department Faculty of Computing
publishDate 2016
url http://eprints.utm.my/id/eprint/79067/1/AdekunleIsiakaObasaPFC2016.pdf
_version_ 1747818138949386240