Cross-lingual sentiment classification using semi-supervised learning
Cross-lingual sentiment classification aims to utilize annotated sentiment resources in one language for text sentiment classification in another language. Automatic machine translation services are the most commonly used tools to directly project information from one language into another. However,...
Main Author: | Hajmohammadi, Mohammad Sadegh |
---|---|
Format: | Thesis |
Language: | English |
Published: | 2015 |
Subjects: | QA75 Electronic computers. Computer science |
Online Access: | http://eprints.utm.my/id/eprint/77727/1/MohammadSadeghHajmohammadiPFC2015.pdf |
id |
my-utm-ep.77727 |
---|---|
record_format |
uketd_dc |
spelling |
my-utm-ep.77727 2018-06-29T21:45:13Z Cross-lingual sentiment classification using semi-supervised learning 2015-05 Hajmohammadi, Mohammad Sadegh QA75 Electronic computers. Computer science 2015-05 Thesis http://eprints.utm.my/id/eprint/77727/ http://eprints.utm.my/id/eprint/77727/1/MohammadSadeghHajmohammadiPFC2015.pdf application/pdf en public http://dms.library.utm.my:8080/vital/access/manager/Repository/vital:97125 phd doctoral Universiti Teknologi Malaysia, Faculty of Computing Faculty of Computing |
institution |
Universiti Teknologi Malaysia |
collection |
UTM Institutional Repository |
language |
English |
topic |
QA75 Electronic computers. Computer science |
description |
Cross-lingual sentiment classification aims to use annotated sentiment resources in one language to classify the sentiment of text in another language. Automatic machine translation services are the most commonly used tools for directly projecting information from one language into another. However, differences in term distribution between translated and original documents, translation errors, and the different intrinsic structure of documents in different languages lead to low performance in sentiment classification. Furthermore, because languages use different linguistic terms, translated documents cannot cover all of the vocabulary found in the original documents. The aim of this thesis is to propose an enhanced framework for cross-lingual sentiment classification that overcomes the aforementioned problems and improves classification performance. A combination of active learning and semi-supervised learning, in both single-view and bi-view frameworks, is proposed to incorporate unlabelled data from the target language and thereby reduce term distribution divergence. Using bi-view documents can partially alleviate the negative effects of translation errors. Multi-view semi-supervised learning is also used to overcome the problem of low term coverage by employing multiple source languages. Features extracted from multiple source languages can cover more of the vocabulary in the test data, so more sentiment-bearing terms can be used in the classification process. Content similarities between labelled and unlabelled documents are exploited through a graph-based semi-supervised learning approach to incorporate the structure of documents in the target language into the learning process. Performance evaluation on sentiment data sets in four different languages confirms the effectiveness of the proposed approaches in comparison with well-known baseline classification methods. The experiments show that incorporating unlabelled data from the target language can effectively improve classification performance. Experimental results also show that using multiple source languages in the multi-view learning model outperforms other methods. The proposed framework is flexible enough to be applied to any new language and can therefore be used to develop multilingual sentiment analysis systems. |
format |
Thesis |
qualification_name |
Doctor of Philosophy (PhD) |
qualification_level |
Doctorate |
author |
Hajmohammadi, Mohammad Sadegh |
title |
Cross-lingual sentiment classification using semi-supervised learning |
title_sort |
cross-lingual sentiment classification using semi-supervised learning |
granting_institution |
Universiti Teknologi Malaysia, Faculty of Computing |
granting_department |
Faculty of Computing |
publishDate |
2015 |
url |
http://eprints.utm.my/id/eprint/77727/1/MohammadSadeghHajmohammadiPFC2015.pdf |
_version_ |
1747817816886607872 |
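
The description above mentions combining active learning with semi-supervised learning on unlabelled target-language documents. As a rough illustration of that general idea only (not the thesis's actual algorithm), the sketch below splits an unlabelled pool by classifier confidence: the most confident documents receive pseudo-labels (self-training), while the least confident are queued for manual labelling (active learning). The function name `split_unlabelled`, the 0.5 decision threshold, and the sample probabilities are all assumptions made for this example.

```python
from typing import Dict, List, Tuple


def split_unlabelled(
    probs: Dict[str, float],
    n_confident: int,
    n_uncertain: int,
) -> Tuple[List[Tuple[str, int]], List[str]]:
    """Split an unlabelled pool by classifier confidence.

    `probs` maps a document id to the classifier's estimated
    probability that the document is positive (hypothetical input).
    Returns (pseudo_labelled, to_annotate).
    """
    # Rank documents by distance from the 0.5 decision boundary,
    # least confident first.
    ranked = sorted(probs, key=lambda doc: abs(probs[doc] - 0.5))

    # Active learning: the least confident documents go to a human annotator.
    to_annotate = ranked[:n_uncertain]

    # Self-training: the most confident of the remaining documents
    # are labelled with the classifier's own prediction.
    remaining = ranked[n_uncertain:]
    take = min(n_confident, len(remaining))
    confident = remaining[len(remaining) - take:]
    pseudo_labelled = [(doc, int(probs[doc] >= 0.5)) for doc in confident]

    return pseudo_labelled, to_annotate


# Made-up probabilities for five unlabelled target-language documents.
pool = {"d1": 0.95, "d2": 0.52, "d3": 0.10, "d4": 0.45, "d5": 0.80}
pseudo, queries = split_unlabelled(pool, n_confident=2, n_uncertain=2)
print(pseudo)
print(queries)
```

In a full loop, both groups would be added to the training set, the classifier retrained, and the step repeated until the unlabelled pool or the labelling budget is exhausted.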