Cross-lingual sentiment classification using semi-supervised learning

Cross-lingual sentiment classification aims to utilize annotated sentiment resources in one language for text sentiment classification in another language. Automatic machine translation services are the most commonly used tools to directly project information from one language into another. However,...

Bibliographic Details
Main Author:	Hajmohammadi, Mohammad Sadegh
Format:	Thesis
Language:	English
Published:	2015
Subjects:	QA75 Electronic computers. Computer science
Online Access:	http://eprints.utm.my/id/eprint/77727/1/MohammadSadeghHajmohammadiPFC2015.pdf

id	my-utm-ep.77727
record_format	uketd_dc
spelling	my-utm-ep.77727 2018-06-29T21:45:13Z Cross-lingual sentiment classification using semi-supervised learning 2015-05 Hajmohammadi, Mohammad Sadegh QA75 Electronic computers. Computer science 2015-05 Thesis http://eprints.utm.my/id/eprint/77727/ http://eprints.utm.my/id/eprint/77727/1/MohammadSadeghHajmohammadiPFC2015.pdf application/pdf en public http://dms.library.utm.my:8080/vital/access/manager/Repository/vital:97125 phd doctoral Universiti Teknologi Malaysia, Faculty of Computing Faculty of Computing
institution	Universiti Teknologi Malaysia
collection	UTM Institutional Repository
language	English
topic	QA75 Electronic computers. Computer science
spellingShingle	QA75 Electronic computers Computer science Hajmohammadi, Mohammad Sadegh Cross-lingual sentiment classification using semi-supervised learning
description	Cross-lingual sentiment classification aims to use annotated sentiment resources in one language for text sentiment classification in another language. Automatic machine translation services are the most commonly used tools for directly projecting information from one language into another. However, differing term distributions between translated and original documents, translation errors and the different intrinsic structure of documents in various languages lead to low performance in sentiment classification. Furthermore, because different languages use different linguistic terms, translated documents cannot cover all of the vocabulary that exists in the original documents. The aim of this thesis is to propose an enhanced framework for cross-lingual sentiment classification that overcomes these problems and improves classification performance. A combination of active learning and semi-supervised learning, in both single-view and bi-view frameworks, is proposed to incorporate unlabelled data from the target language in order to reduce term distribution divergence. Using bi-view documents can partially alleviate the negative effects of translation errors. Multi-view semi-supervised learning is also used to overcome the problem of low term coverage by employing multiple source languages. Features extracted from multiple source languages can cover more of the vocabulary in the test data and, consequently, more sentiment terms can be used in the classification process. Content similarities between labelled and unlabelled documents are used in a graph-based semi-supervised learning approach to incorporate the structure of documents in the target language into the learning process. Performance evaluation on sentiment data sets in four different languages confirms the effectiveness of the proposed approaches in comparison with well-known baseline classification methods. The experiments show that incorporating unlabelled data from the target language can effectively improve classification performance. Experimental results also show that using multiple source languages in the multi-view learning model outperforms other methods. The proposed framework is flexible enough to be applied to any new language and can therefore be used to develop multilingual sentiment analysis systems.
format	Thesis
qualification_name	Doctor of Philosophy (PhD.)
qualification_level	Doctorate
author	Hajmohammadi, Mohammad Sadegh
author_facet	Hajmohammadi, Mohammad Sadegh
author_sort	Hajmohammadi, Mohammad Sadegh
title	Cross-lingual sentiment classification using semi-supervised learning
title_short	Cross-lingual sentiment classification using semi-supervised learning
title_full	Cross-lingual sentiment classification using semi-supervised learning
title_fullStr	Cross-lingual sentiment classification using semi-supervised learning
title_full_unstemmed	Cross-lingual sentiment classification using semi-supervised learning
title_sort	cross-lingual sentiment classification using semi-supervised learning
granting_institution	Universiti Teknologi Malaysia, Faculty of Computing
granting_department	Faculty of Computing
publishDate	2015
url	http://eprints.utm.my/id/eprint/77727/1/MohammadSadeghHajmohammadiPFC2015.pdf
_version_	1747817816886607872

Similar Items