Cross-lingual sentiment classification using semi-supervised learning

Cross-lingual sentiment classification aims to utilize annotated sentiment resources in one language for text sentiment classification in another language. Automatic machine translation services are the most commonly used tools to directly project information from one language into another. However,...

Bibliographic Details
Main Author: Hajmohammadi, Mohammad Sadegh
Format: Thesis
Language: English
Published: 2015
Subjects: QA75 Electronic computers. Computer science
Online Access: http://eprints.utm.my/id/eprint/77727/1/MohammadSadeghHajmohammadiPFC2015.pdf
id my-utm-ep.77727
record_format uketd_dc
spelling my-utm-ep.77727 2018-06-29T21:45:13Z Cross-lingual sentiment classification using semi-supervised learning 2015-05 Hajmohammadi, Mohammad Sadegh QA75 Electronic computers. Computer science 2015-05 Thesis http://eprints.utm.my/id/eprint/77727/ http://eprints.utm.my/id/eprint/77727/1/MohammadSadeghHajmohammadiPFC2015.pdf application/pdf en public http://dms.library.utm.my:8080/vital/access/manager/Repository/vital:97125 phd doctoral Universiti Teknologi Malaysia, Faculty of Computing Faculty of Computing
institution Universiti Teknologi Malaysia
collection UTM Institutional Repository
language English
topic QA75 Electronic computers. Computer science
description Cross-lingual sentiment classification aims to use annotated sentiment resources in one language to classify the sentiment of text in another language. Automatic machine translation services are the most commonly used tools for directly projecting information from one language into another. However, three problems lead to low sentiment classification performance: differing term distributions between translated and original documents, translation errors, and differences in the intrinsic structure of documents across languages. Furthermore, because different languages use different linguistic terms, translated documents cannot cover all of the vocabulary found in the original documents. The aim of this thesis is to propose an enhanced framework for cross-lingual sentiment classification that overcomes the aforementioned problems and improves classification performance. A combination of active learning and semi-supervised learning, in both single-view and bi-view frameworks, is proposed to incorporate unlabelled data from the target language and thereby reduce term distribution divergence. Using bi-view documents can partially alleviate the negative effects of translation errors. Multi-view semi-supervised learning is also used to overcome the problem of low term coverage by employing multiple source languages. Features extracted from multiple source languages can cover more of the vocabulary in the test data, and consequently more sentiment-bearing terms can be used in the classification process. Content similarities between labelled and unlabelled documents are exploited through a graph-based semi-supervised learning approach to incorporate the structure of target-language documents into the learning process. Performance evaluation on sentiment data sets in four different languages confirms the effectiveness of the proposed approaches in comparison with well-known baseline classification methods. The experiments show that incorporating unlabelled data from the target language can effectively improve classification performance. Experimental results also show that using multiple source languages in the multi-view learning model outperforms the other methods. The proposed framework is flexible enough to be applied to any new language and can therefore be used to develop multilingual sentiment analysis systems.
format Thesis
qualification_name Doctor of Philosophy (PhD.)
qualification_level Doctorate
author Hajmohammadi, Mohammad Sadegh
title Cross-lingual sentiment classification using semi-supervised learning
granting_institution Universiti Teknologi Malaysia, Faculty of Computing
granting_department Faculty of Computing
publishDate 2015
url http://eprints.utm.my/id/eprint/77727/1/MohammadSadeghHajmohammadiPFC2015.pdf
_version_ 1747817816886607872