Text Augmentation For Emotion Classification In Microblog Text Using Similarity Scoring Based On Neural Embedding Models
Emotion classification can benefit from a larger pool of training data, but manually expanding the emotion corpus is labour-intensive and time-consuming. Distant supervision can be used to collect a large amount of training data in a short period of time using emotion word hashtags, but the collecte...
Main Author: | Yong, Kuan Shyang |
---|---|
Format: | Thesis |
Language: | English |
Published: | 2022 |
Subjects: | QA76.6 Electronic digital computers -- Programming |
Online Access: | http://eprints.usm.my/59117/1/YONG%20KUAN%20SHYANG%20-%20TESIS.pdf |
id |
my-usm-ep.59117 |
---|---|
record_format |
uketd_dc |
spelling |
my-usm-ep.59117 2023-08-14T06:38:11Z Text Augmentation For Emotion Classification In Microblog Text Using Similarity Scoring Based On Neural Embedding Models 2022-08 Yong, Kuan Shyang QA76.6 Electronic digital computers -- Programming Emotion classification can benefit from a larger pool of training data, but manually expanding the emotion corpus is labour-intensive and time-consuming. Distant supervision can be used to collect a large amount of training data in a short period of time using emotion word hashtags, but the collected data may contain excessive noise. In this research, we proposed a text augmentation strategy to efficiently expand the number of positive examples for six emotion categories (happiness, anger, excitement, desperation, boredom and indifference) in EmoTweet-28 by exploiting tweets collected through distant supervision (DS) that are similar to the seed examples in EmoTweet-28 (ET-seed). A similarity scoring approach was used to compute cosine similarity scores between each DS tweet and all ET-seed tweets under the same emotion category. Seven vector representations (USE, InferSent GloVe, InferSent fastText, Word2Vec, fastText, GloVe, and Bag-of-Words) were tested for representing the tweets in the similarity scoring approach. DS tweets with high similarity scores were selected as augmented instances and annotated with emotion labels. DS tweets were selected in two ways: threshold-based selection and fixed increment selection. In addition, we modified the proposed text augmentation strategy by altering the seed sets used for similarity scoring using clustering and misclassified strategies. Each augmented set was evaluated by training a separate deep neural network classifier to distinguish between the presence and absence of a specific emotion in tweets from the test set.
2022-08 Thesis http://eprints.usm.my/59117/ http://eprints.usm.my/59117/1/YONG%20KUAN%20SHYANG%20-%20TESIS.pdf application/pdf en public masters Universiti Sains Malaysia Pusat Pengajian Sains Komputer |
institution |
Universiti Sains Malaysia |
collection |
USM Institutional Repository |
language |
English |
topic |
QA76.6 Electronic digital computers -- Programming |
description |
Emotion classification can benefit from a larger pool of training data, but
manually expanding the emotion corpus is labour-intensive and time-consuming.
Distant supervision can be used to collect a large amount of training data in a short
period of time using emotion word hashtags, but the collected data may contain
excessive noise. In this research, we proposed a text augmentation strategy to
efficiently expand the number of positive examples for six emotion categories (happiness,
anger, excitement, desperation, boredom and indifference) in EmoTweet-28 by
exploiting tweets collected through distant supervision (DS) that are similar to the seed
examples in EmoTweet-28 (ET-seed). A similarity scoring approach was used to
compute cosine similarity scores between each DS tweet and all ET-seed tweets
under the same emotion category. Seven vector representations (USE, InferSent
GloVe, InferSent fastText, Word2Vec, fastText, GloVe, and Bag-of-Words) were
tested for representing the tweets in the similarity scoring approach. DS tweets with
high similarity scores were selected as augmented instances and annotated
with emotion labels. DS tweets were selected in two ways:
threshold-based selection and fixed increment selection. In addition, we
modified the proposed text augmentation strategy by altering the seed sets used for
similarity scoring using clustering and misclassified strategies. Each augmented set
was evaluated by training a separate deep neural network classifier to distinguish
between the presence and absence of a specific emotion in tweets from the test set. |
format |
Thesis |
qualification_level |
Master's degree |
author |
Yong, Kuan Shyang |
title |
Text Augmentation For Emotion Classification In Microblog Text Using Similarity Scoring Based On Neural Embedding Models |
granting_institution |
Universiti Sains Malaysia |
granting_department |
Pusat Pengajian Sains Komputer |
publishDate |
2022 |
url |
http://eprints.usm.my/59117/1/YONG%20KUAN%20SHYANG%20-%20TESIS.pdf |
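
The similarity scoring and the two selection modes named in the description can be illustrated with a minimal Python sketch. It is not the thesis implementation. It uses a plain Bag-of-Words count vector (one of the seven representations listed) and assumes that the per-seed cosine scores are aggregated by their mean, since the abstract does not say how scores against all ET-seed tweets are combined. All function names, the threshold value, and the increment size `k` are hypothetical.

```python
import math
from collections import Counter


def bow_vector(text):
    """Lowercased whitespace-token counts (a simple Bag-of-Words vector)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def score_ds_tweets(ds_tweets, seed_tweets):
    """Score each DS tweet against all seed tweets of one emotion category.

    Assumption: the per-seed cosine scores are aggregated by their mean.
    Returns (tweet, score) pairs sorted from most to least similar.
    """
    if not seed_tweets:
        raise ValueError("seed_tweets must not be empty")
    seed_vecs = [bow_vector(s) for s in seed_tweets]
    scored = []
    for tweet in ds_tweets:
        vec = bow_vector(tweet)
        sims = [cosine(vec, s) for s in seed_vecs]
        scored.append((tweet, sum(sims) / len(sims)))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def select_by_threshold(scored, threshold):
    """Threshold-based selection: keep tweets scoring at or above the threshold."""
    return [tweet for tweet, score in scored if score >= threshold]


def select_fixed_increment(scored, k):
    """Fixed increment selection: keep the k highest-scoring tweets."""
    return [tweet for tweet, _ in scored[:k]]
```

Replacing `bow_vector` with a sentence encoder such as USE or InferSent would change only how tweets are vectorised; the scoring and selection steps would stay the same under these assumptions.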