Cyberbullying detection using emotion mining

The expansion of information and communication technologies (ICTs) has led to developments in online communication. Regrettably, such convenience has been abused by online bullies, causing harm to others via threatening, harassing, humiliating, intimidating, manipulating, or controlling targeted vic...

Bibliographic Details
Main Author: Al-Hashedi, Mohammed Yahea Ali Mahyoub
Format: Thesis
Published: 2022
id my-mmu-ep.12041
record_format uketd_dc
spelling my-mmu-ep.12041 2024-01-11T01:55:27Z Cyberbullying detection using emotion mining 2022-10 Al-Hashedi, Mohammed Yahea Ali Mahyoub BF1-990 Psychology 2022-10 Thesis http://shdl.mmu.edu.my/12041/ http://erep.mmu.edu.my/ masters Multimedia University Faculty of Computing and Informatics (FCI) EREP ID: 11743
institution Multimedia University
collection MMU Institutional Repository
topic BF1-990 Psychology
description The expansion of information and communication technologies (ICTs) has led to developments in online communication. Regrettably, such convenience has been abused by online bullies, causing harm to others via threatening, harassing, humiliating, intimidating, manipulating, or controlling targeted victims. Cyberbullying can have a severe impact on a victim’s mental health, ranging from negative emotions (anger, fear, sadness, guilt, etc.) to depression and even suicidal thoughts. Due to these potentially harmful consequences, cyberbullying detection has become a pressing need in Internet usage governance. The research presented in this thesis is motivated by the fact that negative emotions can be caused by cyberbullying, and it proposes cyberbullying detection models that are trained on contextual, emotion, and sentiment features. In this work, all critical steps were taken into consideration, from data preparation to deep learning models. Datasets that encompass all forms of cyberbullying, such as threatening, harassing, humiliating, intimidating, and manipulating or controlling targeted victims, are sparse. To address this issue, this research utilized two datasets: the Toxic dataset, collected by the Conversation AI team, and the Twitter dataset. Cyberbullying datasets are generally imbalanced across their labels; therefore, sampling techniques were developed to reduce the imbalance ratio. After dataset preparation, the next step in detecting cyberbullying was extracting textual features, such as syntactic, semantic, contextual, and emotion features. In particular, emotion features were thoroughly investigated through the use of a lexicon-based deep learning model. To build an emotion detection model, the emotion datasets were collected from Twitter using hashtag keywords and were categorized based on these keywords. Due to the potential inaccuracy of the hashtag labelling, a validation procedure was then carried out to verify the annotation of the emotion dataset labels. The validated dataset was then used to train the emotion detection model (EDM) using BERT as a pre-trained word representation model. This model was used to study and explore the emotions related to cyberbullying texts. The results indicate that 92% of cyberbullying emotions are categorized as negative. Emotions and sentiments were extracted from the cyberbullying datasets using the EDM and the NRC lexicon for emotions and the AFINN lexicon for sentiments. These features were fed to deep learning models to train cyberbullying detection models. A set of experiments was carried out to investigate the best set of features for cyberbullying detection. The findings indicate that incorporating emotion features can enhance the precision of detecting cyberbullying, as this approach outperformed the use of BERT contextual features only. In the experiment that involved emotion features, the recall score was 0.87, which led to a 0.5 increase in the performance of cyberbullying detection compared to using only BERT. Similarly, incorporating sentiment features improved the model by 0.6 recall compared to only utilizing BERT.
format Thesis
qualification_level Master's degree
author Al-Hashedi, Mohammed Yahea Ali Mahyoub
author_facet Al-Hashedi, Mohammed Yahea Ali Mahyoub
author_sort Al-Hashedi, Mohammed Yahea Ali Mahyoub
title Cyberbullying detection using emotion mining
title_short Cyberbullying detection using emotion mining
title_full Cyberbullying detection using emotion mining
title_fullStr Cyberbullying detection using emotion mining
title_full_unstemmed Cyberbullying detection using emotion mining
title_sort cyberbullying detection using emotion mining
granting_institution Multimedia University
granting_department Faculty of Computing and Informatics (FCI)
publishDate 2022
_version_ 1794019133318234112