Cyberbullying detection using emotion mining

Bibliographic Details
Main Author: Al-Hashedi, Mohammed Yahea Ali Mahyoub
Format: Thesis
Published: 2022
Description
Summary: The expansion of information and communication technologies (ICTs) has led to developments in online communication. Regrettably, this convenience has been abused by online bullies, who harm others by threatening, harassing, humiliating, intimidating, manipulating, or controlling targeted victims. Cyberbullying can have a severe impact on a victim’s mental health, ranging from negative emotions (anger, fear, sadness, guilt, etc.) to depression and even suicidal thoughts. Because of these potentially harmful consequences, cyberbullying detection has become a pressing need in the governance of Internet usage. The research presented in this thesis is motivated by the fact that cyberbullying can cause negative emotions, and it proposes cyberbullying detection models trained on contextual, emotion, and sentiment features. All critical steps were considered, from data preparation to deep learning models. Datasets that cover all forms of cyberbullying, such as threatening, harassing, humiliating, intimidating, and manipulating or controlling targeted victims, are scarce. To address this issue, this research used two datasets: the Toxic dataset, collected by the Conversation AI team, and the Twitter dataset. Cyberbullying datasets are generally imbalanced across labels; therefore, sampling techniques were developed to reduce the imbalance ratio. After dataset preparation, the next step in detecting cyberbullying was extracting textual features, such as syntactic, semantic, contextual, and emotion features. Emotion features in particular were thoroughly investigated using a lexicon-based deep learning model. To build an emotion detection model, emotion datasets were collected from Twitter using hashtag keywords and categorized according to these keywords.
Because hashtag labelling can be inaccurate, a validation procedure was then carried out to verify the annotation of the emotion dataset labels. The validated dataset was then used to train the emotion detection model (EDM), using BERT as a pre-trained word representation model. This model was used to study the emotions related to cyberbullying texts. The results indicate that 92% of cyberbullying emotions are categorized as negative. Emotions and sentiments were extracted from the cyberbullying datasets using the EDM and the NRC lexicon for emotions and the AFINN lexicon for sentiments. These features were fed to deep learning models to train cyberbullying detection models. A set of experiments was carried out to identify the best set of features for cyberbullying detection. The findings indicate that incorporating emotion features can improve cyberbullying detection, as this approach outperformed the use of BERT contextual features alone. In the experiment that involved emotion features, the recall score was 0.87, which represents a 0.5 increase in cyberbullying detection performance compared to using only BERT. Similarly, incorporating sentiment features improved the model by 0.6 recall compared to using only BERT.
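
The feature combination described in the summary can be illustrated with a minimal Python sketch. It is not the thesis implementation: the two tiny lexicons are hypothetical stand-ins for the NRC and AFINN lexicons, the contextual vector is assumed to come from a separate encoder such as BERT, and all function names are invented for illustration. The sketch shows lexicon-derived emotion counts and a sentiment score being appended to a contextual vector, together with the recall metric quoted in the results.

```python
from collections import Counter

# Hypothetical toy lexicons (assumptions for illustration only);
# the real NRC and AFINN lexicons contain thousands of entries.
TOY_EMOTION_LEXICON = {
    "hate": {"anger", "disgust"},
    "stupid": {"anger"},
    "afraid": {"fear"},
    "cry": {"sadness"},
    "love": {"joy"},
}
TOY_SENTIMENT_LEXICON = {"hate": -3, "stupid": -2, "afraid": -2, "cry": -1, "love": 3}
EMOTIONS = ["anger", "disgust", "fear", "joy", "sadness"]
PUNCTUATION = ".,!?;:\"'"


def tokenize(text):
    """Lowercase whitespace tokenizer that strips surrounding punctuation."""
    tokens = (tok.strip(PUNCTUATION).lower() for tok in text.split())
    return [tok for tok in tokens if tok]


def emotion_features(tokens):
    """Count lexicon hits per emotion, in the fixed order given by EMOTIONS."""
    counts = Counter()
    for tok in tokens:
        for emotion in TOY_EMOTION_LEXICON.get(tok, ()):
            counts[emotion] += 1
    return [float(counts[emotion]) for emotion in EMOTIONS]


def sentiment_feature(tokens):
    """Sum the per-word valence scores; words missing from the lexicon score 0."""
    return float(sum(TOY_SENTIMENT_LEXICON.get(tok, 0) for tok in tokens))


def combine_features(contextual_vector, text):
    """Append emotion counts and a sentiment score to a contextual vector.

    `contextual_vector` is assumed to be produced elsewhere (e.g. by BERT).
    """
    tokens = tokenize(text)
    return list(contextual_vector) + emotion_features(tokens) + [sentiment_feature(tokens)]


def recall(y_true, y_pred, positive=1):
    """Recall = true positives / (true positives + false negatives)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp / (tp + fn) if (tp + fn) else 0.0
```

In this sketch the combined vector would be passed to a downstream classifier in place of the contextual vector alone; how the thesis models actually fuse the features is not specified in the summary.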