Stock market classification model using sentiment analysis based on hybrid naive bayes classifiers

Bibliographic Details
Main Author: A. Jabbar Alkubaisi, Ghaith Abdulsattar
Format: Thesis
Language: eng
Published: 2019
Online Access: https://etd.uum.edu.my/8123/1/s900600_01.pdf
https://etd.uum.edu.my/8123/2/s900600_02.pdf
Description
Summary: Sentiment analysis has become one of the most common methods for classifying stock market behaviour. It has gained considerable importance over the last decade, especially owing to the availability of data from social media such as Twitter. However, the accuracy of stock market classification models is still low, and this has negatively affected stock market indicators. Furthermore, many factors that directly affect the accuracy of classification models were not addressed by previous research. One factor is the exclusion of spatial-temporal features. Another important factor is the automatic labelling technique, which leads to low classification accuracy due to the absence of a specific lexicon. A further factor is the appropriateness of the classifiers to the data features and domain, which also affects classification accuracy. In this research, a model for stock market classification based on sentiment analysis is constructed. It is designed to enhance classification accuracy by incorporating tweet timestamp and location features, a labelling technique that relies on stock market domain experts, and a hybrid of Naïve Bayes classifiers to classify stock market sentiments. The methodology for this research consists of six phases. The first phase is data collection. The second phase, which is the most important, is labelling, in which the polarity of the data is specified as negative, positive or neutral. The third phase is data pre-processing, which is conducted to retain only relevant features. The fourth phase is classification, in which suitable patterns of the stock market are identified by hybridizing different Naïve Bayes classifiers. The fifth phase is performance and evaluation, and the final phase is recognition of stock market behaviour. The model produced a significant result, classifying stock market behaviour with an accuracy of more than 89%.
The model is beneficial for investors and researchers. For investors, it enables them to formulate their plans based on accurate indicators, which reduces the risk in decision making. For researchers, it draws attention to the importance of feature engineering, the labelling technique, and classifier hybridization in enhancing classification accuracy.
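
The following is an illustrative sketch only, not the thesis implementation. The summary does not state which Naïve Bayes variants are combined or how they are hybridized, so this example assumes a multinomial and a Bernoulli variant combined by averaging their class probabilities (soft voting). The names featurize, TinyNB and hybrid_predict, the am/pm time bucket, and the six labelled tweets are all hypothetical and invented for illustration; they are not taken from the thesis data.

```python
import math
from collections import Counter

def featurize(text, hour, location):
    """Tokenize a tweet and append coarse time and location tokens (assumed encoding)."""
    bucket = "am" if hour < 12 else "pm"
    return text.lower().split() + [f"time={bucket}", f"loc={location.lower()}"]

class TinyNB:
    """Naive Bayes with Laplace smoothing; kind is 'multinomial' or 'bernoulli'."""

    def __init__(self, kind):
        self.kind = kind

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.vocab = {t for d in docs for t in d}
        self.n_docs = Counter(labels)
        self.total = len(labels)
        self.counts = {c: Counter() for c in self.classes}
        for d, y in zip(docs, labels):
            # Bernoulli counts presence per document; multinomial counts occurrences.
            self.counts[y].update(set(d) if self.kind == "bernoulli" else d)
        return self

    def log_posterior(self, doc):
        """Return normalized log P(class | doc) for every class."""
        scores = {}
        present = set(doc)
        for c in self.classes:
            s = math.log(self.n_docs[c] / self.total)  # class prior
            if self.kind == "multinomial":
                denom = sum(self.counts[c].values()) + len(self.vocab)
                for t in doc:
                    if t in self.vocab:  # unseen tokens are ignored
                        s += math.log((self.counts[c][t] + 1) / denom)
            else:
                for t in self.vocab:
                    p = (self.counts[c][t] + 1) / (self.n_docs[c] + 2)
                    s += math.log(p if t in present else 1 - p)
            scores[c] = s
        # Normalize with log-sum-exp so the probabilities sum to 1.
        m = max(scores.values())
        z = m + math.log(sum(math.exp(v - m) for v in scores.values()))
        return {c: v - z for c, v in scores.items()}

def hybrid_predict(models, doc):
    """Soft vote: average each model's class probabilities and pick the largest."""
    posts = [m.log_posterior(doc) for m in models]
    avg = {c: sum(math.exp(p[c]) for p in posts) / len(posts) for c in posts[0]}
    return max(avg, key=avg.get)

# Invented toy tweets: (text, hour of day, location, expert-assigned label).
train = [
    ("profits surge stock rallies", 10, "NY", "positive"),
    ("strong earnings beat forecast", 9, "NY", "positive"),
    ("shares plunge on weak earnings", 15, "London", "negative"),
    ("stock falls after profit warning", 16, "London", "negative"),
    ("market closed for holiday", 11, "Tokyo", "neutral"),
    ("trading hours unchanged today", 14, "Tokyo", "neutral"),
]
docs = [featurize(text, hour, loc) for text, hour, loc, _ in train]
labels = [label for *_, label in train]
models = [TinyNB("multinomial").fit(docs, labels),
          TinyNB("bernoulli").fit(docs, labels)]

print(hybrid_predict(models, featurize("earnings surge", 10, "NY")))
print(hybrid_predict(models, featurize("shares plunge", 15, "London")))
```

Soft voting is only one plausible way to hybridize classifiers; stacking or weighting the variants by validation accuracy would be equally reasonable readings of the summary.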