Multilingual Sentiment Analysis From Students Feedback: Optimal Techniques And Resources For Bengali Lingual Data
Summary: In this age of information technology, social media users generate vast amounts of useful multilingual data (in English, Bengali, and other languages). Sentiment analysis can use those data to support organizational and individual decision making. However, most sentiment analysis today is done in a single language, mainly English, so helpful information written in other languages may be missed. Multilingual sentiment analysis (MLSA) has therefore become essential, but MLSA research faces several problems. A review of the literature shows that existing lexical and knowledge resources are not concept-based and are primarily in English, leaving resource-poor languages such as Bengali under-researched. Moreover, studies so far have mainly used standard algorithms such as Naïve Bayes (NB), Support Vector Machine (SVM), and Long Short-Term Memory (LSTM), ignoring concept-level MLSA algorithms. This research also found too few student feedback datasets, especially in Bengali. The literature further reveals that preprocessing, feature extraction, and concept extraction techniques for sentiment analysis need in-depth investigation: whether they are applied alone or in combination, and how they interact with knowledge bases, algorithms, and datasets in different languages. To address these gaps, this thesis contributes a Bengali knowledge base (BanglaSenticNet) of 30,000 concepts and almost 150,000 semantics of those concepts. It also develops a Bengali polarity lexicon with 72,433 concepts and proposes an algorithm for concept-level MLSA (MCSAlgo). In addition, it creates two English student feedback datasets and one Bengali student feedback dataset from social media data.
The knowledge base, polarity lexicon, and algorithm were tested on these datasets, and the results were validated against baseline datasets (English and Bengali) and standard algorithms (NB, SVM, and LSTM). The research obtained reliable results with good classification accuracy using these resources and algorithms. MCSAlgo was applied specifically to test BanglaSenticNet and the Bengali polarity lexicon, and it performed better than NB and SVM in terms of accuracy, recall, precision, and F-score. To the best of the author's knowledge, these lexical and knowledge resources are the most significant resources available for Bengali sentiment analysis. The thesis then used the datasets to search for optimal preprocessing, feature extraction, and concept extraction techniques or combinations of them. It found that applying them in predefined combinations produced better accuracy with NB, SVM, and LSTM, with the LSTM classifier performing best. Comparative studies with related work also show that the experimental results of this research are higher in accuracy. This research is expected to make a noteworthy contribution by giving researchers in this field optimal techniques and resources for completing similar work more quickly, effectively, and efficiently. Future studies may enlarge the knowledge bases and polarity lexicons and test them with other classification algorithms and in different domains.
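The summary compares classifiers on accuracy, recall, precision, and F-score. As a reminder of how those four measures relate, here is a minimal sketch for a two-class sentiment task. It is not the evaluation code used in the thesis: the function name `binary_metrics`, the `"pos"`/`"neg"` label strings, and the toy labels are all assumptions made for illustration.

```python
from collections import Counter


def binary_metrics(y_true, y_pred, positive="pos"):
    """Return accuracy, precision, recall and F1 for one positive class.

    Hypothetical helper for illustration; labels are plain strings.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    # Tally the four confusion-matrix cells.
    counts = Counter()
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            counts["tp"] += 1
        elif pred == positive:
            counts["fp"] += 1
        elif truth == positive:
            counts["fn"] += 1
        else:
            counts["tn"] += 1

    tp, fp, fn, tn = counts["tp"], counts["fp"], counts["fn"], counts["tn"]
    total = tp + fp + fn + tn

    # Guard every ratio against an empty denominator.
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


if __name__ == "__main__":
    # Toy labels, invented for this example only.
    y_true = ["pos", "pos", "neg", "neg", "pos", "neg"]
    y_pred = ["pos", "neg", "neg", "pos", "pos", "neg"]
    for name, value in binary_metrics(y_true, y_pred).items():
        print(f"{name}: {value:.3f}")
```

With more than two sentiment classes, the same counts would typically be computed per class and then averaged. Which averaging scheme the thesis used is not stated in this summary.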