Multilingual Sentiment Analysis From Students Feedback: Optimal Techniques And Resources For Bengali Lingual Data
id |
my-usim-ddms-12959 |
---|---|
record_format |
uketd_dc |
institution |
Universiti Sains Islam Malaysia |
collection |
USIM Institutional Repository |
language |
en_US |
advisor |
Norhidayah Azman [supervisor] |
topic |
Language and languages--Study and teaching; English language--Study and teaching |
description |
In this age of information technology, social media users generate vast amounts of useful multilingual data (in English, Bengali, and other languages). Sentiment analysis can use those data to support organizational and individual decision making. However, most sentiment analysis today is done in a single language, mainly English, so useful information written in other languages may be missed. Multilingual sentiment analysis (MLSA) has therefore become essential, but MLSA research faces several problems. The literature review shows that existing lexical and knowledge resources are not concept-based and are primarily in English, leaving resource-poor languages such as Bengali under-researched. Moreover, prior studies have mainly used standard algorithms such as Naïve Bayes (NB), Support Vector Machine (SVM), and Long Short-Term Memory (LSTM), ignoring concept-level MLSA algorithms. This research also found too few student feedback datasets, especially in Bengali. The literature further reveals that preprocessing, feature extraction, and concept extraction techniques need in-depth investigation for sentiment analysis: how they are applied (alone or in combination) and how they interact with knowledge bases, algorithms, and datasets in different languages. To address these gaps, this thesis contributes a Bengali knowledge base (BanglaSenticNet) of 30,000 concepts and almost 150,000 semantics of those concepts. It also develops a Bengali polarity lexicon with 72,433 concepts and proposes an algorithm for concept-level MLSA (MCSAlgo). In addition, it creates two English and one Bengali student feedback datasets using data from social media.
The knowledge base, polarity lexicon, and algorithm were tested on these datasets, and the results were validated against baseline datasets (English and Bengali) and standard algorithms (NB, SVM, and LSTM). The research reports trustworthy results with good classification accuracy using these resources and algorithms. MCSAlgo was applied specifically to test BanglaSenticNet and the Bengali polarity lexicon, and it performed better than NB and SVM in terms of accuracy, recall, precision, and F-score. To the best of the author's knowledge, these lexical and knowledge resources are the most significant resources available for Bengali sentiment analysis. The thesis then tested the datasets to find optimal preprocessing, feature extraction, and concept extraction techniques or combinations of them, and found that applying them in predefined combinations produced better accuracy with NB, SVM, and LSTM, with the LSTM classifier performing best. Comparative studies with related work show that the experimental results of this research achieve higher accuracy. This research is expected to make a noteworthy contribution and to give researchers in this field optimal techniques and resources for conducting similar research more quickly, effectively, and efficiently. Future studies may enlarge the knowledge bases and polarity lexicons and test them with other classification algorithms and in different domains. |
format |
Thesis |
author |
Mohammad Aman Ullah |
title |
Multilingual Sentiment Analysis From Students Feedback: Optimal Techniques And Resources For Bengali Lingual Data |
granting_institution |
Universiti Sains Islam Malaysia |
url |
https://oarep.usim.edu.my/bitstreams/391f12e9-0df1-4712-870f-27de8cae8f98/download https://oarep.usim.edu.my/bitstreams/4ef0825d-9b2a-470b-9ad6-d384d5b377db/download https://oarep.usim.edu.my/bitstreams/0c04e594-0bd5-4a62-a42f-17404e745429/download https://oarep.usim.edu.my/bitstreams/f08994a2-bbcd-4734-b0df-af4ddee627f1/download https://oarep.usim.edu.my/bitstreams/89d255f9-59a9-468f-aafa-25a605358c94/download https://oarep.usim.edu.my/bitstreams/5a882c63-7046-4825-a091-7c865d21c107/download https://oarep.usim.edu.my/bitstreams/6d8830c4-3bbb-45ee-afb3-146c3b96473a/download https://oarep.usim.edu.my/bitstreams/90748984-a4c2-4fb0-a819-ec1c65440df6/download https://oarep.usim.edu.my/bitstreams/fecec74d-088b-4cf7-a6b5-a9cac6df28e9/download https://oarep.usim.edu.my/bitstreams/d3ce40e6-2917-4271-a5b7-2b69202f73ca/download |
_version_ |
1812444663710220288 |
spelling |
my-usim-ddms-12959 2024-08-21T18:01:43Z Multilingual Sentiment Analysis From Students Feedback: Optimal Techniques And Resources For Bengali Lingual Data Mohammad Aman Ullah Norhidayah Azman [supervisor]
Universiti Sains Islam Malaysia 2021-07 Thesis en_US https://oarep.usim.edu.my/handle/123456789/12959 https://oarep.usim.edu.my/bitstreams/76237be8-dbf6-40b2-a175-60160cde9e40/download 8a4605be74aa9ea9d79846c1fba20a33 https://oarep.usim.edu.my/bitstreams/391f12e9-0df1-4712-870f-27de8cae8f98/download 7447160ad1b3e8da185e289aaecc8e0a https://oarep.usim.edu.my/bitstreams/4ef0825d-9b2a-470b-9ad6-d384d5b377db/download fe4c70c23a2fff5d77679c9320311d9e https://oarep.usim.edu.my/bitstreams/0c04e594-0bd5-4a62-a42f-17404e745429/download 699e10798ba68d7f2d327bf9d874b764 https://oarep.usim.edu.my/bitstreams/f08994a2-bbcd-4734-b0df-af4ddee627f1/download 4b3f1f17e5cd6f893a77d0d13f67efca https://oarep.usim.edu.my/bitstreams/89d255f9-59a9-468f-aafa-25a605358c94/download 598c2e44ea649ad230baa3c427cc96f8 https://oarep.usim.edu.my/bitstreams/5a882c63-7046-4825-a091-7c865d21c107/download 01e4da51517685bc725173842198afe7 https://oarep.usim.edu.my/bitstreams/6d8830c4-3bbb-45ee-afb3-146c3b96473a/download 99489b8fc96407126cd6e7464dc3f1e4 https://oarep.usim.edu.my/bitstreams/90748984-a4c2-4fb0-a819-ec1c65440df6/download 81814f678e312a64f395f36a7f33ba1a https://oarep.usim.edu.my/bitstreams/fecec74d-088b-4cf7-a6b5-a9cac6df28e9/download 3bf5b7cb4580e6c5f66087eb44419ba1 https://oarep.usim.edu.my/bitstreams/d3ce40e6-2917-4271-a5b7-2b69202f73ca/download 82e7a66535465d2754954eb96430c5bf https://oarep.usim.edu.my/bitstreams/b9a8449d-b046-474e-8619-913531083589/download f40633c94c825743b84699869457d6bf https://oarep.usim.edu.my/bitstreams/3885c0a3-f5a5-43bf-85cb-6171402f7dba/download fa0f9baa6f6ddcb02633cc2d890ac309 https://oarep.usim.edu.my/bitstreams/02b4b326-cf09-498a-96a6-b8d19045552b/download b75d096fa9b5746f4780556569d39775 https://oarep.usim.edu.my/bitstreams/75e2e20f-0e43-480b-84e7-89f8b63001be/download b094ef5fe68cb7588d5fe3e1165125ba https://oarep.usim.edu.my/bitstreams/c59b01b4-e22c-4524-bc12-b688c8c16104/download 2bfb6668fd6647068b9c0e1af8fa7777 
https://oarep.usim.edu.my/bitstreams/8ecc229d-4534-492d-9434-2012223a6a6f/download 696d759af2450b8a98b51dfeee6d2de8 https://oarep.usim.edu.my/bitstreams/a00ec64c-487a-42a8-9f4d-2a2ca9e35d16/download 79376bd100e19430b1275a27426cd480 https://oarep.usim.edu.my/bitstreams/c570829c-bcff-42d8-b306-a8fdf1c410f7/download 88e9adcff8049bf5919b082d38a9a3e6 https://oarep.usim.edu.my/bitstreams/789d1e71-48f8-4f2e-87e8-476b5a9c5509/download 5df9b1e78e96042d309d4111bc8726c4 https://oarep.usim.edu.my/bitstreams/fbaf1c97-2189-4f96-8fb4-266d740aa826/download 7a0882703f2419d641f6047fee823294 Language and languages—Study and teaching English language--Study and teaching |