The hybrid feature selection technique using term frequency-inverse document frequency and support vector machine-recursive feature elimination for sentiment classification

Sentiment classification is increasingly used to automatically identify a positive or negative sentiment in the opinionated text document, for instance, customer feedback or review. Feature selection has always been a critical and challenging problem in machine learning-based sentiment classificatio...

Bibliographic Details
Main Author: Nur Syafiqah, Mohd Nafis
Format: Thesis
Language: English
Published: 2022
Subjects: Q Science (General); QA75 Electronic computers. Computer science
Online Access: http://umpir.ump.edu.my/id/eprint/37676/1/ir.The%20hybrid%20feature%20selection%20technique%20using%20term%20frequency-inverse%20document%20frequency%20and%20support%20vector%20machine-recursive%20feature%20elimination%20for%20sentiment%20classification.pdf
id my-ump-ir.37676
record_format uketd_dc
spelling my-ump-ir.37676 2023-09-19T01:09:17Z 2022-10 Q Science (General) QA75 Electronic computers. Computer science http://umpir.ump.edu.my/id/eprint/37676/ pdf en public phd doctoral
institution Universiti Malaysia Pahang Al-Sultan Abdullah
collection UMPSA Institutional Repository
language English
advisor Suryanti, Awang
topic Q Science (General)
description Sentiment classification is increasingly used to automatically identify positive or negative sentiment in opinionated text documents such as customer feedback or reviews. Feature selection has always been a critical and challenging problem in machine learning-based sentiment classification. Hybrid feature selection is an efficient technique for this task, but it has several shortcomings that can be addressed, chiefly in its ability to identify feature importance and to reduce the number of features extracted from opinionated text documents. Failure to address this issue results in poor classification performance. Therefore, this research aims to improve classification performance by proposing term frequency-inverse document frequency (TF-IDF) and support vector machine-recursive feature elimination (SVM-RFE) as a hybrid feature selection technique. TF-IDF evaluates feature importance, and a standard deviation-based threshold is used for feature reduction, with the objective of improving on the conventional approach to reducing features from the feature matrix. SVM-RFE then re-evaluates and ranks the features remaining in the TF-IDF-based feature matrix, and only the top-k features from the SVM-RFE ranking are used for sentiment classification. Finally, a support vector machine (SVM) classifier is employed to classify the English customer review datasets, i.e., the opinion-labelled and large IMDb datasets. Performance was measured using accuracy, precision, recall, F-measure, and feature size reduction. The experimental results show promising performance of up to 95.06% on these measurements, especially on the large IMDb datasets and an additional hotel review dataset. The proposed technique also reduced the number of features used during classification by 31.80% to 64.00%. This reduction rate is significant for making optimal use of computational resources while preserving classification performance.
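The two-stage selection described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation, and several parts are assumptions made only for illustration: the exact standard deviation rule is not given in this record, so stage 1 here keeps terms whose mean TF-IDF is at least the population standard deviation of all per-term means; the tiny subgradient-descent linear SVM stands in for a real SVM library; and all function names (`tfidf_matrix`, `std_threshold_filter`, `train_linear_svm`, `svm_rfe_rank`, `hybrid_select`) are hypothetical.

```python
import math
from statistics import pstdev


def tfidf_matrix(docs):
    """Return (vocab, rows); rows[d][j] is the TF-IDF weight of term j in document d."""
    tokenised = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenised for tok in toks})
    # Document frequency: how many documents contain each term.
    df = {t: sum(1 for toks in tokenised if t in toks) for t in vocab}
    rows = []
    for toks in tokenised:
        rows.append([
            (toks.count(t) / len(toks)) * math.log(len(docs) / df[t]) if toks else 0.0
            for t in vocab
        ])
    return vocab, rows


def std_threshold_filter(vocab, rows):
    """Stage 1 (ASSUMED rule): keep column indices of terms whose mean TF-IDF is at
    least the population standard deviation of all per-term means."""
    scores = [sum(col) / len(rows) for col in zip(*rows)]
    threshold = pstdev(scores)
    return [j for j, s in enumerate(scores) if s >= threshold]


def train_linear_svm(X, y, epochs=200, lr=0.1, lam=0.01):
    """Toy linear SVM: full-batch subgradient descent on the L2-regularised hinge loss."""
    n, m = len(X[0]), len(X)
    w, b = [0.0] * n, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [lam * wj for wj in w], 0.0
        for xi, yi in zip(X, y):
            # Only samples inside the margin contribute to the hinge subgradient.
            if yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b) < 1:
                for j in range(n):
                    grad_w[j] -= yi * xi[j] / m
                grad_b -= yi / m
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def svm_rfe_rank(X, y, feature_ids):
    """Stage 2: retrain, drop the feature with the smallest squared weight, repeat.
    Returns feature_ids ordered from most to least important."""
    remaining, eliminated = list(range(len(feature_ids))), []
    while remaining:
        X_sub = [[row[j] for j in remaining] for row in X]
        w, _ = train_linear_svm(X_sub, y)
        worst = min(range(len(remaining)), key=lambda i: w[i] ** 2)
        eliminated.append(remaining.pop(worst))
    return [feature_ids[j] for j in reversed(eliminated)]


def hybrid_select(docs, labels, k):
    """Run both stages; return the top-k terms and the feature size reduction in percent."""
    vocab, rows = tfidf_matrix(docs)
    kept = std_threshold_filter(vocab, rows)
    X = [[row[j] for j in kept] for row in rows]
    ranked = svm_rfe_rank(X, labels, [vocab[j] for j in kept])
    reduction = 100.0 * (1 - min(k, len(ranked)) / len(vocab))
    return ranked[:k], reduction


docs = ["the film was great", "the film was awful",
        "the plot was great", "the plot was awful"]
labels = [1, -1, 1, -1]  # +1 = positive review, -1 = negative review
selected, reduction = hybrid_select(docs, labels, k=2)
```

In this toy corpus, terms that occur in every document receive a TF-IDF weight of zero and are dropped in stage 1, while stage 2 orders the surviving terms by how strongly the linear SVM relies on them. A real experiment would use a proper tokeniser, an established SVM implementation, and a held-out test set for accuracy, precision, recall, and F-measure.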
format Thesis
qualification_name Doctor of Philosophy (PhD.)
qualification_level Doctorate
author Nur Syafiqah, Mohd Nafis
title The hybrid feature selection technique using term frequency-inverse document frequency and support vector machine-recursive feature elimination for sentiment classification
granting_institution Universiti Malaysia Pahang
granting_department Faculty of Computing
publishDate 2022