Embedded feature selection methods with high dimensionality for elastic net and logistic regression models

Feature selection and classification in high-dimensional data is a challenging problem in scientific research such as biology, medicine, and finance. In such data, highly correlated features and missing data often exist. Therefore, selecting informative features and adequate handling of missing valu...

Bibliographic Details
Main Author:	Alharthi, Aiedh Mrisi
Format:	Thesis
Language:	English
Published:	2022
Subjects:	QA Mathematics
Online Access:	http://eprints.utm.my/id/eprint/102313/1/AiedhMrisiAlharthiPFS2022.pdf.pdf

id	my-utm-ep.102313
record_format	uketd_dc
spelling	my-utm-ep.102313 2023-08-17T01:08:11Z Embedded feature selection methods with high dimensionality for elastic net and logistic regression models 2022 Alharthi, Aiedh Mrisi QA Mathematics 2022 Thesis http://eprints.utm.my/id/eprint/102313/ http://eprints.utm.my/id/eprint/102313/1/AiedhMrisiAlharthiPFS2022.pdf.pdf application/pdf en public http://dms.library.utm.my:8080/vital/access/manager/Repository/vital:149202 phd doctoral Universiti Teknologi Malaysia Faculty of Science
institution	Universiti Teknologi Malaysia
collection	UTM Institutional Repository
language	English
topic	QA Mathematics
spellingShingle	QA Mathematics Alharthi, Aiedh Mrisi Embedded feature selection methods with high dimensionality for elastic net and logistic regression models
description	Feature selection and classification in high-dimensional data are challenging problems in scientific fields such as biology, medicine, and finance. Such data often contain highly correlated features and missing values. Therefore, selecting informative features and handling missing values adequately are important for finding an optimal model in terms of interpretability and prediction accuracy. In recent years, embedded feature selection methods, including penalized regression, have attracted many statisticians because they often yield model estimates with higher prediction accuracy. Nevertheless, most penalized methods lack consistency in feature selection, do not encourage grouping effects, and do not handle missing values when applied to high-dimensional data. Hence, this study aims to improve feature selection and the handling of missing values by proposing several improvements to penalized high-dimensional approaches. An alternative initial weight was introduced in the adaptive least absolute shrinkage and selection operator (LASSO) to improve feature selection performance. Then, an initial ratio and adjusted variance weights inside the ℓ1-norm penalty of the adaptive elastic net were proposed to encourage the grouping effect. Furthermore, imputation penalized logistic regression with the adaptive LASSO approach was proposed to enhance the handling of missing values in high-dimensional data. Simulation studies with varying numbers of predictor variables, sample sizes, correlation coefficients, and proportions of missing values were performed to evaluate the effectiveness of the proposed methods. The proposed adaptive LASSO methods were compared with LASSO and other versions of adaptive LASSO, while the proposed adaptive elastic net methods were compared with the existing elastic net and adaptive elastic net methods. The proposed methods were also applied to a chemometrics dataset and eight gene expression microarray datasets in which the number of genes (features) exceeds the sample size. The results indicated that the proposed methods outperform their competitors in selecting the most relevant features and in achieving higher classification accuracy, sensitivity, and specificity. They also reduce dimensionality and select the most helpful features for cancer classification, resulting in optimal models that perform feature selection and patient classification concurrently. In addition, the proposed adaptive elastic net method is shown to be superior to the other methods in encouraging the grouping effect. In conclusion, this study shows that the proposed methods are appropriate for gene expression data classification and other high-dimensional classification analyses.
format	Thesis
qualification_name	Doctor of Philosophy (PhD.)
qualification_level	Doctorate
author	Alharthi, Aiedh Mrisi
title	Embedded feature selection methods with high dimensionality for elastic net and logistic regression models
granting_institution	Universiti Teknologi Malaysia
granting_department	Faculty of Science
publishDate	2022
url	http://eprints.utm.my/id/eprint/102313/1/AiedhMrisiAlharthiPFS2022.pdf.pdf
_version_	1776100893617291264
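
For orientation, the baseline technique named in the description, the adaptive LASSO applied to logistic regression, can be sketched as below. This is a minimal illustration under stated assumptions and not the thesis's proposed method: it uses the conventional weights 1/|initial estimate|^gamma taken from a ridge-penalized initial fit, not the alternative initial weight the thesis introduces. The function name `adaptive_lasso_logistic`, its parameters, and the choice of scikit-learn are all hypothetical choices made for this sketch.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression


def adaptive_lasso_logistic(X, y, gamma=1.0, C_init=1.0, C_l1=1.0, eps=1e-8):
    """Conventional adaptive LASSO for binary logistic regression (sketch).

    Assumption: initial estimates come from a ridge (L2) logistic fit,
    which remains usable when features outnumber samples.
    """
    # Step 1: ridge-penalized fit to obtain initial coefficient estimates.
    init = LogisticRegression(C=C_init, max_iter=5000)  # default penalty is L2
    init.fit(X, y)
    beta_init = init.coef_.ravel()

    # Step 2: adaptive weights; small initial estimates get large penalties.
    weights = 1.0 / (np.abs(beta_init) + eps) ** gamma

    # Step 3: a weighted L1 penalty equals a plain L1 penalty on rescaled
    # features, so divide each column by its weight and fit an ordinary
    # L1-penalized logistic regression.
    X_scaled = X / weights
    l1 = LogisticRegression(penalty="l1", solver="liblinear", C=C_l1, max_iter=5000)
    l1.fit(X_scaled, y)

    # Step 4: map the coefficients back to the original feature scale.
    beta = l1.coef_.ravel() / weights
    return beta, l1.intercept_[0]
```

Features whose returned coefficient is exactly zero are the ones dropped by the selection step. In practice `C_init`, `C_l1`, and `gamma` would be tuned, for example by cross-validation.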
