Robust diagnostics and variable selection procedure based on modified reweighted fast consistent and high breakdown estimator for high dimensional data

The reweighted fast, consistent and high breakdown (RFCH) estimator is a multivariate procedure used to estimate the robust location and scatter matrix. It is incorporated in the robust Mahalanobis distance to detect the presence of high leverage points in a dataset. The method showed excellent p...

Bibliographic Details
Main Author: Baba, Ishaq Abdullahi
Format: Thesis
Language: English
Published: 2022
Subjects: Algorithms; Robust control
Online Access: http://psasir.upm.edu.my/id/eprint/104718/1/ISHAQ%20ABDULLAHI%20BABA%20-%20IR.pdf
id my-upm-ir.104718
record_format uketd_dc
spelling my-upm-ir.104718 2023-10-05T06:36:21Z Robust diagnostics and variable selection procedure based on modified reweighted fast consistent and high breakdown estimator for high dimensional data 2022-01 Baba, Ishaq Abdullahi Algorithms Robust control 2022-01 Thesis http://psasir.upm.edu.my/id/eprint/104718/ http://psasir.upm.edu.my/id/eprint/104718/1/ISHAQ%20ABDULLAHI%20BABA%20-%20IR.pdf text en public doctoral Universiti Putra Malaysia Algorithms Robust control Midi, Habshah
institution Universiti Putra Malaysia
collection PSAS Institutional Repository
language English
advisor Midi, Habshah
topic Algorithms
Robust control

description The reweighted fast, consistent and high breakdown (RFCH) estimator is a multivariate procedure for robustly estimating the location vector and scatter matrix. It is incorporated in the robust Mahalanobis distance to detect high leverage points in a dataset. The method has shown excellent performance compared to its competitors. However, it cannot be applied when the sample size is smaller than the number of predictor variables. To address this problem, several robust procedures for high-dimensional data are developed via the RFCH algorithm. A modified reweighted fast, consistent and high breakdown (MRFCH) estimator for high-dimensional data is developed; within the RFCH algorithm, it computes the robust Mahalanobis distance from the diagonal elements of the scatter matrix rather than from the full matrix. The proposed method inherits the robustness properties of the original RFCH estimator. Simulation results and artificial data examples show that the proposed MRFCH is more efficient and faster than the MRCD and OGK estimators.
Outlier detection and classification are critical issues that affect prediction accuracy if not handled correctly. The Mahalanobis distance (MD) is one of the most popular multivariate tools for detecting multivariate outlying observations. However, the traditional MD based on the classical mean and covariance rarely identifies all the multivariate outliers in a given dataset, which gives rise to the masking and swamping problems. Therefore, the robust location and covariance matrix based on the MRFCH are used instead of the classical estimators to tackle these problems. The proposed algorithm is applied to detect outliers in high-dimensional data. The results of the simulation study and real data sets indicate that the proposed method has high detection power with minimal misclassification error compared to the MRCD and MDP methods.
The classical correlation estimators, which employ the sample means of the dependent and independent variables, are known to be affected by outliers. Therefore, a robust weighted correlation coefficient that reduces the effect of outliers is proposed. Weights based on the RD (MRFCH) are incorporated in the proposed robust correlation. The performance of the proposed method is illustrated using a simulation study and three real data sets: glass vessel data with 1920 variables, cardiomyopathy microarray data with 6319 variables, and octane data with 226 dimensions. The results show that the robust weighted correlation based on RD (MRFCH) is more powerful and efficient than the existing methods, irrespective of dimension, sample size, and contamination level.
Correlation-based sure screening methods are popular tools for selecting the most significant variables of the true model in sparse, high-dimensional analysis. In practice, however, high leverage points may lead to misleading variable selection results. Therefore, a robust sure independence screening procedure based on the weighted correlation algorithm of MRFCH is developed for high-dimensional data. The simulation results and real data sets indicate that the proposed MRFCHCS+LAD-SCAD estimator performs best among the methods compared in this study.
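The ideas in the abstract (a distance built only from the diagonal of a scatter matrix, weights derived from that distance, a weighted correlation, and correlation-based screening) can be illustrated with a rough Python sketch. This is not the thesis's MRFCH algorithm, which the record does not spell out. As loudly labeled assumptions, the coordinate-wise median and MAD stand in for the robust location and diagonal scatter, the cutoff (median plus three scaled MADs of the distances) and the hard 0/1 weights are illustrative choices, and all function names are hypothetical.

```python
import numpy as np


def diagonal_robust_distance(X):
    """Distance that uses only diagonal scatter, so it works when p > n.

    Assumption: median and MAD replace the MRFCH location and scatter.
    """
    center = np.median(X, axis=0)
    scale = 1.4826 * np.median(np.abs(X - center), axis=0)
    scale = np.where(scale > 0, scale, 1.0)  # guard against zero spread
    z = (X - center) / scale
    return np.sqrt(np.sum(z ** 2, axis=1))


def weights_from_distance(d):
    """Hard 0/1 weights: rows whose distance exceeds the cutoff get 0.

    Assumption: cutoff = median(d) + 3 * scaled MAD(d), chosen for illustration.
    """
    med = np.median(d)
    mad = 1.4826 * np.median(np.abs(d - med))
    return (d <= med + 3.0 * mad).astype(float)


def weighted_correlation(x, y, w):
    """Pearson-type correlation with observation weights w (sum must be > 0)."""
    w = w / w.sum()
    mx, my = np.sum(w * x), np.sum(w * y)
    cov = np.sum(w * (x - mx) * (y - my))
    var_x = np.sum(w * (x - mx) ** 2)
    var_y = np.sum(w * (y - my) ** 2)
    return cov / np.sqrt(var_x * var_y)


def robust_screen(X, y, w, k):
    """Indices of the k predictors with the largest |weighted correlation| with y."""
    scores = np.array(
        [abs(weighted_correlation(X[:, j], y, w)) for j in range(X.shape[1])]
    )
    return np.argsort(-scores)[:k]
```

A typical call sequence would be d = diagonal_robust_distance(X), then w = weights_from_distance(d), then robust_screen(X, y, w, k) to shortlist predictors before fitting a penalized model. Because only p variances are needed rather than a full p-by-p scatter matrix, the distance stays defined when the number of variables exceeds the sample size, which is the situation the abstract targets.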
format Thesis
qualification_level Doctorate
author Baba, Ishaq Abdullahi
title Robust diagnostics and variable selection procedure based on modified reweighted fast consistent and high breakdown estimator for high dimensional data
granting_institution Universiti Putra Malaysia
publishDate 2022
url http://psasir.upm.edu.my/id/eprint/104718/1/ISHAQ%20ABDULLAHI%20BABA%20-%20IR.pdf