Robust Diagnostics In Logistic Regression Model

Bibliographic Details
Main Author: Ariffin @ Mat Zin, Syaiba Balqish
Format: Thesis
Language: English
Published: April 2010
Subjects: Robust statistics; Regression analysis; Logistic regression analysis
Online Access: http://psasir.upm.edu.my/id/eprint/12362/1/FS_2010_19A.pdf
id my-upm-ir.12362
institution Universiti Putra Malaysia
collection PSAS Institutional Repository
topic Robust statistics
Regression analysis
Logistic regression analysis
description In recent years, because the Maximum Likelihood Estimator (MLE) is inconsistent and sensitive in the presence of high leverage points and residual outliers, diagnostics have become an essential part of logistic regression modelling. High leverage points and residual outliers have a strong tendency to break the covariate pattern, resulting in biased parameter estimates. Identifying them is therefore believed to be vital for improving the performance of the MLE. High leverage points and residual outliers also adversely affect inference by producing large values of the Influence Function (IF). For the identification of high leverage points, Imon (2006) proposed the Distance from the Mean (DM) diagnostic method. Although the DM method identifies the high leverage points correctly, it tends to swamp some low leverage points, that is, to wrongly flag them as high leverage points. Deleting these low leverage points may lead to a loss of efficiency and precision in the parameter estimates. The Robust Logistic Diagnostic (RLGD) is proposed as an alternative to the DM method that reduces this swamping. The RLGD method combines a robust approach with a diagnostic procedure. First, the robust approach identifies suspected high leverage points by computing the Robust Mahalanobis Distance (RMD) based on the Minimum Volume Ellipsoid (MVE) or Minimum Covariance Determinant (MCD) estimator. The diagnostic procedure then confirms these suspects by computing their potentials. The RLGD method is designed so that only genuine high leverage points are identified, free from swamping and masking (failing to detect genuine high leverage points) effects. Its performance is investigated using real examples and a Monte Carlo simulation study.
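The two-stage idea behind RLGD (robust screening, then diagnostic confirmation) can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the thesis's implementation. The MCD estimate is a simplified FAST-MCD (random elemental starts plus concentration steps, no consistency correction). Both cut-offs use median + 3 × MAD, with MAD normalized by 0.6745, which is an assumed rule. The confirmation stage uses generalized potentials modelled on Imon's group-deletion diagnostics (p*_ii = h_ii^(-D) for suspects, h_ii^(-D) / (1 - h_ii^(-D)) for the rest), computed from the unweighted hat matrix of the covariates plus an intercept rather than a logistic-weighted one. The names `mad`, `mcd_estimates`, `rlgd_sketch` and `dc_far` are hypothetical. DC is taken as the share of true high leverage points flagged and FAR as the share of clean points flagged.

```python
import numpy as np


def mad(v):
    """Median absolute deviation, normalized by 0.6745 (assumed scaling)."""
    return np.median(np.abs(v - np.median(v))) / 0.6745


def mcd_estimates(X, n_starts=50, n_csteps=30, seed=0):
    """Raw MCD location and scatter via a simplified FAST-MCD:
    random (p + 1)-point starts refined by concentration steps."""
    n, p = X.shape
    h = (n + p + 1) // 2                          # usual MCD coverage
    rng = np.random.default_rng(seed)
    best_det, best_idx = np.inf, None
    for _ in range(n_starts):
        idx = rng.choice(n, size=p + 1, replace=False)
        for _ in range(n_csteps):
            mu = X[idx].mean(axis=0)
            S = np.atleast_2d(np.cov(X[idx], rowvar=False))
            diff = X - mu
            d2 = np.einsum("ij,jk,ik->i", diff, np.linalg.pinv(S), diff)
            new_idx = np.sort(np.argsort(d2)[:h])  # keep the h closest points
            if np.array_equal(new_idx, idx):
                break                              # subset stopped changing
            idx = new_idx
        det = np.linalg.det(np.atleast_2d(np.cov(X[idx], rowvar=False)))
        if det < best_det:                         # smallest determinant wins
            best_det, best_idx = det, idx
    mu = X[best_idx].mean(axis=0)
    return mu, np.atleast_2d(np.cov(X[best_idx], rowvar=False))


def rlgd_sketch(X, c=3.0, seed=0):
    """Flag high leverage points in an (n, p) covariate matrix (no intercept).
    Stage 1 screens with robust Mahalanobis distances; stage 2 confirms with
    generalized potentials computed after deleting the suspects."""
    n = X.shape[0]
    mu, S = mcd_estimates(X, seed=seed)
    diff = X - mu
    d2 = np.einsum("ij,jk,ik->i", diff, np.linalg.pinv(S), diff)
    rmd = np.sqrt(np.maximum(d2, 0.0))                 # guard tiny negatives
    suspect = rmd > np.median(rmd) + c * mad(rmd)      # assumed cut-off rule
    Z = np.column_stack([np.ones(n), X])               # design with intercept
    keep = ~suspect
    G = np.linalg.pinv(Z[keep].T @ Z[keep])            # fit on remaining cases
    h = np.einsum("ij,jk,ik->i", Z, G, Z)              # h_ii^(-D) for every case
    pot = h.copy()                                     # suspects: p*_ii = h_ii^(-D)
    pot[keep] = h[keep] / (1.0 - h[keep])              # others: h / (1 - h)
    return pot > np.median(pot) + c * mad(pot)


def dc_far(flagged, truth):
    """Detection capability and false alarm rate from boolean masks."""
    return flagged[truth].mean(), flagged[~truth].mean()
```

For example, with 50 standard normal points in two dimensions plus five planted points at (10, 10), `dc_far(rlgd_sketch(X), truth)` reports how many planted points were caught and how many clean points were swamped.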
The real examples and the simulation results indicate that the RLGD method correctly identifies the high leverage points (increasing the Detection Capability (DC)) and reduces the number of swamped low leverage points (decreasing the False Alarm Rate (FAR)). The Standardized Pearson Residual (SPR) is successful only in identifying a single residual outlier, and it is less effective when residual outliers are present in the covariates. The Generalized Standardized Pearson Residual (GSPR) proposed by Imon and Hadi (2008) successfully identifies residual outliers. However, the initial stage of the GSPR method relies on graphical methods, which depend on the observer's judgement and are not suitable for higher-dimensional covariates. A Modified Standardized Pearson Residual (MSPR) based on the RLGD method is therefore proposed as a more reliable alternative. The MSPR method produces results similar to those of the GSPR method, and its attractive feature is that it is easier to apply.

This research also uses the RLGD method in bootstrap procedures. The Classical Bootstrap (CB) procedure based on random-x re-sampling is not robust to high leverage points. To address this problem, two new bootstrap procedures based on the RLGD method are proposed: the Diagnostic Logistic Before Bootstrap (DLGBB) and the Weighted Logistic Bootstrap with Probability (WLGBP). In the DLGBB procedure, the high leverage points are excluded before re-sampling. In the WLGBP procedure, the high leverage points are assigned low probabilities and consequently have a low chance of being selected during re-sampling. Simulation results show that the DLGBB and WLGBP procedures are more robust to high leverage points than the CB procedure.
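The three bootstrap schemes differ only in which cases are drawn and with what probability, which a short NumPy sketch makes concrete. Assumptions, none of them taken from the thesis: the logistic model is fitted by plain Newton-Raphson, the high leverage flags arrive as a boolean mask (for example from an RLGD-type diagnostic), and the WLGBP weight `low_weight=0.01` is a hypothetical value, since the abstract does not say how the low probabilities are chosen. The names `fit_logistic`, `bootstrap`, `classical_bootstrap`, `dlgbb_sketch` and `wlgbp_sketch` are illustrative only.

```python
import numpy as np


def fit_logistic(X, y, max_iter=100, tol=1e-8):
    """Logistic regression MLE by Newton-Raphson; X must include an intercept column."""
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        pi = 1.0 / (1.0 + np.exp(-(X @ beta)))
        info = X.T @ ((pi * (1.0 - pi))[:, None] * X)   # Fisher information
        step = np.linalg.lstsq(info, X.T @ (y - pi), rcond=None)[0]
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


def bootstrap(X, y, B=200, probs=None, seed=0):
    """Random-x (case) re-sampling: draw whole rows (x_i, y_i) with replacement."""
    rng = np.random.default_rng(seed)
    n = len(y)
    coefs = np.empty((B, X.shape[1]))
    for b in range(B):
        idx = rng.choice(n, size=n, replace=True, p=probs)
        coefs[b] = fit_logistic(X[idx], y[idx])
    return coefs


def classical_bootstrap(X, y, B=200, seed=0):
    """CB: every case is equally likely to be drawn."""
    return bootstrap(X, y, B, seed=seed)


def dlgbb_sketch(X, y, high_leverage, B=200, seed=0):
    """DLGBB-style: drop the flagged cases first, then re-sample the rest."""
    keep = ~high_leverage
    return bootstrap(X[keep], y[keep], B, seed=seed)


def wlgbp_sketch(X, y, high_leverage, B=200, low_weight=0.01, seed=0):
    """WLGBP-style: keep all cases but give flagged ones a small selection
    probability (low_weight is a hypothetical tuning value)."""
    w = np.where(high_leverage, low_weight, 1.0)
    return bootstrap(X, y, B, probs=w / w.sum(), seed=seed)
```

Dropping the flagged cases (DLGBB) shrinks each resample to the clean cases only, whereas down-weighting them (WLGBP) keeps the original sample size and still lets a flagged case appear occasionally.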
qualification_level Master's degree
granting_institution Universiti Putra Malaysia
granting_department Faculty Of Science