Robust Diagnostics In Logistic Regression Model

Bibliographic Details
Main Author: Ariffin @ Mat Zin, Syaiba Balqish
Format: Thesis
Language: English
Published: 2010
Online Access:http://psasir.upm.edu.my/id/eprint/12362/1/FS_2010_19A.pdf
Description
Summary: In recent years, diagnostics have become an essential part of logistic regression modelling because the Maximum Likelihood Estimator (MLE) is inconsistent and sensitive in the presence of high leverage points and residual outliers. High leverage points and residual outliers have a strong tendency to break the covariate pattern, resulting in biased parameter estimates. Identifying them is believed to be vital for improving the performance of the MLE. Their presence adversely affects inference by inducing large values of the Influence Function (IF). For the identification of high leverage points, Imon (2006) proposed the Distance from the Mean (DM) diagnostic method. The weakness of the DM method is that, although it identifies the high leverage points correctly, it tends to swamp some low leverage points. Deleting the low leverage points may lead to a loss of efficiency and precision in the parameter estimates. The Robust Logistic Diagnostic (RLGD) is proposed as an alternative approach that performs well compared with the DM method. The RLGD method combines robust approaches with diagnostic procedures. A robust approach is first used to identify suspected high leverage points by computing the Robust Mahalanobis Distance (RMD) based on the Minimum Volume Ellipsoid (MVE) estimator or the Minimum Covariance Determinant (MCD) estimator. For confirmation, a diagnostic procedure is then used to compute the potentials. The RLGD method ensures that only genuine high leverage points are identified and that the identification is free from swamping and masking effects. The performance of the RLGD method is investigated using real examples and a Monte Carlo simulation study.
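The first RLGD stage, flagging suspected high leverage points by a robust distance, can be illustrated with a minimal sketch. It is not the thesis's implementation: it assumes a single covariate and uses the median and the MAD as a univariate stand-in for the MVE or MCD estimators, and the cutoff of 3 is an illustrative assumption rather than the rule used in the thesis. The confirmation stage based on potentials is omitted. The function names are hypothetical.

```python
import statistics


def robust_distances(x):
    """Univariate robust distance |x_i - median| / (1.4826 * MAD).

    Assumption: median/MAD stand in for the MVE/MCD location and scatter.
    """
    med = statistics.median(x)
    mad = statistics.median([abs(v - med) for v in x])
    scale = 1.4826 * mad  # consistency factor for normally distributed data
    if scale == 0:
        raise ValueError("MAD is zero; robust scale is undefined")
    return [abs(v - med) / scale for v in x]


def classical_distances(x):
    """Classical distance |x_i - mean| / sd, shown for comparison."""
    mean = statistics.mean(x)
    sd = statistics.stdev(x)
    return [abs(v - mean) / sd for v in x]


def flag(distances, cutoff=3.0):
    """Indices whose distance exceeds the (assumed) cutoff."""
    return [i for i, d in enumerate(distances) if d > cutoff]


# Toy covariate: eight ordinary values and two far-away values.
x = [1.2, 0.8, 1.0, 1.1, 0.9, 1.3, 0.7, 1.0, 9.5, 10.2]
robust_flags = flag(robust_distances(x))
classical_flags = flag(classical_distances(x))
print(robust_flags, classical_flags)
```

On this toy data the robust distances flag the last two observations, whereas the classical distances flag none because the two extreme values inflate the mean and standard deviation. This is the masking effect that motivates a robust first stage.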
The real examples and the simulation results indicate that the RLGD method correctly identifies the high leverage points (increasing the probability of the Detection of Capability (DC)) and reduces the number of swamped low leverage points (decreasing the probability of the False Alarm Rate (FAR)). The Standardized Pearson Residual (SPR) is successful only in identifying a single residual outlier, and it is less effective when residual outliers are present in the covariates. The Generalized Standardized Pearson Residual (GSPR) proposed by Imon and Hadi (2008) is a successful method for identifying residual outliers. However, the initial stage of the GSPR method uses graphical methods, which rely on the observer's judgement and are not suitable for higher-dimensional covariates. The Modified Standardized Pearson Residual (MSPR), based on the RLGD method, is proposed as a more reliable alternative. The MSPR method produces results similar to those of the GSPR method, and its attractive feature is that it is easier to apply. This research also uses the RLGD method in bootstrap procedures. The Classical Bootstrap (CB) procedure by Random-x Re-sampling is not robust to high leverage points. To address this problem, two newly developed bootstrap procedures based on the RLGD method are proposed: the Diagnostic Logistic Before Bootstrap (DLGBB) and the Weighted Logistic Bootstrap with Probability (WLGBP). In the DLGBB procedure, the high leverage points are excluded before the re-sampling process is applied. In the WLGBP procedure, the high leverage points are assigned low probabilities and consequently have a low chance of being selected in the re-sampling process. Simulation results show that the DLGBB and WLGBP procedures are more robust to high leverage points than the CB procedure.
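The difference between the two re-sampling schemes described above can be sketched as follows. This is an illustration under stated assumptions, not the thesis's procedure: the set of flagged indices is taken as given, the low weight of 0.05 is an arbitrary placeholder for the probabilities the thesis assigns, the re-sample size in the deletion scheme is assumed to equal the number of retained cases, and the logistic refit on each re-sample is left out. The function names are hypothetical.

```python
import random


def dlgbb_style_draw(n, flagged, rng):
    """Deletion scheme: drop flagged cases, then re-sample the rest uniformly."""
    clean = [i for i in range(n) if i not in flagged]
    return rng.choices(clean, k=len(clean))


def wlgbp_style_draw(n, flagged, rng, low_weight=0.05):
    """Weighted scheme: keep all cases but give flagged ones a small weight."""
    weights = [low_weight if i in flagged else 1.0 for i in range(n)]
    return rng.choices(range(n), weights=weights, k=n)


rng = random.Random(0)   # fixed seed so the sketch is reproducible
n = 10
flagged = {8, 9}         # assumed output of a leverage diagnostic

deleted_draw = dlgbb_style_draw(n, flagged, rng)
weighted_draw = wlgbp_style_draw(n, flagged, rng)

# Share of flagged cases across many weighted re-samples.
draws = [i for _ in range(2000) for i in wlgbp_style_draw(n, flagged, rng)]
flagged_share = sum(i in flagged for i in draws) / len(draws)
print(deleted_draw, weighted_draw, round(flagged_share, 3))
```

Under uniform random-x re-sampling the two flagged cases would make up about 20% of each re-sample. With the assumed weights their expected share falls to roughly 1%, and under the deletion scheme they never appear.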