Linear regression for data having multicollinearity, heteroscedasticity and outliers

Evaluation of a regression model is strongly influenced by the choice of estimation method, since different methods can lead to different conclusions from the same empirical results. It is therefore important to use an estimation method appropriate to the type of statistical data. Although reliable for...

Bibliographic Details
Main Author: Rasheed, Bello AbdulKadiri
Format: Thesis
Language: English
Published: 2017
Subjects: QA Mathematics
Online Access: http://eprints.utm.my/id/eprint/84005/1/BelloAbdulKadiriPFS20217.pdf
id my-utm-ep.84005
record_format uketd_dc
spelling my-utm-ep.84005 2019-11-05T04:33:52Z 2017-01 Thesis http://eprints.utm.my/id/eprint/84005/ http://eprints.utm.my/id/eprint/84005/1/BelloAbdulKadiriPFS20217.pdf application/pdf en public http://dms.library.utm.my:8080/vital/access/manager/Repository/vital:126196 phd doctoral Universiti Teknologi Malaysia, Faculty of Science
institution Universiti Teknologi Malaysia
collection UTM Institutional Repository
language English
topic QA Mathematics
description Evaluation of a regression model is strongly influenced by the choice of estimation method, since different methods can lead to different conclusions from the same empirical results. It is therefore important to use an estimation method appropriate to the type of statistical data. Although reliable for a single outlier or a few outliers, standard diagnostic techniques based on the wild bootstrap fit can fail, while the existing robust wild bootstrap based on the MM-estimator is not resistant to high leverage points. The presence of high leverage points introduces multicollinearity, and the MM-estimator is also not resistant to multicollinearity in the data. This research proposes new methods that deal with heteroscedasticity, multicollinearity, outliers and high leverage points more effectively than currently published methods. The proposed methods are called modified robust wild bootstrap, modified robust principal component (PC) with wild bootstrap and modified robust partial least squares (PLS) with wild bootstrap estimation. These methods are based on weighted procedures that incorporate the generalized M-estimator (GM-estimator), with initial and scale estimates obtained from the S-estimator and MM-estimator. In addition, the multicollinearity diagnostic procedures of PC and PLS were used together with the wild bootstrap sampling procedures of Wu and Liu. Empirical applications to national growth data, income per capita data for the Organisation for Economic Co-operation and Development (OECD) countries and tobacco data were used to compare the performance of the wild bootstrap, robust wild bootstrap, modified robust wild bootstrap, modified robust PC with wild bootstrap and modified robust PLS with wild bootstrap methods. A comprehensive simulation study evaluates the impact of heteroscedasticity, multicollinearity, outliers and high leverage points on numerous existing methods. A selection criterion is proposed that picks the best model by bias and root mean square error for the simulated data and by low standard error for the real data. Results for both the real data and the simulation study suggest that the proposed criterion is effective for modified robust wild bootstrap estimation in heteroscedastic data with outliers and high leverage points. On the other hand, modified robust PC with wild bootstrap estimation and modified robust PLS with wild bootstrap estimation are more effective in the presence of multicollinearity, heteroscedasticity, outliers and high leverage points. Moreover, for both methods, the modified robust sampling procedure of Liu based on the Tukey biweight, with initial and scale estimates from the MM-estimator, tends to be the best. The best method overall for data with multicollinearity, heteroscedasticity, outliers and high leverage points is modified robust PC with wild bootstrap estimation. This research shows the capability of the computationally intensive method and the viability of combining three different weighting procedures, namely robust GM-estimation, wild bootstrap and the multicollinearity diagnostic methods of PLS and PC, to achieve an accurate regression model. In conclusion, this study improves parameter estimation in linear regression by enhancing the existing methods to account for multicollinearity, heteroscedasticity, outliers and high leverage points in the data set. This improvement will help the analyst choose the best estimation method in order to produce the most accurate regression model.
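For orientation only, the plain (non-robust) wild bootstrap that the abstract takes as its baseline can be sketched as follows. This is a minimal illustration, not the thesis's modified robust procedures: it fits ordinary least squares, rescales the residuals by their leverage, and multiplies them by random weights with mean 0 and variance 1. The function name `wild_bootstrap_se`, the use of Rademacher (±1) weights, and the leverage adjustment are assumptions made for this sketch; the GM-estimator weighting, the PC and PLS steps, and the specific Wu and Liu sampling schemes described above are not implemented here.

```python
import numpy as np


def wild_bootstrap_se(X, y, n_boot=500, seed=0):
    """Hypothetical helper: OLS coefficients with wild-bootstrap standard errors.

    X is an (n, p) array of predictors without an intercept column,
    y is an (n,) response vector.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float).reshape(len(y), -1)
    y = np.asarray(y, dtype=float)
    n = X.shape[0]

    # Design matrix with an intercept column.
    Xd = np.column_stack([np.ones(n), X])

    # Ordinary least squares fit on the original data.
    beta_hat = np.linalg.lstsq(Xd, y, rcond=None)[0]
    fitted = Xd @ beta_hat
    resid = y - fitted

    # Leverages h_ii = diagonal of Xd (Xd'Xd)^(-1) Xd'.
    xtx_inv = np.linalg.pinv(Xd.T @ Xd)
    leverage = np.einsum("ij,ij->i", Xd @ xtx_inv, Xd)

    # Leverage-adjusted residuals (an assumption of this sketch).
    adj_resid = resid / np.sqrt(1.0 - leverage)

    draws = np.empty((n_boot, Xd.shape[1]))
    for b in range(n_boot):
        # Rademacher weights: +1 or -1 with equal probability (mean 0, variance 1).
        v = rng.choice([-1.0, 1.0], size=n)
        # Each residual stays attached to its own observation, which is what
        # lets the resampled data keep the heteroscedastic pattern.
        y_star = fitted + adj_resid * v
        draws[b] = np.linalg.lstsq(Xd, y_star, rcond=None)[0]

    # Bootstrap standard error of each coefficient.
    return beta_hat, draws.std(axis=0, ddof=1)
```

Calling `wild_bootstrap_se(X, y)` returns the coefficient vector (intercept first) and one bootstrap standard error per coefficient. Because ordinary least squares is used throughout, this baseline remains sensitive to outliers and high leverage points, which is the weakness the robust variants in the abstract are meant to address.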
format Thesis
qualification_name Doctor of Philosophy (PhD.)
qualification_level Doctorate
author Rasheed, Bello AbdulKadiri
title Linear regression for data having multicollinearity, heteroscedasticity and outliers
granting_institution Universiti Teknologi Malaysia, Faculty of Science
granting_department Faculty of Science
publishDate 2017
url http://eprints.utm.my/id/eprint/84005/1/BelloAbdulKadiriPFS20217.pdf