New approaches in estimating linear regression model parameters in the presence of multicollinearity and outliers

In multiple linear regression models, the ordinary least squares (OLS) method has been the most popular technique for estimating model parameters due to its optimal properties and ease of calculation. The OLS estimator may fail when the assumption of independence is violated. This assumption can be v...

Bibliographic Details
Main Author: Al-Mash, Mohammad Sabry Abo
Format: Thesis
Language: English
Published: 2017
Subjects: QA Mathematics
Online Access: http://eprints.utm.my/id/eprint/78208/1/MohammadSabryAboMFS2017.pdf
id my-utm-ep.78208
record_format uketd_dc
spelling my-utm-ep.78208 2018-07-28T06:26:32Z New approaches in estimating linear regression model parameters in the presence of multicollinearity and outliers 2017-01 Al-Mash, Mohammad Sabry Abo QA Mathematics 2017-01 Thesis http://eprints.utm.my/id/eprint/78208/ http://eprints.utm.my/id/eprint/78208/1/MohammadSabryAboMFS2017.pdf application/pdf en public http://dms.library.utm.my:8080/vital/access/manager/Repository/vital:105169 masters Universiti Teknologi Malaysia, Faculty of Science Faculty of Science
institution Universiti Teknologi Malaysia
collection UTM Institutional Repository
language English
topic QA Mathematics
description In multiple linear regression, ordinary least squares (OLS) has been the most popular technique for estimating model parameters because of its optimal properties and ease of calculation. The OLS estimator may fail, however, when the assumption of independence is violated. This assumption can be violated when the explanatory variables are correlated with one another. In this situation the data are said to exhibit multicollinearity, which can make statistical inference misleading. The problem becomes more complicated when the data also contain abnormal observations, known as outliers: outliers pose a serious threat to a model that already suffers from multicollinearity. This research puts forward new procedures for improving parameter estimation in the presence of both multicollinearity and outliers. Principal Component Regression (PCR) and Ridge Regression (RR) are not, on their own, resistant to outliers. The results show that even though PCR and RR produce good results for models with multicollinearity, they may fail in the presence of outliers. The motivation behind this research is to find new procedures with a high breakdown point for estimating regression models whose data exhibit both multicollinearity and outliers. The proposed methods are Principal Component regression with Least Trimmed Squares (LTS) based on Tukey bisquare weighting (RWPCLTS) and Principal Component regression with Least Median Squares (LMS) based on Tukey bisquare weighting (RWPCLMS). Cigarette data recording the weight, tar, nicotine, and carbon monoxide content of different brands of domestic cigarettes were used to compare the performance of RWPCLTS and RWPCLMS with the existing PCR and RR methods. A comprehensive simulation study evaluates the impact of multicollinearity and outliers on the proposed and existing methods.
The percentages of outliers considered in the simulation are 0%, 5%, 10%, 15% and 20%. A selection criterion is proposed that picks the best model by bias and root mean squared error for the simulated data and by low standard error for the real data. Results from both the real data and the simulation study suggest that the proposed criterion is effective for RWPCLTS and RWPCLMS under multicollinearity and outliers. Of the two proposed methods, RWPCLTS tends to perform best, followed by RWPCLMS, when multicollinearity and outliers are present. This research shows the capability of the computationally intensive methods and the viability of combining weighting procedures, namely robust LTS or LMS estimation, with principal component (PC) methods for handling multicollinearity to achieve an accurate regression model. In conclusion, the proposed methods improve the parameter estimation of linear regression by enhancing existing methods to handle multicollinearity and outliers in the data set. This improvement will help the analyst choose the best estimation method in order to produce the most accurate regression model in the presence of multicollinearity and outliers.
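The abstract names its ingredients (principal component regression, an LMS or LTS fit, Tukey bisquare weights) without giving the algorithm, so the following is only a rough sketch of how such pieces could be combined, not the thesis's RWPCLMS procedure. Every function name, the random-subset LMS approximation, the single reweighting step, the tuning constant 4.685 and the MAD scale estimate are assumptions made for illustration.

```python
import numpy as np


def pc_scores(X, k):
    """Return the first k principal component scores and loadings of X."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by singular value.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    V = Vt[:k].T
    return Xc @ V, V


def lms_fit(Z, y, n_trials=500, seed=0):
    """Approximate least median of squares using random elemental subsets."""
    rng = np.random.default_rng(seed)
    n = Z.shape[0]
    A = np.column_stack([np.ones(n), Z])
    p = A.shape[1]
    best_coef, best_med = None, np.inf
    for _ in range(n_trials):
        idx = rng.choice(n, size=p, replace=False)
        try:
            coef = np.linalg.solve(A[idx], y[idx])  # exact fit through p points
        except np.linalg.LinAlgError:
            continue  # singular subset, skip it
        med = np.median((y - A @ coef) ** 2)
        if med < best_med:
            best_coef, best_med = coef, med
    return best_coef


def tukey_bisquare_weights(r, c=4.685):
    """Tukey bisquare weights; residuals are scaled by the normalized MAD."""
    scale = 1.4826 * np.median(np.abs(r - np.median(r)))
    u = r / (c * scale)
    w = (1 - u**2) ** 2
    w[np.abs(u) >= 1] = 0.0  # observations far from the fit get zero weight
    return w


def robust_weighted_pcr(X, y, k):
    """Illustrative PCR: LMS start, then one bisquare-weighted refit."""
    Z, V = pc_scores(X, k)
    A = np.column_stack([np.ones(len(y)), Z])
    r0 = y - A @ lms_fit(Z, y)
    w = tukey_bisquare_weights(r0)
    sw = np.sqrt(w)
    gamma, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)
    beta = V @ gamma[1:]  # slopes expressed in the original predictors
    intercept = gamma[0] - X.mean(axis=0) @ beta
    return intercept, beta, w
```

Calling `robust_weighted_pcr(X, y, k)` on an `n x p` predictor matrix returns the intercept, the slopes in the original predictors, and the final weights. Dropping the low-variance components addresses the multicollinearity, while the zero weights remove the influence of observations that sit far from the LMS fit.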
format Thesis
qualification_level Master's degree
author Al-Mash, Mohammad Sabry Abo
title New approaches in estimating linear regression model parameters in the presence of multicollinearity and outliers
granting_institution Universiti Teknologi Malaysia, Faculty of Science
granting_department Faculty of Science
publishDate 2017
url http://eprints.utm.my/id/eprint/78208/1/MohammadSabryAboMFS2017.pdf
_version_ 1747817933505036288