New approaches in estimating linear regression model parameters in the presence of multicollinearity and outliers

Bibliographic Details
Main Author: Al-Mash, Mohammad Sabry Abo
Format: Thesis
Language: English
Published: 2017
Subjects:
Online Access: http://eprints.utm.my/id/eprint/78208/1/MohammadSabryAboMFS2017.pdf
Description
Summary: In multiple linear regression models, the ordinary least squares (OLS) method has been the most popular technique for estimating model parameters because of its optimal properties and ease of calculation. However, the OLS estimator may fail when the assumption of independence among the explanatory variables is violated, which happens when the explanatory variables are correlated with one another. In this situation, the data are said to exhibit multicollinearity, which can mislead statistical inference. The problem becomes more complicated when the data also contain abnormal observations, known as outliers, and the presence of outliers poses a serious threat to models with multicollinearity. This research puts forward new procedures to improve parameter estimation in the presence of both multicollinearity and outliers. Principal Component Regression (PCR) and Ridge Regression (RR) are, individually, not resistant to outliers: the results of this research show that, even though PCR and RR produce good results for a model with multicollinearity, they may fail in the presence of outliers. The motivation behind this research is therefore to find new procedures with a high breakdown point for estimating regression models that exhibit both multicollinearity and outliers. The proposed methods are Principal Component regression with Least Trimmed Squares (LTS) based on Tukey bisquare weights (RWPCLTS) and Principal Component regression with Least Median of Squares (LMS) based on Tukey bisquare weights (RWPCLMS). An empirical application to cigarette data, namely the weight, tar, nicotine, and carbon monoxide content of different brands of domestic cigarettes, was used to compare the performance of RWPCLTS and RWPCLMS with the existing PCR and RR methods. A comprehensive simulation study evaluates the impact of multicollinearity and outliers on both the proposed and the existing methods.
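The summary names the ingredients of RWPCLTS (principal components, LTS estimation, Tukey bisquare weights) but not the exact algorithm. The following is a minimal sketch of how such ingredients can be combined, not the thesis's implementation. It assumes: standardized predictors and classical PCA keeping `k` components; an approximate LTS fit on the component scores using random elemental starts plus concentration steps with trimming fraction `alpha = 0.75`; a MAD residual scale; and a single bisquare reweighting step with the common tuning constant `c = 4.685`. The function names `tukey_bisquare_weights`, `lts_fit`, and `rw_pc_lts` are hypothetical. An LMS variant would replace the LTS objective with the median squared residual.

```python
import numpy as np


def tukey_bisquare_weights(u, c=4.685):
    """Tukey bisquare weights: (1 - (u/c)^2)^2 for |u| < c, and 0 otherwise."""
    u = np.asarray(u, dtype=float)
    w = np.zeros_like(u)
    inside = np.abs(u) < c
    w[inside] = (1.0 - (u[inside] / c) ** 2) ** 2
    return w


def lts_fit(Z, y, h, n_starts=50, n_csteps=20, rng=None):
    """Approximate LTS with an intercept: minimise the sum of the h smallest
    squared residuals via random elemental starts plus concentration (C-) steps."""
    rng = np.random.default_rng(rng)
    n, p = Z.shape
    A = np.column_stack([np.ones(n), Z])
    best_obj, best_beta = np.inf, None
    for _ in range(n_starts):
        start = rng.choice(n, size=p + 1, replace=False)
        beta, *_ = np.linalg.lstsq(A[start], y[start], rcond=None)
        for _ in range(n_csteps):
            # C-step: refit OLS on the h observations with the smallest residuals
            keep = np.argsort((y - A @ beta) ** 2)[:h]
            beta, *_ = np.linalg.lstsq(A[keep], y[keep], rcond=None)
        obj = np.sort((y - A @ beta) ** 2)[:h].sum()
        if obj < best_obj:
            best_obj, best_beta = obj, beta
    return best_beta


def rw_pc_lts(X, y, k, alpha=0.75, c=4.685, rng=0):
    """Hypothetical sketch: PCA -> LTS on k component scores -> one bisquare
    reweighted least-squares step -> coefficients on the original X scale."""
    mu, sd = X.mean(axis=0), X.std(axis=0, ddof=1)
    Xs = (X - mu) / sd                       # standardised predictors
    _, _, Vt = np.linalg.svd(Xs, full_matrices=False)
    V = Vt[:k].T                             # p x k loadings of the leading PCs
    Z = Xs @ V                               # n x k component scores
    n = len(y)
    beta = lts_fit(Z, y, h=int(alpha * n), rng=rng)
    A = np.column_stack([np.ones(n), Z])
    r = y - A @ beta
    scale = 1.4826 * np.median(np.abs(r - np.median(r)))   # MAD residual scale
    w = tukey_bisquare_weights(r / max(scale, np.finfo(float).eps), c)
    sw = np.sqrt(w)
    gamma, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)
    slopes = V @ gamma[1:] / sd              # undo the PC projection and scaling
    intercept = gamma[0] - mu @ slopes       # undo the centring
    return intercept, slopes, w


# Usage on synthetic data: x2 is almost a copy of x1, and 10% of y are outliers.
rng = np.random.default_rng(1)
n = 100
x1 = rng.normal(size=n)
x2 = x1 + 0.01 * rng.normal(size=n)
x3 = rng.normal(size=n)
X = np.column_stack([x1, x2, x3])
y = 1.0 + 2.0 * x1 + 2.0 * x2 + 1.0 * x3 + 0.5 * rng.normal(size=n)
y[:10] += 30.0
b0, b, w = rw_pc_lts(X, y, k=2)
print(b0, b)
print(w[:10])   # the contaminated rows receive zero weight
```

Dropping the smallest component removes the nearly degenerate x1 minus x2 direction (the multicollinearity), while the trimmed fit and the bisquare step keep the shifted responses from pulling the coefficients. Note that classical PCA, as used here, is itself not robust to outliers in the predictors.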
The percentages of outliers considered in the simulation are 0%, 5%, 10%, 15%, and 20%. A selection criterion is proposed that identifies the best model based on bias and root mean squared error (RMSE) for the simulated data, and on low standard error for the real data. Results for both the real data and the simulation study suggest that the proposed criterion is effective for RWPCLTS and RWPCLMS under multicollinearity and outliers. Moreover, of the two proposed methods, RWPCLTS tends to perform best, followed by RWPCLMS, when multicollinearity and outliers are present. This research demonstrates the capability of computationally intensive methods and the viability of combining weighting procedures, namely robust LTS or LMS estimation, with principal component (PC) methods for handling multicollinearity to achieve an accurate regression model. In conclusion, the proposed methods improve the parameter estimation of linear regression by enhancing existing methods to handle both multicollinearity and outliers in a data set. This improvement will help analysts choose the best estimation method to produce the most accurate regression model in the presence of multicollinearity and outliers.
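The summary specifies the contamination levels (0% to 20%) and the simulation criteria (bias and RMSE), but not the design itself. Here is a minimal sketch of how those criteria can be computed over Monte Carlo replicates. It assumes equicorrelated normal predictors (`rho = 0.95`), `n = 50`, 200 replicates, a no-intercept model with true coefficients of 1, and vertical outliers created by shifting a fraction of responses by 20. OLS and a ridge estimator with a fixed penalty of 1 stand in for the compared estimators. All of these settings and the helper names (`simulate_data`, `bias_rmse`, and so on) are hypothetical.

```python
import numpy as np


def simulate_data(n, beta, rho, outlier_frac, rng):
    """Hypothetical design: predictors with pairwise correlation rho, plus a
    fraction of responses shifted by +20 (vertical outliers)."""
    p = len(beta)
    cov = np.full((p, p), rho) + (1.0 - rho) * np.eye(p)
    X = rng.multivariate_normal(np.zeros(p), cov, size=n)
    y = X @ beta + rng.normal(size=n)
    m = int(round(outlier_frac * n))
    y[:m] += 20.0                      # contaminate the first m responses
    return X, y


def ols(X, y):
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


def ridge(X, y, lam=1.0):
    """Closed-form ridge estimator (X'X + lam I)^{-1} X'y."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)


def bias_rmse(estimates, beta):
    """Per-coefficient bias and RMSE from an (R, p) array of R replicate estimates."""
    err = estimates - beta
    return err.mean(axis=0), np.sqrt((err ** 2).mean(axis=0))


rng = np.random.default_rng(42)
beta = np.array([1.0, 1.0, 1.0])
results = {}
for frac in [0.0, 0.05, 0.10, 0.15, 0.20]:
    est = {"OLS": [], "RR": []}
    for _ in range(200):
        X, y = simulate_data(50, beta, rho=0.95, outlier_frac=frac, rng=rng)
        est["OLS"].append(ols(X, y))
        est["RR"].append(ridge(X, y))
    for name, vals in est.items():
        bias, rmse = bias_rmse(np.array(vals), beta)
        results[(name, frac)] = (bias, rmse)
        print(f"{name:3s} outliers={frac:4.0%} mean|bias|={np.abs(bias).mean():.3f} "
              f"mean RMSE={rmse.mean():.3f}")
```

A robust estimator such as the RWPCLTS-style procedure could be added as a third entry in `est` and compared on the same replicates. Reusing the same simulated data sets for every estimator makes the comparison less noisy.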