Support vector machine and its applications for linear and nonlinear regression in the presence of outliers of high dimensional data
Format: Thesis
Language: English
Published: 2016
Online Access: http://psasir.upm.edu.my/id/eprint/69129/1/FS%202016%2050%20IR.pdf
Summary: The ordinary least squares (OLS) method is reported to be the most commonly used approach for estimating the relationship between variables (inputs and output) in linear regression models, because of its optimal properties and ease of calculation. Unfortunately, the OLS estimator is not efficient in the presence of outliers in a data set, nonlinear relationships, or high-dimensional problems. The search for alternatives with the flexibility to handle these situations, such as nonparametric approaches, has therefore become a necessity. Consequently, support vector regression (SVR) is used as an alternative to OLS.
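The claim that OLS is inefficient under contamination while SVR resists it can be sketched as follows. This is a minimal illustration, not the thesis's own method: the simulated data, the `C` and `epsilon` settings, and the use of scikit-learn's `SVR` are all assumptions made for demonstration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.svm import SVR

rng = np.random.default_rng(0)

# Linear data y = 2x + 1 with Gaussian noise, plus five gross vertical outliers.
n = 45
x = np.linspace(0, 10, n).reshape(-1, 1)
y = 2 * x.ravel() + 1 + rng.normal(0, 1, n)
y[-5:] += 50  # contamination at the upper end of the design

ols = LinearRegression().fit(x, y)
svr = SVR(kernel="linear", C=10.0, epsilon=0.5).fit(x, y)

# Evaluate both fits against the uncontaminated true line on a clean grid.
xt = np.linspace(0, 10, 200).reshape(-1, 1)
y_true = 2 * xt.ravel() + 1
mae_ols = np.mean(np.abs(ols.predict(xt) - y_true))
mae_svr = np.mean(np.abs(svr.predict(xt) - y_true))
```

Because SVR minimizes an epsilon-insensitive (L1-type) loss rather than squared error, the five outliers pull its fitted line far less than they pull the OLS line.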
In this thesis, we first consider the identification of outliers through SVR. In regression, outliers can be classified into two types: vertical outliers and leverage points (good and bad). It is very important to identify vertical outliers and bad leverage points (BLP) because of their significant effects on estimators. Most parametric diagnostic measures mistakenly treat good leverage points as bad leverage points. Hence, new nonparametric identification techniques are proposed, which we call the fixed parameters support vector regression (FP-SVR) methods. The results of real applications and simulation studies show that the proposed methods have advantages over classical methods in identifying vertical outliers and bad leverage points.
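The general idea of detecting vertical outliers through SVR residuals can be sketched as below. This is not the FP-SVR procedure itself, whose details are not given in the abstract; the data, the robust MAD-based cutoff, and the SVR hyperparameters are illustrative assumptions.

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(1)

# Nonlinear data with three injected vertical outliers.
n = 50
x = np.linspace(0, 2 * np.pi, n).reshape(-1, 1)
y = np.sin(x.ravel()) + rng.normal(0, 0.1, n)
outlier_idx = [10, 25, 40]
y[outlier_idx] += 20.0  # gross vertical outliers

# A bounded-influence fit: each point's dual weight is capped at C,
# so outliers cannot drag the fitted curve toward themselves.
fit = SVR(kernel="rbf", C=1.0, epsilon=0.1, gamma="scale").fit(x, y)
resid = y - fit.predict(x)

# Robust cutoff: 2.5 times the normalized median absolute deviation.
mad = 1.4826 * np.median(np.abs(resid - np.median(resid)))
flagged = np.where(np.abs(resid) > 2.5 * mad)[0]
```

Points whose residuals exceed the robust cutoff are flagged; the injected outliers stand out because the capped fit does not bend toward them.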
Further, in this thesis, we note that the GM6 version of the robust estimation methods was developed only to identify and dampen the influence of leverage points, without taking into consideration whether they are good or bad. Thus, a new class of GM-estimators based on the FP-SVR technique is developed that minimizes the impact of bad leverage points only, which we call GM-SVR. The results show that the performance of GM-SVR is the best overall, followed by GM6, for all combinations of sample size and percentage of contamination.
This thesis also addresses the problem of high dimensionality in linear and nonlinear regression models. It is well known that support vector regression can produce sparse (less complex) models. Unfortunately, there is a potential problem: if the threshold ε is near zero, the resulting model depends on a greater number of the training data points, making the solution less sparse (more complex). Therefore, the single index support vector regression (SI-SVR) model is proposed, which combines the flexibility of the nonparametric model with the high accuracy of the parametric model. Real and simulation studies indicate that the proposed method is able to address the problem of high dimensionality.
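The link between the threshold ε and sparsity can be demonstrated directly: training points that fall strictly inside the ε-tube drop out of the solution, so a narrow tube keeps nearly every point as a support vector. A minimal sketch, with illustrative data and hyperparameters:

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(2)

n = 80
x = np.linspace(0, 2 * np.pi, n).reshape(-1, 1)
y = np.sin(x.ravel()) + rng.normal(0, 0.1, n)

tight = SVR(kernel="rbf", C=10.0, epsilon=0.01).fit(x, y)  # tube narrower than the noise
loose = SVR(kernel="rbf", C=10.0, epsilon=0.3).fit(x, y)   # tube wider than the noise

# Only points on or outside the tube remain support vectors.
n_sv_tight = len(tight.support_)
n_sv_loose = len(loose.support_)
```

With ε well below the noise level, almost every training point ends up outside the tube and the model is non-sparse; widening the tube discards most of them.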
This thesis also explores the problem of high dimensionality when the number of predictors p is larger than the sample size n. Although the SI-SVR is proposed to solve the problem of high dimensionality, this model does not have the ability to handle rank-deficient cases. Furthermore, the efficiency of the resulting SI-SVR model can decrease, and less accurate predictions are produced, when unnecessary predictors are included in the model. Hence, a new method is suggested to overcome this issue using the Elastic Net technique for selecting significant variables, which we call the elastic net single index support vector regression (ENSI-SVR). The comparison results show that the ENSI-SVR is an efficient method for dealing with sparse data to achieve dimension reduction, which allows the SI-SVR to be applied easily.
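The two-stage idea, Elastic Net screening followed by an SVR fit on the surviving predictors, can be sketched as follows. This is not the ENSI-SVR algorithm itself (which builds on the single-index model); the simulated p > n design, the `alpha` and `l1_ratio` values, and the plain linear-kernel SVR in step 2 are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.linear_model import ElasticNet
from sklearn.svm import SVR

rng = np.random.default_rng(3)

# p > n: 100 candidate predictors, only the first three carry signal.
n, p = 80, 100
X = rng.normal(size=(n, p))
y = 3 * X[:, 0] + 2 * X[:, 1] - 2 * X[:, 2] + rng.normal(0, 0.1, n)

# Step 1: Elastic Net shrinks irrelevant coefficients exactly to zero.
enet = ElasticNet(alpha=0.5, l1_ratio=0.5, max_iter=10000).fit(X, y)
selected = np.where(enet.coef_ != 0)[0]

# Step 2: fit SVR on the reduced predictor set only.
svr = SVR(kernel="linear", C=1.0).fit(X[:, selected], y)
```

The screening step restores a well-conditioned design (fewer active predictors than samples), after which the second-stage regression can be fitted without the rank-deficiency problem.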