Robust spatial diagnostic method and parameter estimation for spatial big data regression model
Format: Thesis
Language: English
Published: 2022
Online Access: http://psasir.upm.edu.my/id/eprint/104720/1/MOHAMMED%20BABA%20ALI%20-%20IR.pdf

Summary:
The existing spatial data compression method, Adaptive Spatial Compression Clustering (ASDC), is an effective method for compressing big data. However, global outliers in the spatial data distort the spatial dispersion function, which in turn affects the outcome of the spectral clustering and, consequently, spatial contiguity. Hence, a new robust spatial compression technique, which we call Outlier Resistant Adaptive Spatial Clustering (ORASDC), is proposed. Simulation results on synthetic spatial fields and a real data application show that the proposed method effectively treats the effect of outliers, retaining over 99% of regional similarity and over 90% of data similarity. Further research may be carried out to improve the processing speed of ORASDC and to determine the optimum number of clusters for a given data size.

The score statistic (Sci) was formulated to identify spatial outliers in big data. However, it suffers from masking and swamping effects and also has a long computational running time. To rectify these problems, new diagnostic measures that adopt location adjacency to construct spatial weights, namely the metric distance reciprocal (MDR) and the exponential weight (EW), are developed. Differences between spatial residuals are calibrated to incorporate the adjacency effect into the spatial outlier residual. Simulations with large sample sizes show that both proposed diagnostic measures successfully detect spatial outliers with minimal swamping. Applications of the methods to real data also show good performance.
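
The abstract does not give the MDR and EW formulas, so the following Python sketch only illustrates the general idea under stated assumptions: row-standardized weights proportional to the reciprocal distance 1/d_ij (our assumed reading of MDR) or to exp(-d_ij/θ) (our assumed reading of EW, with a hypothetical bandwidth `theta`), and a spatial residual difference taken as each residual minus the weighted mean of the other locations' residuals. The function names `spatial_weights` and `spatial_residual_differences` are hypothetical, not taken from the thesis.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def spatial_weights(coords: Sequence[Point], kind: str = "mdr",
                    theta: float = 1.0) -> List[List[float]]:
    """Row-standardised spatial weight matrix for distinct locations.

    kind="mdr": raw weight 1 / d_ij (assumed reading of metric distance reciprocal)
    kind="ew":  raw weight exp(-d_ij / theta) (assumed reading of exponential weight)
    Diagonal entries are zero; each row is rescaled to sum to 1.
    """
    if kind not in ("mdr", "ew"):
        raise ValueError(f"unknown weight kind: {kind!r}")
    n = len(coords)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # a location is not its own neighbour
            d = math.dist(coords[i], coords[j])
            if d == 0.0:
                raise ValueError("locations must be distinct")
            W[i][j] = 1.0 / d if kind == "mdr" else math.exp(-d / theta)
        total = sum(W[i])
        if total > 0.0:  # rows whose weights all underflow to zero stay zero
            W[i] = [w / total for w in W[i]]
    return W


def spatial_residual_differences(residuals: Sequence[float],
                                 W: List[List[float]]) -> List[float]:
    """Residual at each location minus the weighted mean of the others' residuals."""
    n = len(residuals)
    return [residuals[i] - sum(W[i][j] * residuals[j] for j in range(n))
            for i in range(n)]
```

For example, with four clustered locations carrying small residuals and one distant location with a residual of 3.0, the distant location has the largest absolute difference under both weightings; the choice between 1/d and exp(-d/θ) governs how quickly a neighbour's influence fades with distance.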

This thesis is also concerned with establishing diagnostic measures for identifying spatial influential observations (IOs), which are outliers in the x and y directions of spatial regression models. Some classical techniques for identifying IOs have been adapted to spatial models; nonetheless, these adapted methods fail to identify the IOs correctly and show high swamping and masking effects. We therefore propose a new measure of spatial studentized prediction residuals that incorporates spatial information on the dependent variable and the residuals. To the best of our knowledge, no research has been done on classifying spatial observations into regular observations, vertical outliers, good leverage points, and bad leverage points. Hence, the ISRs−Posi and ESRs−Posi plots are established to close this gap in the literature. The results indicate that the ESRs−Posi plot, followed by the ISRs−Posi plot, was very successful in classifying observations into the correct groups. The numerical examples and simulation study show that the proposed methods achieve almost 100% correct detection and 0% swamping, whereas their competitors have lower detection rates and higher swamping rates.
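
Diagnostic plots that separate these four groups typically work as a quadrant rule on a residual axis and a leverage-type axis. The minimal Python sketch below assumes that reading of the ISRs−Posi and ESRs−Posi plots, which the abstract does not spell out: a residual measure and a leverage-type measure are taken as already computed for each observation, and the cutoffs `r_cut = 2.5` and `d_cut = 3.0` are hypothetical placeholders, not the thesis's thresholds.

```python
from typing import List, Sequence


def classify_observations(resid: Sequence[float], dist: Sequence[float],
                          r_cut: float = 2.5, d_cut: float = 3.0) -> List[str]:
    """Quadrant classification from a residual measure and a leverage-type measure.

    The cutoffs are hypothetical defaults; a real diagnostic would derive them
    from the distribution of the chosen residual and leverage statistics.
    """
    if len(resid) != len(dist):
        raise ValueError("resid and dist must have the same length")
    labels = []
    for r, d in zip(resid, dist):
        large_residual = abs(r) > r_cut   # outlying in the y direction
        high_leverage = d > d_cut         # outlying in the x direction
        if large_residual and high_leverage:
            labels.append("bad leverage point")
        elif high_leverage:
            labels.append("good leverage point")
        elif large_residual:
            labels.append("vertical outlier")
        else:
            labels.append("regular observation")
    return labels
```

For instance, `classify_observations([0.3, 4.1, -0.5, 3.6], [1.0, 0.8, 5.2, 6.0])` labels the four observations as a regular observation, a vertical outlier, a good leverage point, and a bad leverage point, in that order. The accuracy of such a rule depends entirely on how well the residual and leverage measures resist masking and swamping, which is what the proposed spatial residuals target.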

Outliers in spatial applications usually carry vital information about the model, which calls for a method that accommodates spatial outliers rather than discarding them. The Variance Shift Outlier Model (VSOM) in classical regression is a promising way to keep such observations in the model while downweighting their effect. To date, no research has obtained a spatial representation of the VSOM. To fill this gap, we formulate the VSOM within the spatial regression model, which we call the Spatial Variance Shift Outlier Model (SVSOM), using Residual Maximum Likelihood (REML). Weights based on the detected outliers are then used to accommodate the spatial outliers through a revised model built on the SVSOM. The results of a simulation study and a real data set indicate that the proposed method significantly improves parameter estimation and outlier accommodation.
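
In the classical VSOM, a suspected outlier i keeps the mean structure y_i = x_i'β + ε_i, but its error variance is inflated to ω_i σ² with ω_i > 1, so estimating β by weighted least squares with weights 1/ω_i downweights that observation instead of deleting it. The Python sketch below shows only this classical, non-spatial idea for a one-predictor model: the outlier indices and the inflation factor `omega` are assumed known (hypothetical inputs), whereas the SVSOM described above adds spatial structure and estimates the variance components by REML, neither of which is reproduced here. The helper names are hypothetical.

```python
from typing import List, Sequence, Set, Tuple


def vsom_weights(n: int, outliers: Set[int], omega: float) -> List[float]:
    """Weight 1/omega for flagged outliers (variance inflated by omega), 1 otherwise."""
    if omega <= 1.0:
        raise ValueError("omega must exceed 1 to inflate the variance")
    return [1.0 / omega if i in outliers else 1.0 for i in range(n)]


def wls_line(x: Sequence[float], y: Sequence[float],
             w: Sequence[float]) -> Tuple[float, float]:
    """Weighted least-squares intercept and slope for y = b0 + b1 * x."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    b1 = sxy / sxx
    return ybar - b1 * xbar, b1


if __name__ == "__main__":
    x = [0, 1, 2, 3, 4, 5]
    y = [1, 3, 5, 7, 9, 30]  # the last point breaks the line y = 1 + 2x
    _, slope_ols = wls_line(x, y, [1.0] * len(x))             # equal weights
    _, slope_vsom = wls_line(x, y, vsom_weights(6, {5}, 1000.0))
    print(slope_ols, slope_vsom)
```

With equal weights the last point pulls the slope well above 2, whereas inflating its variance by a factor of 1000 keeps the fitted slope close to 2 while the observation stays in the data set.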