Robust Kernel Density Function Estimation

The classical kernel density estimation technique is a commonly used method to estimate the density function. It is now evident that the accuracy of this technique is easily affected by outliers. To remedy this problem, Kim and Scott (2008) proposed an Iteratively Re-we...

Bibliographic Details
Main Author: Dadkhah, Kourosh
Format: Thesis
Language: English
Published: 2010
Subjects:
Online Access: http://psasir.upm.edu.my/id/eprint/19686/1/IPM_2010_7_F.pdf
id my-upm-ir.19686
record_format uketd_dc
institution Universiti Putra Malaysia
collection PSAS Institutional Repository
language English
topic Density functionals
Robust statistics - Data processing
Robust statistics
description The classical kernel density estimation technique is a commonly used method to estimate the density function. It is now evident that the accuracy of this technique is easily affected by outliers. To remedy this problem, Kim and Scott (2008) proposed an Iteratively Re-weighted Least Squares (IRWLS) algorithm for Robust Kernel Density Estimation (RKDE). However, the weakness of the IRWLS-based estimator is its very long computation time. This shortcoming has inspired us to propose new non-iterative, unsupervised approaches which are faster, more accurate and more flexible. The proposed estimators are based on our newly developed Robust Kernel Weight Function (RKWF) and Robust Density Weight Function (RDWF). The basic idea of the RKWF-based method is to first define a function which measures the outlying distance of each observation. The resulting distances are then transformed into robust weights. The statement of Chandola et al. (2009) that normal (clean) data appear in high-probability regions of a stochastic model, while outliers appear in low-probability regions, has motivated us to develop the RDWF. Based on this notion, we employ a pilot (preliminary) estimate of the density function as an initial similarity (or distance) measure between each observation and its neighbours. The modified similarity measures produce the robust weights. Subsequently, the robust weights are incorporated in the kernel function to formulate the robust density function estimate. An extensive simulation study has been carried out to assess the performance of the RKWF-based and RDWF-based estimators. The RKDE based on RKWF and RDWF performs as well as the classical Kernel Density Estimator (KDE) on outlier-free data sets. On contaminated data sets, it is faster, more accurate and more reliable than the IRWLS approach.
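The idea of feeding pilot-density weights back into the kernel sum can be sketched as follows. This is only an illustration under stated assumptions: the Gaussian kernel, the fixed bandwidth `h`, and the rule "weight proportional to pilot density" are hypothetical choices made here, and they are not the RKWF or RDWF definitions, which the abstract does not give.

```python
import math

def gaussian_kernel(u):
    """Standard normal kernel K(u)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def weighted_kde(x, data, weights, h):
    """Weighted kernel density estimate at x.

    f(x) = (1/h) * sum_i w_i * K((x - x_i) / h), with the w_i summing to 1.
    Uniform weights w_i = 1/n give the classical KDE.
    """
    return sum(w * gaussian_kernel((x - xi) / h)
               for xi, w in zip(data, weights)) / h

def pilot_density_weights(data, h):
    """Hypothetical weighting rule (an assumption, not the thesis's RDWF):
    each weight is proportional to the classical pilot density at that
    observation, so points in low-density regions are down-weighted."""
    n = len(data)
    uniform = [1.0 / n] * n
    pilot = [weighted_kde(xi, data, uniform, h) for xi in data]
    total = sum(pilot)
    return [p / total for p in pilot]

# Example: five clustered points and one isolated point at 10.0.
data = [-1.0, -0.5, 0.0, 0.5, 1.0, 10.0]
h = 0.5
weights = pilot_density_weights(data, h)
uniform = [1.0 / len(data)] * len(data)
classical_at_outlier = weighted_kde(10.0, data, uniform, h)
robust_at_outlier = weighted_kde(10.0, data, weights, h)
```

In this sketch the isolated point receives the smallest weight, so the weighted estimate places less mass near it than the classical estimate does. The scheme is non-iterative: one pilot pass, then one weighted pass.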
The classical kernel density estimation approach is widely used in various formulae and methods. Unfortunately, many researchers are not aware that the KDE is easily affected by outliers. We have proposed an RKDE which is more efficient and consumes less time. Our work on the RKDE and its corresponding robust weights has motivated us to develop alternative location and scale estimators. The classical location and scale estimators are modified by incorporating the robust weights and the RKDE. To evaluate the efficiency of the proposed method, comprehensive contaminated models are designed and simulated. The accuracy of the proposed method is compared with that of location and scale estimators based on the M, Minimum Covariance Determinant (MCD) and Minimum Volume Ellipsoid (MVE) estimators. The simulation study demonstrates that, on the whole, the proposed method is more accurate than the competing methods. The research also develops two new approaches for detecting outliers and potential outliers in unimodal and multimodal distributions. The first method, for unimodal distributions, incorporates the distance of each observation from the center of the data set. The second method is designed to be usable for both unimodal and multimodal distributions. It incorporates robust weights, whereby high weights are assigned to normal (clean) observations and low weights to outlying observations. In this thesis, we also illustrate that the sensitivity of the RKDE depends on the setting of the tuning constants of the employed loss function. The results of the study indicate that the proposed methods are capable of labelling normal observations and potential outliers in a data set. Additionally, they are able to assign anomaly scores to normal and outlying observations.
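As a rough illustration of how robust weights can enter location and scale estimation, the sketch below computes a weighted mean and a weighted standard deviation. The function name `weighted_location_scale` and the example weights are hypothetical; the abstract does not state the exact form of the modified estimators, so this should be read as one plausible instance rather than the thesis's method.

```python
import math

def weighted_location_scale(data, weights):
    """Weighted mean and weighted standard deviation.

    Observations with small weights (e.g. suspected outliers) contribute
    little to either estimate. Weights need not be normalized.
    """
    total = sum(weights)
    location = sum(w * x for x, w in zip(data, weights)) / total
    variance = sum(w * (x - location) ** 2
                   for x, w in zip(data, weights)) / total
    return location, math.sqrt(variance)

# Example: the last observation is an outlier.
data = [-1.0, -0.5, 0.0, 0.5, 1.0, 10.0]

# Classical estimates: equal weights, pulled toward the outlier.
classical = weighted_location_scale(data, [1.0] * len(data))

# Illustrative robust weights (assumed values): the outlier gets weight 0.
robust = weighted_location_scale(data, [0.2, 0.2, 0.2, 0.2, 0.2, 0.0])
```

With equal weights the location is dragged toward the point at 10.0, while the down-weighted version recovers the center of the clean cluster. The same weights could also be reported as anomaly scores, with low weight indicating a likely outlier.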
Finally, this thesis also addresses the estimation of Mutual Information (MI) for mixture distributions, which are prone to create two distant groups in the data. The formulation of MI involves estimation of the density function. For bivariate random variables, the MI estimate involves bivariate density estimation, which in turn employs an estimate of the covariance matrix. The sensitivity of the covariance matrix to the presence of outliers has motivated us to substitute it with robust estimates derived from MCD and MVE. The efficiency of the modified MI estimate is evaluated based on its accuracy. For this evaluation, mixtures of bivariate normal distributions with different contribution percentages are simulated. Simulation results show that the new formulation of MI increases the accuracy of mutual information estimation.
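To see why the covariance estimate matters for MI, consider the closed-form bivariate normal case, where MI depends on the covariance matrix only through the squared correlation: I(X;Y) = -0.5 * ln(1 - rho^2). The sketch below uses that closed form, which is a simpler stand-in and not the kernel-based estimator described in the abstract. The function name and the example matrices are assumptions made for illustration.

```python
import math

def gaussian_mutual_information(cov):
    """MI (in nats) of a bivariate normal with 2x2 covariance matrix cov.

    Uses I(X;Y) = -0.5 * ln(1 - rho^2), where rho^2 = s_xy^2 / (s_xx * s_yy).
    Any covariance estimate (classical, MCD-based, MVE-based) can be
    plugged in; this function does not compute the estimate itself.
    """
    (s_xx, s_xy), (_, s_yy) = cov
    rho_squared = (s_xy * s_xy) / (s_xx * s_yy)
    return -0.5 * math.log(1.0 - rho_squared)

# Illustrative (made-up) covariance matrices: a distant second group can
# inflate the off-diagonal term, and the MI value changes with it.
mi_clean = gaussian_mutual_information([[1.0, 0.5], [0.5, 1.0]])
mi_inflated = gaussian_mutual_information([[1.0, 0.9], [0.9, 1.0]])
```

Because the result is driven entirely by the covariance entries, an estimate distorted by a distant group of points translates directly into a distorted MI value, which is the motivation for substituting a robust covariance estimate.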
format Thesis
qualification_name Doctor of Philosophy (PhD.)
qualification_level Doctorate
author Dadkhah, Kourosh
title Robust Kernel Density Function Estimation
granting_institution Universiti Putra Malaysia
granting_department Institute for Mathematical Research
publishDate 2010
url http://psasir.upm.edu.my/id/eprint/19686/1/IPM_2010_7_F.pdf
_version_ 1747811442602541056
spelling my-upm-ir.19686 2013-05-27T08:02:51Z Robust Kernel Density Function Estimation 2010-12 Dadkhah, Kourosh Density functionals Robust statistics - Data processing Robust statistics Thesis http://psasir.upm.edu.my/id/eprint/19686/ http://psasir.upm.edu.my/id/eprint/19686/1/IPM_2010_7_F.pdf application/pdf en public phd doctoral Universiti Putra Malaysia Institute for Mathematical Research English