Robust correlation coefficient based on robust scale and location estimator

Bibliographic Details
Main Author: Nur Amira, Zakaria
Format: Thesis
Language: eng
Published: 2018
Subjects: QA273-280 Probabilities. Mathematical statistics
Online Access: https://etd.uum.edu.my/9137/1/s818475_01.pdf
https://etd.uum.edu.my/9137/2/s818475_02.pdf
https://etd.uum.edu.my/9137/3/s818475_references.docx
institution Universiti Utara Malaysia
collection UUM ETD
language eng
advisor Abdullah, Suhaida
Ahad, Nor Aishah
topic QA273-280 Probabilities
Mathematical statistics
description The correlation coefficient is a common statistical measure of the relationship between two variables. The most frequently used correlation coefficient is the Pearson correlation coefficient, which is powerful when the assumptions of linearity between the two variables and normality of the distribution are fulfilled. However, it performs poorly when outliers are present in the data, because its calculation uses the mean, which is known to be very sensitive to outliers. The Spearman rank correlation coefficient and Kendall’s Tau correlation coefficient are alternative solutions to this problem, but using ranks instead of the original observations in their calculation leads to a loss of useful information. For that reason, this study focuses on a robust correlation approach based on the median. The existing median-based correlation coefficient uses the Median Absolute Deviation (MAD) as its scale estimator. Nevertheless, the MAD has low efficiency under the Gaussian distribution, and it takes a symmetric view of dispersion, which suits only symmetric distributions. This study therefore modified the median-based correlation using two approaches. First, keeping the median-based correlation, it substituted other robust scale estimators, namely MADn, Sn, and Qn. Second, it replaced the median-based correlation with a Hodges-Lehmann-based correlation and employed all the robust scale estimators: median, MAD, MADn, Sn, and Qn. The performance of the proposed procedures was evaluated on simulated data under two conditions: perfect (uncontaminated) and contaminated data. Three indicators were used: the correlation coefficient value, the average bias, and the standard error. The proposed procedures were also validated using a real dataset.
The simulation results show that the Qn correlation coefficient and the Hodges-Lehmann-Qn correlation coefficient performed better under contaminated data than the Pearson correlation coefficient and the other existing robust correlation coefficients. In conclusion, the Qn correlation coefficient and the Hodges-Lehmann-Qn correlation coefficient are good alternatives to the Pearson correlation coefficient when the data contain outliers.
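The abstract compares Pearson's coefficient with robust alternatives built from robust location (median, Hodges-Lehmann) and scale (MAD, MADn, Sn, Qn) estimators, but the record gives no formulas. The Python sketch below is a minimal illustration under stated assumptions, not the thesis's own procedure. The scale estimators follow the textbook definitions of Rousseeuw and Croux in naive quadratic-time form, with approximate consistency constants (1.4826, 1.1926, 2.2219) and no small-sample corrections. `robust_corr` swaps in the identity-based construction of Gnanadesikan and Kettenring, r = (S(u)^2 - S(v)^2) / (S(u)^2 + S(v)^2), because the exact median-based and Hodges-Lehmann-based formulas are not stated here. All function names are hypothetical. Since every scale estimator shown is shift-invariant, the location estimate cancels out of this particular construction, so the sketch shows the effect of the scale choice only.

```python
import math
import statistics
from itertools import combinations


# --- Robust location estimator ---------------------------------------------
def hodges_lehmann(x):
    """One-sample Hodges-Lehmann estimate: median of Walsh averages (x_i + x_j) / 2, i <= j."""
    n = len(x)
    return statistics.median((x[i] + x[j]) / 2 for i in range(n) for j in range(i, n))


# --- Robust scale estimators (naive versions, assume n >= 2) ---------------
def mad(x):
    """Raw median absolute deviation: med_i |x_i - med(x)|."""
    m = statistics.median(x)
    return statistics.median(abs(v - m) for v in x)


def madn(x):
    """MAD rescaled by 1.4826 so it estimates sigma under normality."""
    return 1.4826 * mad(x)


def sn(x):
    """Sn ~ 1.1926 * med_i med_j |x_i - x_j| (plain medians, no small-sample factor)."""
    return 1.1926 * statistics.median(
        statistics.median(abs(xi - xj) for xj in x) for xi in x
    )


def qn(x):
    """Qn ~ 2.2219 * k-th smallest |x_i - x_j| (i < j), h = n // 2 + 1, k = h(h-1)/2."""
    h = len(x) // 2 + 1
    k = h * (h - 1) // 2
    return 2.2219 * sorted(abs(a - b) for a, b in combinations(x, 2))[k - 1]


# --- Correlation coefficients ----------------------------------------------
def pearson(x, y):
    """Classical Pearson correlation, built on the outlier-sensitive mean."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def robust_corr(x, y, scale=qn):
    """Identity-based robust correlation with a pluggable scale estimator S:
    r = (S(u)^2 - S(v)^2) / (S(u)^2 + S(v)^2), u = x/S(x) + y/S(y), v = x/S(x) - y/S(y).
    Centring is omitted because every S above is shift-invariant."""
    sx, sy = scale(x), scale(y)
    u = [a / sx + b / sy for a, b in zip(x, y)]
    v = [a / sx - b / sy for a, b in zip(x, y)]
    su2, sv2 = scale(u) ** 2, scale(v) ** 2
    return (su2 - sv2) / (su2 + sv2)


# Toy data: y equals x except for one gross outlier in the last pair.
x = list(range(1, 21))
y = list(range(1, 20)) + [-100]
print("location (mean, median, HL):", statistics.fmean(y), statistics.median(y), hodges_lehmann(y))
print("scale (SD, MADn, Sn, Qn):", statistics.stdev(y), madn(y), sn(y), qn(y))
print("Pearson:", round(pearson(x, y), 3))
for name, s in [("MAD", mad), ("MADn", madn), ("Sn", sn), ("Qn", qn)]:
    print(name, "correlation:", round(robust_corr(x, y, s), 3))
```

On this hand-built toy data the single outlier drags the Pearson coefficient from 1 to about -0.168, reversing its sign, whereas the MAD-, MADn- and Qn-based versions of `robust_corr` still return 1.0. This is one illustrative example, not a reproduction of the thesis's simulation results.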
format Thesis
qualification_name other
qualification_level Master's degree
author Nur Amira, Zakaria
title Robust correlation coefficient based on robust scale and location estimator
granting_institution Universiti Utara Malaysia
granting_department Awang Had Salleh Graduate School of Arts & Sciences
publishDate 2018