Robust linear discriminant rules with coordinatewise and distance based approaches

Bibliographic Details
Main Author: Lim, Yai Fung
Format: Thesis
Language: eng
Published: 2020
Online Access: https://etd.uum.edu.my/8799/1/Deposit%20Permission_s900800.pdf
https://etd.uum.edu.my/8799/2/s900800_01.pdf
https://etd.uum.edu.my/8799/3/s900800_references.docx
Description
Summary: Linear discriminant analysis (LDA) is a supervised classification technique for modelling the relationship between a categorical variable and a set of continuous variables. The main objective of LDA is to construct a function that distinguishes between groups and allocates future observations to previously defined groups. Under the assumptions of normality and homoscedasticity, LDA yields the optimal linear discriminant rule (LDR) for two or more groups. However, this optimality relies heavily on the sample mean and sample covariance matrix, which are known to be sensitive to outliers. To mitigate this problem, robust location and scale estimators based on coordinatewise and distance-based approaches were applied to construct new robust LDA procedures. These robust estimators replace the classical sample mean and sample covariance matrix to form robust linear discriminant rules (RLDR). A total of six RLDR were proposed and implemented in this study: four coordinatewise rules (RLDRM, RLDRMw, RLDRW, RLDRWw) and two distance-based rules (RLDRV, RLDRT). Simulation and real-data studies were conducted to investigate the performance of the proposed RLDR, measured in terms of misclassification error rate and computational time. In the simulation study, several data conditions, namely non-normality, heteroscedasticity, and balanced and unbalanced data sets, were manipulated to evaluate the proposed RLDR. The real-data study used a diabetes data set that violates the assumptions of both normality and homoscedasticity. The results showed that the novel RLDRV is the best of the proposed RLDR for solving classification problems, achieving a classification accuracy of 91.03% in the real-data study. The proposed RLDR are good alternatives to the classical LDR and to existing RLDR, since they perform well in classification problems even when the data are contaminated.
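To illustrate the plug-in idea described in the summary, the following is a minimal sketch and not the thesis's method. It fits a two-group linear discriminant rule that assigns x to group 1 when (m1 - m2)^T S^-1 (x - (m1 + m2)/2) > 0. Here m1 and m2 are group location estimates and S is a pooled scatter matrix, with equal priors and equal misclassification costs assumed. The "robust" variant is a hypothetical stand-in that uses the coordinatewise median and a diagonal scatter built from normalised MADs. It does not implement RLDRM, RLDRMw, RLDRW, RLDRWw, RLDRV or RLDRT, whose estimators are not specified in this record. All function names and the toy data are hypothetical.

```python
from statistics import mean, median


def coord_mean(X):
    """Classical location: coordinatewise sample mean of the rows of X."""
    return [mean(col) for col in zip(*X)]


def coord_median(X):
    """Illustrative robust location: coordinatewise median of the rows of X."""
    return [median(col) for col in zip(*X)]


def sample_cov(X, centre):
    """Classical scatter: unbiased sample covariance of the rows of X around `centre`."""
    n, p = len(X), len(centre)
    S = [[0.0] * p for _ in range(p)]
    for row in X:
        dev = [row[j] - centre[j] for j in range(p)]
        for i in range(p):
            for j in range(p):
                S[i][j] += dev[i] * dev[j] / (n - 1)
    return S


def mad_diag_cov(X, centre):
    """Illustrative robust scatter (hypothetical): diagonal of squared normalised MADs.

    Ignores correlations between variables; breaks down if any MAD is zero.
    """
    p = len(centre)
    S = [[0.0] * p for _ in range(p)]
    for j, col in enumerate(zip(*X)):
        mad = 1.4826 * median(abs(v - centre[j]) for v in col)
        S[j][j] = mad ** 2
    return S


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for k in range(n):
        piv = max(range(k, n), key=lambda r: abs(M[r][k]))
        M[k], M[piv] = M[piv], M[k]
        for r in range(k + 1, n):
            f = M[r][k] / M[k][k]
            for c in range(k, n + 1):
                M[r][c] -= f * M[k][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def fit_ldr(X1, X2, location, scatter):
    """Two-group plug-in linear discriminant rule (equal priors and costs assumed)."""
    n1, n2 = len(X1), len(X2)
    m1, m2 = location(X1), location(X2)
    S1, S2 = scatter(X1, m1), scatter(X2, m2)
    p = len(m1)
    # Pooled scatter matrix, weighted by degrees of freedom.
    S = [[((n1 - 1) * S1[i][j] + (n2 - 1) * S2[i][j]) / (n1 + n2 - 2)
          for j in range(p)] for i in range(p)]
    a = solve(S, [u - v for u, v in zip(m1, m2)])  # a = S^-1 (m1 - m2)
    mid = [(u + v) / 2 for u, v in zip(m1, m2)]

    def classify(x):
        score = sum(ai * (xi - ci) for ai, xi, ci in zip(a, x, mid))
        return 1 if score > 0 else 2

    return classify


def error_rate(rule, X1, X2):
    """Misclassification error rate: share of observations assigned to the wrong group."""
    wrong = sum(rule(x) != 1 for x in X1) + sum(rule(x) != 2 for x in X2)
    return wrong / (len(X1) + len(X2))


# Hypothetical toy data: group 1 is contaminated by one gross outlier at (30, 30).
clean1 = [[0, 0], [1, 0], [0, 1], [1, 1], [0.5, 0.5]]
group2 = [[4, 4], [5, 4], [4, 5], [5, 5], [4.5, 4.5]]
group1 = clean1 + [[30, 30]]

classical = fit_ldr(group1, group2, coord_mean, sample_cov)
robust = fit_ldr(group1, group2, coord_median, mad_diag_cov)
print(error_rate(classical, clean1, group2))  # 0.6
print(error_rate(robust, clean1, group2))     # 0.0
```

In this toy example, the single outlier drags the sample mean of group 1 past the centre of group 2. As a result, the classical rule misclassifies every clean group-1 point. The median-based rule is unaffected. The example only shows why replacing the classical estimators matters. It does not reproduce the estimators or results reported in the thesis.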