Winsorize tree algorithm for handling outliers in classification problem

Bibliographic Details
Main Author: Ch’ng, Chee Keong
Format: Thesis
Language: English
Published: 2016
Online Access: https://etd.uum.edu.my/5780/1/depositpermission_s92068.pdf
https://etd.uum.edu.my/5780/14/s92068_01.pdf
Summary: Classification and Regression Tree (CART) is designed to predict or classify objects into predetermined classes from a set of predictors. However, outliers can affect the structure of a CART tree, its node purity, and its predictive accuracy in classification. Some researchers opt to pre-prune or post-prune the CART tree to handle outliers. This study proposes a modified classification tree algorithm, called the Winsorize tree, based on the distribution of classes in the training dataset. At each node, the Winsorize tree examines all potential outliers before evaluating candidate splitting points, so as to obtain the nodes with the highest purity. The lower and upper fences of a boxplot are used to detect potential outliers, namely values below Q1 − 1.5 × IQR or above Q3 + 1.5 × IQR, where IQR is the interquartile range. The identified outliers are neutralized using the Winsorize method, and the Winsorize Gini index is then used to compute the divergences among the probability distributions of the target predictor’s values until the stopping criteria are met. This study uses three stopping rules: a node reaches the minimum size of 10% of the total training set,
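The node-level steps described in the summary (boxplot-fence detection, Winsorizing, then a Gini-based split search) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation. All function names are hypothetical. Outliers are assumed to be capped at the fences themselves. The "Winsorize Gini index" is approximated here as the ordinary Gini impurity, evaluated on the Winsorized predictor inside a single node.

```python
from statistics import quantiles


def boxplot_fences(values, k=1.5):
    """Return the boxplot fences (Q1 - k*IQR, Q3 + k*IQR)."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def winsorize(values, k=1.5):
    """Cap every value lying outside the fences at the nearest fence.

    Assumption: outliers are replaced by the fence value itself; the
    thesis may neutralize them differently.
    """
    lower, upper = boxplot_fences(values, k)
    return [min(max(v, lower), upper) for v in values]


def gini(labels):
    """Ordinary Gini impurity: 1 - sum over classes of p_c squared."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_split(x, y, k=1.5):
    """Winsorize x within the node, then return (threshold, weighted Gini).

    The threshold minimizes the size-weighted Gini impurity of the two
    child nodes (hypothetical stand-in for the Winsorize Gini index).
    """
    if len(x) < 2:
        return None, float("inf")  # no split is possible
    xw = winsorize(x, k)
    pairs = sorted(zip(xw, y))
    n = len(pairs)
    best = (None, float("inf"))
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold separates equal values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (threshold, score)
    return best
```

Take `x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]` with labels `["a"] * 5 + ["b"] * 5`. The fences are −3.5 and 14.5, so the outlier 100 is capped at 14.5 before any split is scored. `best_split` then returns the pure split at threshold 5.5 with a weighted Gini of 0.0. Capping means an extreme value can no longer pull a candidate threshold far into the tail. The stopping rules, including the 10% minimum node size, are left out because the record only partially lists them.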