Winsorize tree algorithm for handling outliers in classification problem

The Classification and Regression Tree (CART) is designed to predict or classify objects into predetermined classes from a set of predictors. However, the presence of outliers can affect the tree's structure, node purity, and predictive accuracy. Some researchers opt to perform pre-pruning...


Bibliographic Details
Main Author: Ch’ng, Chee Keong
Format: Thesis
Language: eng
Published: 2016
Subjects: QA273-280 Probabilities; Mathematical statistics
Online Access: https://etd.uum.edu.my/5780/1/depositpermission_s92068.pdf
https://etd.uum.edu.my/5780/14/s92068_01.pdf
id my-uum-etd.5780
record_format uketd_dc
institution Universiti Utara Malaysia
collection UUM ETD
language eng
advisor Ismail, Wan Rosmanira
Mahat, Nor Idayu
topic QA273-280 Probabilities
Mathematical statistics
spellingShingle QA273-280 Probabilities
Mathematical statistics
Ch’ng, Chee Keong
Winsorize tree algorithm for handling outliers in classification problem
description The Classification and Regression Tree (CART) is designed to predict or classify objects into predetermined classes from a set of predictors. However, the presence of outliers can affect the tree's structure, node purity, and predictive accuracy. Some researchers opt to perform pre-pruning or post-pruning of the CART to handle outliers. This study proposes a modified classification tree algorithm, called the Winsorize tree, based on the distribution of classes in the training dataset. The Winsorize tree checks each node for potential outliers before evaluating candidate splitting points, so that every split yields nodes of the highest possible purity. The upper and lower fences of a boxplot are used to flag potential outliers, namely values lying beyond Q1 − (1.5 × interquartile range) or Q3 + (1.5 × interquartile range). The identified outliers are neutralized using the Winsorize method, and the Winsorize Gini index is then used to compute the divergence among the probability distributions of the target predictor's values until the stopping criteria are met. This study uses three stopping rules: the node contains the minimum of 10% of the total training set, …
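The per-node splitting step described in the abstract can be sketched in a few lines. The following is a minimal illustration of that description only, not the thesis's implementation: the names winsorize_node, gini, and best_split are hypothetical, and an ordinary Gini impurity computed on Winsorized values stands in for the thesis's Winsorize Gini index.

```python
import numpy as np

def winsorize_node(x):
    """Pull values beyond the boxplot fences Q1 - 1.5*IQR and
    Q3 + 1.5*IQR back to the nearest fence (Winsorizing),
    rather than deleting them."""
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    return np.clip(x, q1 - 1.5 * iqr, q3 + 1.5 * iqr)

def gini(y):
    """Gini impurity of a vector of class labels."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def best_split(x, y):
    """Winsorize the predictor within the node, then scan candidate
    thresholds and return (threshold, score) for the split with the
    lowest weighted Gini impurity."""
    xw = winsorize_node(x)
    best_t, best_score = None, np.inf
    for t in np.unique(xw)[:-1]:
        left, right = y[xw <= t], y[xw > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
        if score < best_score:
            best_t, best_score = t, score
    return best_t, best_score

# Toy usage: a predictor with a few extreme values.
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0, 1, 95), rng.normal(12, 1, 5)])
y = (x > 0).astype(int)
print(best_split(x, y))
```

Because the fences are recomputed within each node, an observation that is extreme for the whole training set may be unremarkable inside a deeper node, which is why the abstract stresses checking for outliers from node to node rather than once, globally.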
format Thesis
qualification_name Ph.D.
qualification_level Doctorate
author Ch’ng, Chee Keong
author_facet Ch’ng, Chee Keong
author_sort Ch’ng, Chee Keong
title Winsorize tree algorithm for handling outliers in classification problem
title_short Winsorize tree algorithm for handling outliers in classification problem
title_full Winsorize tree algorithm for handling outliers in classification problem
title_fullStr Winsorize tree algorithm for handling outliers in classification problem
title_full_unstemmed Winsorize tree algorithm for handling outliers in classification problem
title_sort winsorize tree algorithm for handling outliers in classification problem
granting_institution Universiti Utara Malaysia
granting_department Awang Had Salleh Graduate School of Arts & Sciences
publishDate 2016
url https://etd.uum.edu.my/5780/1/depositpermission_s92068.pdf
https://etd.uum.edu.my/5780/14/s92068_01.pdf
_version_ 1747827980338462720