Classification and regression tree in classifying and predicting students' academic performance

Bibliographic Details
Main Author: Ho, Su Juih
Format: Thesis
Language: English
Published: 2013
Online Access: http://eprints.utm.my/id/eprint/33100/5/HoSuJuihMFS2013.pdf
Description
Summary: In this study, the Classification and Regression Tree (CART) algorithm is used to classify students and predict who is likely to pass or fail the final exam of the Engineering Statistics course. However, two problems typically surface when the CART algorithm is applied to high-dimensional data: misclassification error and overfitting. This research therefore aims to reduce both in order to improve the accuracy of prediction and classification. Different data-partitioning schemes, including the re-substitution method, the hold-out method and 10-fold cross-validation, are used to build and evaluate the decision tree. The results are compared in terms of prediction accuracy, sensitivity and specificity, as well as tree structure. Among them, 10-fold cross-validation achieves the highest prediction accuracy (the lowest misclassification error), at 85.11%. It is therefore selected for further overfitting analysis, using an error rate plot and cost-complexity pruning to reduce the misclassification error. The final pruned tree improves the prediction accuracy to 87.23%. Three rules generated from the final tree are identified to describe the relationships among the attributes. The study thus indicates that applying the CART algorithm with 10-fold cross-validation can produce better accuracy in classifying and predicting students' academic performance. In addition, lecturers can use this method to identify students who are performing poorly in the course so that action can be taken to prevent further failures.
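
The evaluation quantities named in the summary (prediction accuracy, sensitivity, specificity) and the 10-fold partitioning can be illustrated with a minimal sketch. It is not the thesis's implementation. The function names `confusion_metrics` and `k_fold_indices` are hypothetical, treating "fail" as the positive class is an assumption, and the labels are invented toy data rather than results from the study.

```python
from typing import Dict, List, Sequence, Tuple


def confusion_metrics(actual: Sequence[str], predicted: Sequence[str],
                      positive: str = "fail") -> Dict[str, float]:
    """Accuracy, sensitivity and specificity for a two-class problem.

    Assumption: `positive` marks the class of interest (here, failing students).
    """
    if len(actual) != len(predicted) or not actual:
        raise ValueError("label lists must be non-empty and equally long")
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)
    tn = sum(a != positive and p != positive for a, p in pairs)
    fp = sum(a != positive and p == positive for a, p in pairs)
    fn = sum(a == positive and p != positive for a, p in pairs)
    return {
        "accuracy": (tp + tn) / len(pairs),  # 1 - misclassification error
        "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
        "specificity": tn / (tn + fp) if tn + fp else float("nan"),
    }


def k_fold_indices(n: int, k: int = 10) -> List[Tuple[List[int], List[int]]]:
    """Split range(n) into k (train, test) index pairs.

    Folds are interleaved and unshuffled; each index is tested exactly once.
    """
    if not 2 <= k <= n:
        raise ValueError("need 2 <= k <= n")
    folds = [list(range(i, n, k)) for i in range(k)]
    return [
        ([j for f, fold in enumerate(folds) if f != i for j in fold], folds[i])
        for i in range(k)
    ]


# Toy labels (invented for illustration only).
actual = ["fail", "fail", "pass", "pass", "pass", "fail", "pass", "pass"]
predicted = ["fail", "pass", "pass", "pass", "fail", "fail", "pass", "pass"]
print(confusion_metrics(actual, predicted))

# Ten train/test partitions over 25 hypothetical student records.
for train_idx, test_idx in k_fold_indices(25, k=10):
    print(len(train_idx), len(test_idx))
```

On the toy labels, accuracy is 0.75, sensitivity is 2/3 and specificity is 0.8. In a cross-validated evaluation, a tree would be fitted on each training partition and these metrics computed on the matching test partition, then averaged over the ten folds.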