Cost-sensitive ensemble decision tree algorithms for customer churn analysis

Bibliographic Details
Main Author: Wong, Keng Tuck
Format: Thesis
Published: 2020
Description
Summary: Traditionally, most classification learning methods aim to improve accuracy and to minimize the generalization error or misclassification rate. In the business domain, however, the goal is different: most businesses aim to minimize operating costs and maximize profits. From a business point of view, if a prediction model achieves a lower misclassification cost than models with higher accuracy, it is reasonable to accept the less accurate model for predictive analytics. Cost studies are therefore crucial for businesses seeking to maximize savings and profits. This research proposes a cost-sensitive hybrid ensemble decision tree algorithm for developing customer churn prediction models that detect attrition (churn) while also minimizing the financial loss caused by churners. Two groups of methods were proposed and developed in this research: the MCM methods and the FNCM methods. The MCM methods did not show any improvement on the imbalanced-cost data set, but further investigation and additional experiments showed that they can improve both misclassification cost and accuracy on the balanced-cost data set. This research found that when the data sets are sampled so that the classes are balanced, the costs of the positive and negative classes become unequal: either the cost of the positive class exceeds that of the negative class, or the reverse. Analysis of the data set showed that the average cost per example is higher for positive (churn) examples than for negative (non-churn) examples, so the MCM methods produced a higher false negative cost. The FNCM methods were proposed and developed after the MCM methods failed to show any improvement on the imbalanced-cost data set. Their purpose is to minimize the false negative cost. In the context of customer churn prediction for telco companies, minimizing the false negative cost is more important than minimizing the false positive cost.
The false positive cost does not translate into a loss of business revenue, because customers falsely classified as positive are in fact customers who will stay. This research found a new method for minimizing the false negative cost, and it performed well when benchmarked against AdaBoost.M1 on the IBM Watson Telco Customer Churn Data Set and the Microsoft Azure Machine Learning Churn Tutorial Data Set. Both are example-dependent cost data sets, in which a cost (the monthly charge) is specified for every example. The IBM Watson data set consists of 7043 examples (1869 positive churns and 5174 negative churns) with 19 attributes (16 nominal and 3 numerical); the Microsoft Azure data set consists of 4667 examples (651 positive churns and 4016 negative churns) with 20 attributes (4 nominal and 16 numerical).
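
The trade-off described in the summary, where a model with lower accuracy can still be preferable if it has a lower false negative cost, can be illustrated with a minimal Python sketch. This is not the thesis's MCM or FNCM method. It only assumes, following the summary, that each example's cost is its monthly charge and that false positives cost nothing. The `Example` class, the helper functions, the five toy customers, and both sets of predictions are hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class Example:
    churned: bool          # True = positive (churner), False = negative
    monthly_charge: float  # example-dependent cost (hypothetical values)


def accuracy(examples, predictions):
    """Fraction of examples whose predicted label matches the true label."""
    correct = sum(e.churned == p for e, p in zip(examples, predictions))
    return correct / len(examples)


def false_negative_cost(examples, predictions):
    """Sum of monthly charges of churners that were predicted to stay.

    Assumption: a false positive (a loyal customer flagged as a churner)
    costs nothing, so only false negatives contribute.
    """
    return sum(
        e.monthly_charge
        for e, p in zip(examples, predictions)
        if e.churned and not p
    )


# Hypothetical toy data: two churners and three loyal customers.
examples = [
    Example(churned=True, monthly_charge=100.0),
    Example(churned=True, monthly_charge=20.0),
    Example(churned=False, monthly_charge=50.0),
    Example(churned=False, monthly_charge=60.0),
    Example(churned=False, monthly_charge=30.0),
]

# Model A misses the high-value churner but is otherwise correct.
model_a = [False, True, False, False, False]
# Model B misses the low-value churner and raises one false alarm.
model_b = [True, False, True, False, False]

for name, preds in (("A", model_a), ("B", model_b)):
    print(
        f"model {name}: accuracy={accuracy(examples, preds):.2f}, "
        f"false negative cost={false_negative_cost(examples, preds):.1f}"
    )
```

In this toy setting model A is more accurate (0.80 against 0.60), yet model B loses far less revenue to undetected churners (20.0 against 100.0), which is the kind of comparison the summary argues should drive model selection.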