Hybrid harmony search-artificial intelligence models in credit scoring

Bibliographic Details
Main Author: Goh, Rui Ying
Format: Thesis
Language: English
Published: 2019
Subjects:
Online Access: http://psasir.upm.edu.my/id/eprint/84422/1/IPM%202019%2019%20-%20IR.pdf
Description
Summary: Credit is a form of lending in advance that carries the risk of payment default. Credit scoring is therefore important for correctly identifying defaulters and non-defaulters. Statistical models are the main approaches, but Artificial Intelligence (AI) techniques have recently become popular because they can account for flexible data patterns. Support Vector Machines (SVM) and Random Forest (RF) are the main focus of this study because of their competitiveness in the literature. This study aims to address three main drawbacks of both AI techniques: sensitivity to hyperparameters, the black-box property, and the increased computational effort caused by the hyperparameter tuning procedure. Hyperparameter tuning has been common practice for both SVM and RF to ensure quality performance. Besides the conventional Grid Search (GS) and manual tuning (MT) approaches, automated tuning with metaheuristic approaches (MA) has also been shown to be effective for this task. Genetic Algorithm (GA) has been the dominant method, and other MA attempted recently have shown the potential of MA for hyperparameter tuning. To the best of our knowledge, Harmony Search (HS) has yet to be used with SVM and RF in this domain. For the SVM credit model, feature selection is conducted simultaneously with hyperparameter tuning using HS, so that the explanation of the attributes can focus on the reduced feature set. For the RF credit model, HS is hybridized with RF for hyperparameter tuning, and the two types of feature importance computed by the RF algorithm are then used to explain the attributes.
Because HS-SVM and HS-RF increase the computational effort, a modified HS (MHS) hybridized with SVM and RF is proposed in this study for an effective yet efficient search. The MHS hybrid models contain four main modifications: elitism selection instead of random selection, dynamic exploration and exploitation operators that follow step functions instead of a static value, replacement of the bandwidth with the coefficient of variation, and two additional termination criteria. To further enhance computational efficiency, the MHS hybrid models are parallelized. The four hybrid models are evaluated against standard statistical models across three datasets, namely the German and Australian credit datasets from the public repository and peer-to-peer (P2P) lending data from the Lending Club (LC) website, to account for different credit data patterns. The discussion is based on discriminating ability, model explainability and computational time.
All the hybrid models achieved higher discriminating ability than the GS-tuned models. The RF hybrid models consistently show better discriminating ability than the other methods across the three datasets. Compared to the SVM hybrids, the RF hybrids achieved approximately 1% improvement on the German and Australian data and around 4% improvement on the LC dataset. This study also demonstrates model explainability using reduced features for MHS-SVM and feature importance for MHS-RF. These strategies are shown to be useful for obtaining initial information on the attributes. For both the German and Australian datasets, reduced features and feature importance identify almost the same features as 'important'. For the LC dataset, the end results show only one attribute in common between the two strategies. This is believed to be due to the different ways in which the two classifiers capture data patterns for classification. In terms of computational time, compared to the GS-tuned models and the respective HS hybrids, the proposed MHS-SVM and MHS-RF hybrids reported a time improvement of more than 50%, while parallel computation saved approximately 80% of the computational time. In addition, the hybrid models with MHS reduced the computational effort while maintaining good discriminating ability. With parallelization of the MHS hybrid models, the computational time is effectively reduced, with the RF hybrid models faster than the SVM hybrid models. Although statistical models are efficient because no hyperparameter tuning procedure is involved, their inferior performance compared to the AI models in this study indicates a failure to capture information from the LC dataset. In terms of model performance, explainability and computational effort, MHS-RF is the recommended credit scoring model due to its robustness in all three aspects.
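For readers unfamiliar with Harmony Search, the following is a minimal sketch of the standard algorithm applied to a two-dimensional tuning problem. It is not the thesis's HS-SVM, HS-RF or MHS implementation: the function names, parameter defaults (`hms`, `hmcr`, `par`, `bw`) and the toy objective are all assumptions chosen for illustration, and the objective merely stands in for a cross-validated error that a real classifier would supply.

```python
import random


def harmony_search(objective, bounds, hms=10, hmcr=0.9, par=0.3, bw=0.05,
                   iterations=200, seed=0):
    """Minimise `objective` over box `bounds` with a basic harmony search.

    hms: harmony memory size; hmcr: memory consideration rate;
    par: pitch adjustment rate; bw: bandwidth as a fraction of each range.
    All defaults are illustrative assumptions, not values from the thesis.
    """
    rng = random.Random(seed)
    # Harmony memory: hms random candidate solutions and their scores.
    memory = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(hms)]
    scores = [objective(h) for h in memory]
    for _ in range(iterations):
        new = []
        for d, (lo, hi) in enumerate(bounds):
            if rng.random() < hmcr:
                # Memory consideration: reuse a value from a random stored harmony.
                value = memory[rng.randrange(hms)][d]
                if rng.random() < par:
                    # Pitch adjustment: small perturbation scaled by the bandwidth.
                    value += rng.uniform(-1.0, 1.0) * bw * (hi - lo)
            else:
                # Random selection: sample anywhere in the allowed range.
                value = rng.uniform(lo, hi)
            new.append(min(hi, max(lo, value)))
        new_score = objective(new)
        # Replace the worst stored harmony if the new one is strictly better.
        worst = max(range(hms), key=scores.__getitem__)
        if new_score < scores[worst]:
            memory[worst], scores[worst] = new, new_score
    best = min(range(hms), key=scores.__getitem__)
    return memory[best], scores[best]


def toy_error(params):
    # Hypothetical stand-in for a cross-validated error over two
    # log-scaled hyperparameters (e.g. an SVM's C and gamma).
    log_c, log_gamma = params
    return (log_c - 1.0) ** 2 + (log_gamma + 2.0) ** 2


search_bounds = [(-5.0, 5.0), (-5.0, 5.0)]
best_params, best_score = harmony_search(toy_error, search_bounds)
print(best_params, best_score)
```

In a hybrid of the kind the summary describes, `toy_error` would be replaced by a function that trains the classifier with the candidate hyperparameters and returns a validation error, which is where most of the computational cost arises.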