Hybrid harmony search-artificial intelligence models in credit scoring

Credit is a type of advanced lending which poses the risk of having default payments. Thus, credit scoring is important to correctly identify defaulters and non-defaulters. Statistical models are the main approaches but recently, Artificial Intelligence (AI) techniques...

Bibliographic Details
Main Author: Goh, Rui Ying
Format: Thesis
Language: English
Published: 2019
Subjects:
Online Access: http://psasir.upm.edu.my/id/eprint/84422/1/IPM%202019%2019%20-%20IR.pdf
id my-upm-ir.84422
record_format uketd_dc
institution Universiti Putra Malaysia
collection PSAS Institutional Repository
language English
advisor Lee, Lai Soon
topic Credit scoring systems
Consumer credit

description Credit is a type of advanced lending that poses the risk of default payments. Credit scoring is therefore important for correctly distinguishing defaulters from non-defaulters. Statistical models are the main approaches, but Artificial Intelligence (AI) techniques have recently become popular because of their ability to account for flexible data patterns. This study focuses on Support Vector Machines (SVM) and Random Forest (RF) because of their competitiveness in the literature. It aims to address three main drawbacks of both AI techniques: sensitivity to hyperparameters, the black-box property, and the increased computational effort caused by the hyperparameter tuning procedure. Hyperparameter tuning has been common practice for both SVM and RF to ensure quality performance. Besides the conventional Grid Search (GS) and manual tuning (MT) approaches, automated tuning with metaheuristic approaches (MA) has also been shown to be effective for this task. Genetic Algorithm (GA) has been the dominant method, and other MA attempted recently have shown the potential of MA for hyperparameter tuning. To the best of our knowledge, Harmony Search (HS) has yet to be used with SVM and RF in this domain. For the SVM credit model, feature selection is conducted simultaneously with hyperparameter tuning using HS, so that the explanation can focus on the reduced set of features. For the RF credit model, HS is hybridized with RF for hyperparameter tuning, and the two types of feature importance computed by the RF algorithm are then used to explain the attributes. Because HS-SVM and HS-RF increase the computational effort, a modified HS (MHS) hybridized with SVM and RF is proposed in this study for an effective yet efficient search.
The MHS hybrid models involve four main modifications: elitism selection instead of random selection, dynamic exploration and exploitation operators that follow step functions instead of a static value, replacement of the bandwidth with the coefficient of variation, and two additional termination criteria. To further enhance computational efficiency, the MHS hybrid models are parallelized. The four hybrid models are evaluated against standard statistical models across three datasets, namely the German and Australian credit datasets from the public repository and peer-to-peer (P2P) lending data from the Lending Club (LC) website, to account for different credit data patterns. The discussion is based on discriminating ability, model explainability and computational time. All the hybrid models achieve higher discriminating ability than the GS-tuned models. The RF hybrid models consistently show better discriminating ability than the other methods across the three datasets. Compared to the SVM hybrids, the RF hybrids achieve an improvement of approximately 1% on the German and Australian data and around 4% on the LC dataset. This study also demonstrates model explainability using reduced features for MHS-SVM and feature importance for MHS-RF. These strategies are shown to be useful for obtaining initial information on the attributes. For both the German and Australian datasets, reduced features and feature importance identify almost the same features as ‘important’. For the LC dataset, the end results show only one attribute in common between the two strategies. This is believed to be due to the different ways in which the two classifiers capture data patterns for classification. In terms of computational time, compared to the GS-tuned models and the respective HS hybrids, the proposed MHS-SVM and MHS-RF hybrids report a time improvement of more than 50%, while parallel computation saves approximately 80% of the computational time.
In addition, the hybrid models with MHS reduce the computational effort while maintaining good discriminating ability. With the parallelization of the MHS hybrid models, the computational time is effectively reduced, with the RF hybrid models faster than the SVM hybrid models. Although statistical models are efficient because no hyperparameter tuning procedure is involved, their inferior performance compared to the AI models in this study indicates a failure to capture information from the LC dataset. In terms of model performance, explainability and computational effort, MHS-RF is the recommended credit scoring model because of its robustness in all three aspects.
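As a rough illustration of the basic Harmony Search loop the abstract refers to, the following minimal Python sketch tunes two continuous hyperparameters. It is not the thesis's implementation and does not include the MHS modifications. The parameter names (`hms`, `hmcr`, `par`, `bandwidth`) follow common HS terminology, but their values, the search bounds, and the toy objective `toy_validation_error` (a stand-in for a cross-validated classifier error) are all assumptions made for this example.

```python
import random


def harmony_search(objective, bounds, hms=10, hmcr=0.9, par=0.3,
                   bandwidth=0.05, iterations=500, seed=0):
    """Minimise `objective` over the box `bounds` with a basic Harmony Search."""
    rng = random.Random(seed)
    # Harmony memory: `hms` random candidate vectors and their scores.
    memory = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(hms)]
    scores = [objective(h) for h in memory]
    for _ in range(iterations):
        new = []
        for d, (lo, hi) in enumerate(bounds):
            if rng.random() < hmcr:
                # Memory consideration: reuse a value from a randomly chosen harmony.
                value = memory[rng.randrange(hms)][d]
                if rng.random() < par:
                    # Pitch adjustment: small perturbation scaled by the bandwidth.
                    value += rng.uniform(-1.0, 1.0) * bandwidth * (hi - lo)
            else:
                # Random consideration: sample anywhere in the allowed range.
                value = rng.uniform(lo, hi)
            new.append(min(hi, max(lo, value)))  # clip to the bounds
        new_score = objective(new)
        # Replace the worst stored harmony only if the new one is better.
        worst = max(range(hms), key=scores.__getitem__)
        if new_score < scores[worst]:
            memory[worst], scores[worst] = new, new_score
    best = min(range(hms), key=scores.__getitem__)
    return memory[best], scores[best]


def toy_validation_error(params):
    """Hypothetical objective standing in for a cross-validated error."""
    log_c, log_gamma = params
    return (log_c - 1.0) ** 2 + (log_gamma + 2.0) ** 2


best_params, best_error = harmony_search(
    toy_validation_error, [(-5.0, 5.0), (-5.0, 5.0)]
)
print(best_params, best_error)
```

In a real setting the objective would train and validate an SVM or RF for each candidate, which is where the computational cost discussed in the abstract arises.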
format Thesis
qualification_level Master's degree
author Goh, Rui Ying
title Hybrid harmony search-artificial intelligence models in credit scoring
granting_institution Universiti Putra Malaysia
publishDate 2019
url http://psasir.upm.edu.my/id/eprint/84422/1/IPM%202019%2019%20-%20IR.pdf
spelling my-upm-ir.84422 2021-02-02T01:39:20Z Hybrid harmony search-artificial intelligence models in credit scoring 2019-09 Goh, Rui Ying Credit scoring systems Consumer credit 2019-09 Thesis http://psasir.upm.edu.my/id/eprint/84422/ http://psasir.upm.edu.my/id/eprint/84422/1/IPM%202019%2019%20-%20IR.pdf text en public masters Universiti Putra Malaysia Credit scoring systems Consumer credit Lee, Lai Soon