NORMA eResearch @NCI Library

A Hybrid Feature Selection and Hybrid Prediction Model for Credit Risk Prediction

Shetty, Purvi Prabhakar (2021) A Hybrid Feature Selection and Hybrid Prediction Model for Credit Risk Prediction. Masters thesis, Dublin, National College of Ireland.

Borrowing in the consumer financial market has increased dramatically over the last few years, and with it the risk of loss due to borrower payment failure. Credit risk mitigation is a major challenge for lending institutions such as banks. Machine learning techniques for credit scoring and default prediction are helping financial institutions reduce credit risk. In the consumer financial market, accurate prediction is critical: even minor improvements to the prediction model can help banks avoid colossal losses. Data mining methodologies such as feature selection and single classifiers have been applied and studied in the credit risk domain, but the effects of hybrid models have not been explored much. In this study, we propose a hybrid classification model containing XGBoost, CatBoost, and LightGBM, combined using a stacked generalization technique. A hybrid feature selection model is also created using Feed Forward, Weight of Evidence (WOE), ANOVA, Extra Trees, Random Forest, and L1 feature selection, and the results are combined using a voting ensemble approach. The SMOTE oversampling technique is employed to balance the datasets. Lastly, the approach is generalized using three datasets from the credit risk domain. The results show that the hybrid feature selection technique outperforms traditional methods on all three datasets and can be generalized for the credit risk domain. The stacked model outperformed the state of the art on the large and medium datasets, with AUC values of 96% and 87%, respectively. For small datasets, however, we found that single classifiers were beneficial. We were also able to identify major indicators in the credit risk domain. This approach will help banks and other lending institutions improve the performance of their credit risk models and help back up business decisions.
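
The voting step of the hybrid feature selection described in the abstract can be illustrated with a minimal sketch. This is not the thesis implementation: the abstract does not state the voting rule, so a strict majority is assumed here, and the function name `vote_features`, the selector keys, and the feature names are all hypothetical. Each selector is assumed to have already produced its own list of chosen features.

```python
from collections import Counter


def vote_features(selections, min_votes=None):
    """Keep features chosen by at least `min_votes` selectors.

    `selections` maps a selector name to the features it picked.
    The default threshold (a strict majority of selectors) is an
    assumption; the abstract does not specify the voting rule.
    """
    if min_votes is None:
        min_votes = len(selections) // 2 + 1
    # Count each feature at most once per selector.
    votes = Counter(f for chosen in selections.values() for f in set(chosen))
    kept = [f for f, n in votes.items() if n >= min_votes]
    # Most-voted first; ties broken alphabetically for a stable order.
    return sorted(kept, key=lambda f: (-votes[f], f))


# Hypothetical outputs of the six selectors named in the abstract.
selections = {
    "feed_forward": ["income", "loan_amount", "age"],
    "woe": ["income", "employment_length"],
    "anova": ["income", "loan_amount", "age"],
    "extra_trees": ["income", "loan_amount", "num_late_payments"],
    "random_forest": ["income", "num_late_payments", "age"],
    "l1": ["income", "loan_amount", "num_late_payments"],
}

selected = vote_features(selections)
print(selected)
```

With six selectors the assumed majority threshold is four votes, so only features that most methods agree on survive; passing a lower `min_votes` keeps more features at the cost of weaker agreement.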

Item Type: Thesis (Masters)
Uncontrolled Keywords: Credit Default Prediction; CatBoost; Deep learning; Ensemble approach; Feature Selection; LightGBM; Machine Learning; stacked generalization; voting mechanism; XGBoost
Subjects: Q Science > QA Mathematics > Electronic computers. Computer science
T Technology > T Technology (General) > Information Technology > Electronic computers. Computer science
H Social Sciences > HG Finance > Credit. Debt. Loans.
Q Science > Q Science (General) > Self-organizing systems. Conscious automata > Machine learning
Divisions: School of Computing > Master of Science in Data Analytics
Depositing User: Tamara Malone
Date Deposited: 11 Mar 2023 11:47
Last Modified: 11 Mar 2023 11:47
