Ali, Mohsin Sadaqat (2025) Enhancing Credit Risk Assessment using Interpretable Machine Learning Techniques. Masters thesis, Dublin, National College of Ireland.
PDF (Master of Science)
PDF (Configuration Manual)
Abstract
Predicting credit risk is one of the most vital tasks in the financial sector: it allows lenders to assess the likelihood that a borrower will default on a loan. Traditional machine learning classifiers are commonly used for this purpose; however, they often struggle with skewed data sets and lack interpretability, which makes the decision process difficult for financial institutions. This study considers ensemble classifiers combined with the Synthetic Minority Oversampling Technique and Edited Nearest Neighbor (SMOTE-ENN) to predict credit risk, with the aim of improving the classification of different credit risk classes. The ensemble classifiers considered are Random Forest, adaptive boosting (AdaBoost), extreme gradient boosting (XGBoost) and light gradient boosting machine (LightGBM). Class imbalance is addressed with the ensemble classifiers and SMOTE-ENN, and Shapley Additive Explanations (SHAP) are added to make the models interpretable. The experimental results showed that the suggested technique improved classification on this task. Defaulters were identified through a comparative analysis of optimised supervised learning techniques: Random Forest, Extreme Gradient Boosting, Support Vector Machine and Logistic Regression. Dimensionality was reduced using Recursive Feature Elimination with Cross-Validation and Principal Component Analysis. Each model was evaluated using the F1 score, the AUC score, prediction accuracy, precision, and recall. The combination of the calibrated Support Vector Machine with Recursive Feature Elimination and Cross-Validation showed substantial potential for identifying loan defaulters.
The proposed technique could help financial institutions identify loan defaulters accurately and avoid further losses. Ensemble models offer better predictions and better stability, which makes them particularly well suited to this application. Combining the predictions of several models is usually less volatile than relying on a single model, and it outperforms other models and techniques such as XGBoost, SVMs, and logistic regression.
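The evaluation metrics named in the abstract (F1 score, AUC, accuracy, precision, and recall) can be illustrated with a minimal sketch in plain Python. The labels and scores below are made-up toy values, not results from the thesis, and the function names and the 0.5 decision threshold are assumptions chosen only for illustration.

```python
from typing import Dict, Sequence


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 for binary labels (1 = defaulter)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # defaulters caught
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # defaulters missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # non-defaulters cleared

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    accuracy = (tp + tn) / len(pairs)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outranks a random negative."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    # Ties between a positive and a negative score count as half a win.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy, imbalanced example (3 defaulters out of 10); values are illustrative only.
y_true = [1, 0, 0, 0, 1, 0, 0, 1, 0, 0]
scores = [0.9, 0.2, 0.6, 0.1, 0.4, 0.3, 0.05, 0.8, 0.7, 0.15]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

metrics = classification_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(metrics, auc)
```

On skewed data such as this, accuracy alone can look acceptable while precision and recall on the minority (defaulter) class are noticeably lower, which is one reason to report several metrics side by side.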
| Item Type: | Thesis (Masters) |
|---|---|
| Supervisors: | Garg, Mohit (email: UNSPECIFIED) |
| Subjects: | Q Science > QH Natural history > QH301 Biology > Methods of research. Technique. Experimental biology > Data processing. Bioinformatics > Artificial intelligence; Q Science > Q Science (General) > Self-organizing systems. Conscious automata > Artificial intelligence; H Social Sciences > HG Finance > Credit. Debt. Loans.; Q Science > Q Science (General) > Self-organizing systems. Conscious automata > Machine learning |
| Divisions: | School of Computing > Master of Science in Artificial Intelligence for Business |
| Depositing User: | Ciara O'Brien |
| Date Deposited: | 24 Jun 2026 11:04 |
| Last Modified: | 24 Jun 2026 11:04 |
| URI: | https://norma.ncirl.ie/id/eprint/9396 |