NORMA eResearch @NCI Library

Generation of synthetic examples for imbalanced tabular data

Gala, Nirav Bharat (2023) Generation of synthetic examples for imbalanced tabular data. Masters thesis, Dublin, National College of Ireland.

PDF (Master of Science), 2MB
PDF (Configuration manual), 3MB

Abstract

Granting a loan is a laborious process because it requires extensive verification and confirmation. Banks and lenders must evaluate the credit risk of each loan application in order to prevent defaults. Data analytics and machine learning techniques can be applied to historical data to predict loan defaults and help loan officers make informed decisions. The data used in these processes is often class imbalanced, with defaulters forming the minority class. Many techniques have been proposed to balance the classes, but it is not known which technique works best for credit defaults. This report presents a detailed comparison of the most important techniques for handling class imbalance, with the aim of developing a robust binary classifier. Generative adversarial networks (GAN) and the Synthetic Minority Oversampling Technique (SMOTE) are the two most important techniques for the generation of synthetic data, and the report compares them in detail. The recall metric was used to evaluate the techniques because it represents the model’s ability to identify potential defaults. Although both techniques performed relatively well at generating synthetic data for loan default, we show that SMOTE outperforms GAN in terms of recall.
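
As a rough illustration of the two ideas the abstract relies on, the following minimal sketch shows a SMOTE-style interpolation step and the recall computation. It is not the thesis code: the function names `smote_like` and `recall`, the toy feature rows, and the labels are all hypothetical, and the neighbour search is a simplified stand-in for what a production SMOTE implementation would do.

```python
import random

def recall(y_true, y_pred, positive=1):
    """Recall = TP / (TP + FN): the share of actual positives the model finds."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp / (tp + fn) if (tp + fn) else 0.0

def smote_like(minority, n_new, k=2, seed=0):
    """Create n_new synthetic rows by interpolating between a minority row
    and one of its k nearest minority neighbours (the core SMOTE idea)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority rows by squared Euclidean distance to base.
        others = [row for row in minority if row is not base]
        others.sort(key=lambda row: sum((a - b) ** 2 for a, b in zip(base, row)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic

# Hypothetical toy data: two numeric features per defaulting applicant.
defaulters = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [1.2, 2.5]]
new_rows = smote_like(defaulters, n_new=3)

# Hypothetical labels: 1 = default, 0 = repaid.
y_true = [1, 1, 1, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 1]
score = recall(y_true, y_pred)  # 3 of the 4 actual defaults are caught
```

Because each synthetic row lies on a line segment between two real minority rows, it stays within the range of the observed defaulter features. Recall ignores false positives entirely, which is why it suits a setting where missing a likely default is the costlier error.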

Item Type: Thesis (Masters)
Supervisors: Estrada, Giovani (email: UNSPECIFIED)
Subjects: Q Science > QA Mathematics > Electronic computers. Computer science
T Technology > T Technology (General) > Information Technology > Electronic computers. Computer science
H Social Sciences > HG Finance > Credit. Debt. Loans.
H Social Sciences > HV Social pathology. Social and public welfare > Discrimination
Divisions: School of Computing > Master of Science in Data Analytics
Depositing User: Tamara Malone
Date Deposited: 18 May 2023 14:29
Last Modified: 18 May 2023 14:29
URI: https://norma.ncirl.ie/id/eprint/6587
