NORMA eResearch @NCI Library

Credit Card Fraud Detection Using Conditional Tabular Generative Adversarial Networks (CT-GAN) and Supervised Machine Learning Techniques

Patil, Tushar (2021) Credit Card Fraud Detection Using Conditional Tabular Generative Adversarial Networks (CT-GAN) and Supervised Machine Learning Techniques. Masters thesis, Dublin, National College of Ireland.


Abstract

Credit card fraud has long been a major concern for financial institutions and business stakeholders, and because it continues to grow, it has become a topic of global interest for researchers. Machine learning has proved to be one of the more promising approaches for detecting and predicting fraud. Despite its advantages, no single model is ideal for this task because of the many factors involved. Class imbalance is also a major and frequently occurring challenge in fraud detection, and it hampers model performance. Several previous studies have combined machine learning algorithms with various data pre-processing techniques to handle class imbalance. To take this research further, we use a novel approach that combines supervised machine learning algorithms (Logistic Regression, Random Forest and XGBoost) with Conditional Tabular Generative Adversarial Networks (CT-GAN), which balance the skewed data through data augmentation. We use the SelectKBest feature selection method to select the most significant features for our analysis. After testing the proposed technique with our machine learning algorithms, trained on both unbalanced and balanced data, we observed a significant increase in model performance in terms of F1-score, recall, AUC score and G-mean. The results show that, after applying the proposed technique, the Random Forest model outperforms the other models on all metrics with a recall of 100%, followed by XGBoost with a recall of 91%, whereas Logistic Regression shows the largest improvement, from 78% recall to 90%, after being trained on balanced data.
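
The abstract reports recall, F1-score and G-mean, which are the usual metrics when the fraud class is rare. As a rough illustration only, the sketch below computes them from predicted labels, taking G-mean in its common form as the square root of recall times specificity. The function names and the toy labels are hypothetical and are not taken from the thesis; AUC is left out because it needs predicted scores rather than hard labels.

```python
from math import sqrt


def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive and pred == positive:
            tp += 1
        elif truth != positive and pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def imbalance_metrics(y_true, y_pred, positive=1):
    """Return recall, specificity, precision, F1 and G-mean as a dict."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {
        "recall": recall,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
        # G-mean rewards doing well on both classes at once.
        "g_mean": sqrt(recall * specificity),
    }


# Toy data (made up for illustration): 4 fraud cases among 20 transactions.
y_true = [1] * 4 + [0] * 16
y_pred = [1, 1, 1, 0] + [1] + [0] * 15
metrics = imbalance_metrics(y_true, y_pred)
```

On data this skewed, a model that predicts "not fraud" for every transaction would still be 80% accurate, yet its recall and G-mean would both be zero, which is why these metrics are preferred over plain accuracy for imbalanced problems.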

Item Type: Thesis (Masters)
Subjects: Q Science > QA Mathematics > Electronic computers. Computer science
T Technology > T Technology (General) > Information Technology > Electronic computers. Computer science
Q Science > QA Mathematics > Computer software
T Technology > T Technology (General) > Information Technology > Computer software
H Social Sciences > HG Finance > Credit. Debt. Loans.
H Social Sciences > HV Social pathology. Social and public welfare > Criminology > Crimes and Offences > Cyber Crime
Divisions: School of Computing > Master of Science in Data Analytics
Depositing User: Clara Chan
Date Deposited: 11 Dec 2021 13:08
Last Modified: 11 Dec 2021 13:08
URI: https://norma.ncirl.ie/id/eprint/5209
