Yelmar, Urmila Shridhar (2025) Smart Data Masking using AI in Banking Transactions. Masters thesis, Dublin, National College of Ireland.
Abstract
The accelerated digitalisation of banking has made it harder to protect customer privacy without degrading the effectiveness of fraud and intrusion detection tools. This thesis presents a context-sensitive, AI-assisted adaptive data masking framework that combines real-time risk evaluation and SHAP-based feature prioritisation with denoising autoencoder-based synthetic reconstruction. The aim is to maintain predictive performance while substantially reducing the risk of re-identification. Success is defined as an area under the curve (AUC) of at least 0.95 on the fraud detection task, with membership inference attack (MIA) accuracy at least 50% lower than the baseline, averaged over bootstrap runs.
The framework is tested on three datasets spanning different financial and cybersecurity contexts: IEEE-CIS Fraud Detection, PaySim mobile transactions, and CICIDS2017 network intrusion traces. In each case the train-test split is sealed before SHAP values are computed, to avoid label leakage. The sensitivity-tiered masking rules map SHAP importance thresholds directly to masking actions, which makes them reproducible. Privacy is measured with black-box MIAs and shadow models, k-anonymity scores, and Kolmogorov-Smirnov (KS) statistical tests; utility is measured by accuracy, precision, recall, F1-score, and AUC.
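As an illustration only, the idea of tying SHAP importance thresholds to masking actions can be sketched as follows. The abstract does not give the thesis's thresholds, tier names, or the direction of the mapping, so HIGH_THRESHOLD, LOW_THRESHOLD, the action labels, and the choice that higher importance receives the stronger action are all assumptions made for this sketch, as are the function and feature names.

```python
from typing import Dict

# Hypothetical cut-offs on normalised mean |SHAP| importance.
# The values used in the thesis are not stated in the abstract.
HIGH_THRESHOLD = 0.10
LOW_THRESHOLD = 0.02


def masking_action(importance: float) -> str:
    """Map one normalised importance score to a masking action (assumed tiers)."""
    if importance >= HIGH_THRESHOLD:
        return "reconstruct"  # e.g. replace with an autoencoder reconstruction
    if importance >= LOW_THRESHOLD:
        return "generalise"   # e.g. coarsen or bucket the value
    return "keep"             # leave the feature unmasked


def build_masking_plan(importances: Dict[str, float]) -> Dict[str, str]:
    """Normalise importances to sum to 1 and assign an action per feature."""
    total = sum(abs(v) for v in importances.values())
    if total == 0:
        return {name: "keep" for name in importances}
    return {
        name: masking_action(abs(value) / total)
        for name, value in importances.items()
    }


if __name__ == "__main__":
    # Illustrative importances, not results from the thesis.
    example = {"card_id": 0.50, "amount": 0.45, "hour": 0.04, "browser": 0.01}
    for feature, action in build_masking_plan(example).items():
        print(feature, "->", action)
```

Because the plan is a pure function of the importance scores and two fixed thresholds, the same scores always yield the same masking actions, which is the sense in which such rules are reproducible.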
Results indicate that the proposed technique reduces average MIA accuracy from about 90% on unmasked data to about 46% on masked data (p < 0.05), while k-anonymity rises to at least 15. At the same time, fraud and intrusion detection models achieve 85 to 88% accuracy with near-perfect precision and recall, outperforming the zero-masking and random-masking baselines, which lose more utility and yield smaller privacy gains. Distributional tests also confirm that the reconstructed values differ from the originals (KS p < 0.01), which reduces the risk of leakage at the expense of model interpretability.
The study presents a transparent, operationally viable, and empirically verified method for privacy-preserving machine learning in the financial sector, in which regulatory compliance, explainability, and adversarial robustness can be balanced without a loss of predictive utility.
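Two of the privacy measures named in the abstract, k-anonymity and the two-sample KS statistic, have standard definitions that can be sketched in a few lines. This is a minimal illustration of those textbook definitions, not the evaluation code used in the thesis; the function names and sample data are hypothetical, and the sketch computes only the KS statistic, not its p-value.

```python
from bisect import bisect_right
from collections import Counter
from typing import Hashable, Sequence, Tuple


def k_anonymity(rows: Sequence[Tuple[Hashable, ...]]) -> int:
    """Size of the smallest group of rows sharing one quasi-identifier tuple."""
    if not rows:
        return 0
    return min(Counter(rows).values())


def ks_statistic(a: Sequence[float], b: Sequence[float]) -> float:
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    sa, sb = sorted(a), sorted(b)
    gap = 0.0
    # The empirical CDFs only change at observed values, so checking
    # every observed value is enough to find the maximum gap.
    for x in sa + sb:
        fa = bisect_right(sa, x) / len(sa)
        fb = bisect_right(sb, x) / len(sb)
        gap = max(gap, abs(fa - fb))
    return gap


if __name__ == "__main__":
    # Hypothetical quasi-identifiers: (age band, region code).
    rows = [("30-39", "D01"), ("30-39", "D01"),
            ("40-49", "D02"), ("40-49", "D02"), ("40-49", "D02")]
    print("k-anonymity:", k_anonymity(rows))
    print("KS statistic:", ks_statistic([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))
```

A larger k means each record is hidden among more look-alike records, and a larger KS statistic means the masked values are distributed less like the originals.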