NORMA eResearch @NCI Library

Optimizing Diabetes Predictive Modeling: A Study of Data Balancing and Advanced Algorithms

Ingle, Piyush (2023) Optimizing Diabetes Predictive Modeling: A Study of Data Balancing and Advanced Algorithms. Masters thesis, Dublin, National College of Ireland.

PDF (Master of Science)
PDF (Configuration Manual)

Abstract

Diabetes is a chronic illness and a global health problem, since it can affect anyone. Also known as diabetes mellitus, the disorder interferes with how the body regulates blood sugar levels. The goal of this study is to apply two data balancing techniques, ADASYN (Adaptive Synthetic Sampling) and SMOTE (Synthetic Minority Over-sampling Technique), to improve the accuracy of diabetes prediction models. In addition to addressing the inherent class imbalance in diabetes datasets, the study examines how these techniques affect the predictive performance of five conventional machine learning algorithms: k-Nearest Neighbors (KNN), AdaBoost, Decision Tree, Logistic Regression, and Gaussian Naïve Bayes. The study also evaluates three deep learning algorithms: Gated Recurrent Unit (GRU), Long Short-Term Memory (LSTM), and Recurrent Neural Network (RNN). Through a thorough comparative analysis, this work attempts to identify which combinations of machine learning or deep learning algorithms and data balancing techniques predict diabetes most effectively. The findings offer insight into how to maximize model performance in healthcare applications and give a detailed picture of how various predictive modelling techniques interact with data preparation techniques in diabetes diagnosis. After comparing all the results, we found that SMOTE gave the better results, with Decision Tree as the best performer at an accuracy of 82% and an F1 score of 0.80.
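
As a rough illustration of the data balancing idea named in the abstract, the sketch below shows SMOTE-style oversampling: each synthetic minority sample is placed at a random point on the line segment between a minority sample and one of its nearest minority neighbours. This is not the thesis code. The function name `smote_sketch`, the toy data, and the parameter values are assumptions made up for illustration, and the sketch uses only the Python standard library rather than whatever tooling the thesis used. ADASYN follows the same interpolation idea but generates more synthetic samples around minority points that have more majority-class neighbours; that weighting is not shown here.

```python
import math
import random


def smote_sketch(minority, n_new, k=3, seed=0):
    """Return n_new synthetic points, each interpolated between a randomly
    chosen minority sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by Euclidean distance to the base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Toy data, invented for illustration: two features per sample.
majority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3), (0.9, 0.7), (1.3, 1.0)]
minority = [(3.0, 3.0), (3.2, 2.8), (2.9, 3.1)]

# Generate just enough synthetic minority samples to match the majority count.
new_points = smote_sketch(minority, n_new=len(majority) - len(minority), k=2)
balanced_minority = minority + new_points
print(len(majority), len(balanced_minority))
```

In a typical workflow, oversampling of this kind is applied to the training split only, so that synthetic points do not leak into the data used to measure accuracy or F1 score.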

Item Type: Thesis (Masters)
Supervisors: Rustam, Furqan (Email: UNSPECIFIED)
Uncontrolled Keywords: Diabetes Prediction; ADASYN; SMOTE; Machine Learning; Deep Learning
Subjects: Q Science > QA Mathematics > Electronic computers. Computer science
T Technology > T Technology (General) > Information Technology > Electronic computers. Computer science
R Medicine > RA Public aspects of medicine > RA0421 Public health. Hygiene. Preventive Medicine
R Medicine > Healthcare Industry
Q Science > Q Science (General) > Self-organizing systems. Conscious automata > Machine learning
Divisions: School of Computing > Master of Science in Data Analytics
Depositing User: Ciara O'Brien
Date Deposited: 09 May 2025 08:10
Last Modified: 09 May 2025 08:10
URI: https://norma.ncirl.ie/id/eprint/7529
