NORMA eResearch @NCI Library

Mitigating Data Privacy Risks in Cloud Computing

Modak, Shreya (2023) Mitigating Data Privacy Risks in Cloud Computing. Masters thesis, Dublin, National College of Ireland.


Abstract

Diabetes is a common chronic illness that has a significant global impact on public health. Timely intervention and management of diabetes depend on the early and precise classification of diabetes status: whether the patient is healthy, pre-diabetic, or diabetic. Using the CDC Diabetes Health Indicators dataset (CDC Diabetes Health Indicators, 2015), this research applies machine learning to create a reliable diabetes categorization system. For sensitive data such as patients' health records, mitigating security risks is a major concern, yet the data must still be made available to doctors and patients. Encryption is one technique for training machine learning models securely in the cloud; in this case it is addressed by AWS SageMaker, which offers capabilities such as industry-standard SSL/TLS encryption to mitigate security risks during model training and deployment (Theerthagiri et al., 2022). This study therefore investigates the performance of distributed computing, specifically parallel processing and deduplication, and compares the efficiency of three machine learning approaches (Random Forest, XGBoost and TensorFlow), incorporating federated learning and data anonymization techniques. The dataset adopted for this study is the CDC Diabetes Health Indicators dataset, funded by the Centers for Disease Control and Prevention (CDC), which contains 253,680 records and 21 attributes covering health and lifestyle data as well as demographics. This dataset supports a thorough understanding of the intricate connection between diabetes and lifestyle variables, which is the central goal of this study. Each of the three machine learning algorithms compared in this study offers particular advantages for classification, depending on the characteristics of the dataset.
Random Forest is an ensemble approach that combines many decision trees and supports feature importance analysis. TensorFlow is used to build neural networks, architectures inspired by the structure of the human brain, to recognise complex patterns. XGBoost builds an ensemble of gradient-boosted decision trees, with each new tree correcting the errors of those before it. The aim of the study is to identify which algorithm provides the most trustworthy and accurate diabetes status categorization in the cloud. The assessment considers metrics such as accuracy, precision, recall, and F1-score. This study advances healthcare analytics by identifying the most efficient method, which may improve early diabetes identification and individualised treatment recommendations.
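
The evaluation metrics named in the abstract can be illustrated with a minimal sketch. It is not taken from the thesis: the function name macro_metrics, the label coding (0 = healthy, 1 = pre-diabetic, 2 = diabetic), the toy predictions, and the choice of macro averaging across the three classes are all assumptions made for illustration.

```python
from collections import Counter


def macro_metrics(y_true, y_pred, labels):
    """Return accuracy plus macro-averaged precision, recall and F1.

    Hypothetical helper for illustration; macro averaging is an assumption,
    the abstract does not state how per-class scores are combined.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1   # correct prediction for this class
        else:
            fp[pred] += 1    # predicted class gains a false positive
            fn[truth] += 1   # true class gains a false negative

    precisions, recalls, f1s = [], [], []
    for label in labels:
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)

    n = len(labels)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Toy example with assumed label coding: 0 healthy, 1 pre-diabetic, 2 diabetic.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
scores = macro_metrics(y_true, y_pred, labels=[0, 1, 2])
print(scores)
```

Macro averaging weights the three classes equally, which may matter when the pre-diabetic class is much smaller than the others; a weighted average would be an equally plausible choice.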

Item Type: Thesis (Masters)
Supervisors: Arun, Shreyas Setlur (Email: UNSPECIFIED)
Subjects: Q Science > QA Mathematics > Electronic computers. Computer science
T Technology > T Technology (General) > Information Technology > Electronic computers. Computer science
T Technology > T Technology (General) > Information Technology > Cloud computing
Q Science > QA Mathematics > Computer software > Computer Security
T Technology > T Technology (General) > Information Technology > Computer software > Computer Security
R Medicine > Healthcare Industry
Q Science > Q Science (General) > Self-organizing systems. Conscious automata > Machine learning
Divisions: School of Computing > Master of Science in Cloud Computing
Depositing User: Ciara O'Brien
Date Deposited: 09 Apr 2025 11:31
Last Modified: 09 Apr 2025 11:31
URI: https://norma.ncirl.ie/id/eprint/7392
