NORMA eResearch @NCI Library

Effective Anonymization of Sensitive Data in the Large-Scale Systems Using Privacy Enhancing Technology

Beeruka, Tarini (2023) Effective Anonymization of Sensitive Data in the Large-Scale Systems Using Privacy Enhancing Technology. Masters thesis, Dublin, National College of Ireland.

Abstract

Data anonymization is the process of securing confidential or sensitive information by hiding or encrypting the identifiers that connect a specific person to stored data. Various privacy-preserving strategies, methodologies, frameworks, and prototypes have been proposed or developed for disclosing data while preserving user privacy. Data owners, such as hospitals, financial institutions, and social networking sites, analyze data and conduct business operations on a dataset after applying anonymization techniques to protect users' privacy. Not all columns are needed for such analysis, yet sensitive columns are often included anyway. This is unnecessary: it consumes more computational resources and exposes the personal information in the dataset to risk. Data anonymization techniques can overcome this issue, and encryption can be applied on top. The author developed an optimal anonymization tool that fetches the dataset, splits the data according to sensitive and personal attributes, and sends only the essential columns for computation, while encrypting the files for added security. For greater performance and security, an Amazon S3 bucket is used for storing and retrieving data, and CryptPandas is used for the encryption and decryption of pandas data frames. The author compares the dataset processed by the data anonymization tool against the raw dataset. Execution time and memory consumption are the parameters considered in this study; the well-being of the patient is shown as the output of the computation on the dataset. According to the results, the dataset showed a 45.5% decrease in execution time and a 33.3% decrease in memory consumption after the data anonymization tool was applied.
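
The column-splitting step described in the abstract can be illustrated with a minimal sketch. It is not the author's tool: it uses only the Python standard library instead of pandas, CryptPandas, or Amazon S3, and the column names, the salt, and the helpers pseudonym and split_records are hypothetical names chosen for illustration. A salted hash stands in for the identifier so that the two parts can still be joined later.

```python
import hashlib

# Hypothetical classification of columns; the abstract does not list the
# actual columns of the thesis dataset.
PERSONAL_COLUMNS = {"name", "email"}
SENSITIVE_COLUMNS = {"diagnosis"}


def pseudonym(value, salt):
    """Return a short salted SHA-256 digest that replaces a direct identifier."""
    digest = hashlib.sha256((salt + str(value)).encode("utf-8")).hexdigest()
    return digest[:16]


def split_records(records, key_column, salt):
    """Split each record into an essential part and a restricted part.

    The essential part keeps only the columns needed for computation.
    The restricted part keeps the identifier plus the personal and
    sensitive columns. Both parts share the same pseudonymous key.
    """
    hidden = PERSONAL_COLUMNS | SENSITIVE_COLUMNS | {key_column}
    essential, restricted = [], []
    for row in records:
        key = pseudonym(row[key_column], salt)
        essential.append(
            {"key": key, **{c: v for c, v in row.items() if c not in hidden}}
        )
        restricted.append(
            {"key": key, **{c: v for c, v in row.items() if c in hidden}}
        )
    return essential, restricted


if __name__ == "__main__":
    # Made-up sample record, for illustration only.
    sample = [
        {"patient_id": 1, "name": "A. Example", "email": "a@example.org",
         "diagnosis": "flu", "heart_rate": 72, "steps": 4000},
    ]
    essential, restricted = split_records(sample, "patient_id", "demo-salt")
    print(sorted(essential[0]))   # columns sent for computation
    print(sorted(restricted[0]))  # columns held back
```

In the workflow the abstract describes, only the essential part would be sent for computation, while the restricted part would be the candidate for encryption and storage; those steps are left out of this sketch.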

Item Type: Thesis (Masters)
Supervisors: Gupta, Punit (email: UNSPECIFIED)
Uncontrolled Keywords: Amazon S3; CryptPandas; Data analysis; Data anonymization; Sensitive data
Subjects: Q Science > QA Mathematics > Electronic computers. Computer science
T Technology > T Technology (General) > Information Technology > Electronic computers. Computer science
T Technology > T Technology (General) > Information Technology > Cloud computing
Q Science > QA Mathematics > Computer software > Computer Security
T Technology > T Technology (General) > Information Technology > Computer software > Computer Security
Divisions: School of Computing > Master of Science in Cloud Computing
Depositing User: Tamara Malone
Date Deposited: 18 Apr 2023 14:05
Last Modified: 18 Apr 2023 14:05
URI: https://norma.ncirl.ie/id/eprint/6460
