NORMA eResearch @NCI Library

Systemization and Evaluation for Data deduplication by deploying competitive chunking algorithm in polymorphic thread environment and avant-garde hashing techniques

Madan, Shilpi (2022) Systemization and Evaluation for Data deduplication by deploying competitive chunking algorithm in polymorphic thread environment and avant-garde hashing techniques. Masters thesis, Dublin, National College of Ireland.

Abstract

Over the past three decades, the world has been rapidly transitioning from traditional analogue methods to digital technologies, marking the “Great Digital Revolution”. Vast amounts of data are created in many forms, from the digital footprints of consumers to the research and development of new technologies across the globe. It is estimated that by 2025 more than 400 exabytes of data will be generated globally every day. However, storing, managing, and processing data at this scale efficiently can be a strenuous task even for large corporations, and several roadblocks stand in the way of efficient storage and utilization of data. Data deduplication, a technique that identifies and eliminates duplicate data, is one of the most efficient methods for improving storage capacity. Because it can identify high levels of redundancy, content-defined chunking (CDC) is the key component of data deduplication systems. This research focuses on optimizing data deduplication systems by analysing and updating the current CDC parameters, which enables efficient identification of chunk cut-points, and by fingerprinting the dataset with a novel hash function. It also introduces a multi-threaded content-defined chunking algorithm that speeds up computation by using multiple processors. The algorithm is based on a shifting window that slides one byte at a time whenever there is no match in the hash value pool. The approach is verified on AWS cloud infrastructure using different datasets, comparing the average time to process a file in a parallel environment against the serial method. According to the findings of the research, the technique reduces execution time and increases storage efficiency by 70%.
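The content-defined chunking and fingerprinting idea described in the abstract can be illustrated with a minimal, single-threaded sketch. It is not the thesis implementation: the polynomial rolling hash, the window size, the cut-point mask, the chunk size limits, and the use of SHA-256 in place of the thesis's novel hash function are all assumptions chosen for illustration, and the names `chunk`, `deduplicate`, and `restore` are hypothetical.

```python
import hashlib

# All parameters below are illustrative assumptions, not values from the thesis.
WINDOW = 16            # bytes covered by the sliding window
MASK = (1 << 6) - 1    # cut when the low 6 bits of the hash are zero
MIN_CHUNK = 32         # smallest chunk that may be cut by the hash test
MAX_CHUNK = 256        # forced cut if no cut-point was found earlier
BASE = 257
MOD = (1 << 31) - 1


def chunk(data: bytes) -> list[bytes]:
    """Split data at content-defined cut-points found by a rolling hash."""
    chunks = []
    start = 0
    h = 0
    top = pow(BASE, WINDOW - 1, MOD)  # weight of the byte leaving the window
    for i, b in enumerate(data):
        length = i - start + 1
        if length > WINDOW:
            # Slide the window one byte: drop the oldest byte's contribution.
            h = (h - data[i - WINDOW] * top) % MOD
        h = (h * BASE + b) % MOD
        if (length >= MIN_CHUNK and (h & MASK) == 0) or length >= MAX_CHUNK:
            chunks.append(data[start:i + 1])
            start = i + 1
            h = 0
    if start < len(data):
        chunks.append(data[start:])  # trailing bytes form the last chunk
    return chunks


def deduplicate(data: bytes, store: dict[str, bytes]) -> list[str]:
    """Store each unique chunk once and return the list of fingerprints."""
    recipe = []
    for c in chunk(data):
        fp = hashlib.sha256(c).hexdigest()  # stand-in for the thesis's hash
        store.setdefault(fp, c)             # keep only the first copy
        recipe.append(fp)
    return recipe


def restore(recipe: list[str], store: dict[str, bytes]) -> bytes:
    """Rebuild the original bytes from a fingerprint list."""
    return b"".join(store[fp] for fp in recipe)
```

Calling `deduplicate` on several files with one shared `store` keeps a single copy of every chunk the files have in common, and `restore` rebuilds any file from its fingerprint list. A parallel variant along the lines the abstract describes would distribute files or file regions across workers, which this sketch does not attempt.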

Item Type: Thesis (Masters)
Supervisors: Mijumbi, Rashid (email: UNSPECIFIED)
Subjects: Q Science > QA Mathematics > Electronic computers. Computer science
T Technology > T Technology (General) > Information Technology > Electronic computers. Computer science
T Technology > T Technology (General) > Information Technology > Cloud computing
Q Science > QA Mathematics > Algebra > Algorithms > Computer algorithms
Divisions: School of Computing > Master of Science in Cloud Computing
Depositing User: Tamara Malone
Date Deposited: 18 Apr 2023 18:24
Last Modified: 18 Apr 2023 18:24
URI: https://norma.ncirl.ie/id/eprint/6475
