Madan, Shilpi (2022) Systemization and Evaluation for Data deduplication by deploying competitive chunking algorithm in polymorphic thread environment and avant-garde hashing techniques. Masters thesis, Dublin, National College of Ireland.
Abstract
Over the past three decades, the world has been rapidly transitioning from traditional analogue methodologies to digital technologies, marking the “Great Digital Revolution”. Vast amounts of data are created in many forms, from the digital footprints of consumers to the research and development of new technologies across the globe. It is estimated that by 2025 over 400 exabytes of data will be generated globally every day. However, efficient storage, management, and processing of data at this scale can be a strenuous task even for large corporations, and several roadblocks stand in the way of efficient storage and utilization of data. Data deduplication is one of the most efficient methods for improving storage capacity: it identifies and eliminates duplicate data. Because it can identify high levels of redundancy, content-defined chunking (CDC) is the key component of data deduplication systems. This research focuses on optimizing data deduplication systems by analysing and updating the current CDC parameters, which enables efficient identification of chunk cut-points, and by fingerprinting the dataset with a novel hash function. It also introduces a multi-threaded content-defined chunking algorithm that uses a multiprocessor technique to speed up computation. The algorithm is based on a shifting window that slides one byte at a time whenever there is no match with the hash value pool. The approach is verified on AWS cloud infrastructure using different datasets, comparing the average time to process a file in a parallel environment against the serial method. According to the findings of the research, our technique reduces execution time and increases storage efficiency by 70%.
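The general shape of a content-defined chunking pipeline like the one the abstract describes can be sketched as follows. This is a minimal illustration and not the thesis implementation. The window size, chunk limits, polynomial rolling hash, mask-based cut test, use of SHA-256 in place of the novel hash function, and the names `split_chunks` and `deduplicate_files` are all assumptions made for the example. In particular, the cut test here checks the low bits of the window hash instead of matching against a hash value pool.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

# All parameters below are illustrative assumptions, not values from the thesis.
WINDOW = 16            # bytes covered by the sliding window
MIN_CHUNK = 64         # never cut before this many bytes
MAX_CHUNK = 1024       # always cut at this many bytes
MASK = (1 << 7) - 1    # cut when the low 7 bits of the window hash are zero
BASE = 257
MOD = (1 << 61) - 1
TOP = pow(BASE, WINDOW - 1, MOD)  # weight of the oldest byte in the window


def split_chunks(data: bytes) -> list[bytes]:
    """Split data at content-defined cut-points using a rolling hash."""
    chunks = []
    start = 0  # offset where the current chunk begins
    h = 0      # hash of the last WINDOW bytes of the current chunk
    for i in range(len(data)):
        length = i - start + 1
        if length > WINDOW:
            # Slide the window one byte: drop the oldest byte first.
            h = (h - data[i - WINDOW] * TOP) % MOD
        h = (h * BASE + data[i]) % MOD
        at_cut_point = length >= MIN_CHUNK and (h & MASK) == 0
        if at_cut_point or length >= MAX_CHUNK:
            chunks.append(data[start:i + 1])
            start = i + 1
            h = 0
    if start < len(data):
        chunks.append(data[start:])  # trailing bytes form the last chunk
    return chunks


def deduplicate_files(files: list[bytes], workers: int = 4):
    """Chunk each file in a worker thread, then store each unique chunk once.

    Returns (store, recipes): store maps fingerprint -> chunk bytes, and
    recipes holds, per file, the ordered fingerprints needed to rebuild it.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        chunked = list(pool.map(split_chunks, files))
    store: dict[str, bytes] = {}
    recipes = []
    for chunks in chunked:
        recipe = []
        for chunk in chunks:
            # SHA-256 stands in for the thesis's hash function (assumption).
            fingerprint = hashlib.sha256(chunk).hexdigest()
            store.setdefault(fingerprint, chunk)  # keep the first copy only
            recipe.append(fingerprint)
        recipes.append(recipe)
    return store, recipes


if __name__ == "__main__":
    # Deterministic pseudo-random sample data, used twice to force duplicates.
    sample = b"".join(hashlib.sha256(bytes([i])).digest() for i in range(200))
    store, recipes = deduplicate_files([sample, sample])
    stored = sum(len(c) for c in store.values())
    print(f"input bytes: {2 * len(sample)}, stored bytes: {stored}")
```

Because cut-points depend only on the bytes inside the window, identical regions in different files tend to produce identical chunks and therefore identical fingerprints, which is what allows the store to keep a single copy. Note that in CPython, threads mainly help when the work releases the global interpreter lock, so a process pool may be a better fit for pure-Python chunking.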
Item Type: | Thesis (Masters) |
---|---|
Supervisors: | Mijumbi, Rashid (email unspecified) |
Subjects: | Q Science > QA Mathematics > Electronic computers. Computer science; T Technology > T Technology (General) > Information Technology > Electronic computers. Computer science; T Technology > T Technology (General) > Information Technology > Cloud computing; Q Science > QA Mathematics > Algebra > Algorithms > Computer algorithms |
Divisions: | School of Computing > Master of Science in Cloud Computing |
Depositing User: | Tamara Malone |
Date Deposited: | 18 Apr 2023 18:24 |
Last Modified: | 18 Apr 2023 18:24 |
URI: | https://norma.ncirl.ie/id/eprint/6475 |