NORMA eResearch @NCI Library

Efficient Real-Time Data Deduplication Techniques for Improving Data Quality in Urban Taxi Trip Streams

Prashanth, Bhuvan (2023) Efficient Real-Time Data Deduplication Techniques for Improving Data Quality in Urban Taxi Trip Streams. Masters thesis, Dublin, National College of Ireland.

Abstract

This research addresses the pervasive issue of data duplication in real-time urban taxi trip streams, a critical concern for transportation organizations that rely heavily on such data for decision-making. The study compares the effectiveness of the widely used "Hash Indexing Block based Data Deduplication" (HIBD) technique with traditional methods such as Dask parallel processing. HIBD employs hash-based indexing and block-based data deduplication to efficiently identify and eliminate duplicate entries in large datasets. The objective is to enhance the accuracy and reliability of urban taxi trip data by developing and implementing an efficient real-time deduplication technique. The study demonstrates that HIBD outperforms Dask in terms of deduplication accuracy, processing time, and resource consumption. The findings emphasize the significance of HIBD in reducing overall costs for urban taxi organizations by improving storage efficiency and minimizing data redundancy in a cloud environment. In conclusion, the outcomes of our investigation underscore the advantages of our preferred methodology, contributing significantly to advancing real-time data deduplication in this domain.
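
As an illustration of the general idea of hash-indexed, block-wise deduplication described in the abstract, the following is a minimal sketch and not the thesis's implementation. The record schema (taxi ID, pickup time, dropoff time, fare), the function names, and the block size are all assumptions chosen for the example; the thesis itself may define blocks, hashes, and the index differently.

```python
import hashlib
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical trip schema for illustration only:
# (taxi_id, pickup_time, dropoff_time, fare)
Trip = Tuple[str, str, str, float]


def trip_fingerprint(trip: Trip) -> str:
    """Hash all fields of a trip into a fixed-length hex digest."""
    canonical = "|".join(str(field) for field in trip)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def blocks(stream: Iterable[Trip], block_size: int) -> Iterator[List[Trip]]:
    """Group an incoming stream into fixed-size blocks (last one may be short)."""
    block: List[Trip] = []
    for trip in stream:
        block.append(trip)
        if len(block) == block_size:
            yield block
            block = []
    if block:
        yield block


def deduplicate(stream: Iterable[Trip], block_size: int = 2) -> Iterator[Trip]:
    """Yield each distinct trip once, using a hash index shared across blocks."""
    index: Dict[str, int] = {}  # fingerprint -> block number where first seen
    for block_no, block in enumerate(blocks(stream, block_size)):
        for trip in block:
            fingerprint = trip_fingerprint(trip)
            if fingerprint in index:
                continue  # duplicate of a trip already emitted
            index[fingerprint] = block_no
            yield trip


if __name__ == "__main__":
    sample = [
        ("T1", "2023-01-01T08:00", "2023-01-01T08:15", 12.5),
        ("T2", "2023-01-01T08:05", "2023-01-01T08:30", 20.0),
        ("T1", "2023-01-01T08:00", "2023-01-01T08:15", 12.5),  # exact duplicate
    ]
    for trip in deduplicate(sample):
        print(trip)
```

In this sketch the index lookup is a constant-time dictionary check per record, which is the property that makes hash-based approaches attractive for streams; only exact duplicates are detected, since any difference in a field changes the digest.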

Item Type: Thesis (Masters)
Supervisors: Mijumbi, Rashid (email: UNSPECIFIED)
Subjects: Q Science > QA Mathematics > Electronic computers. Computer science
T Technology > T Technology (General) > Information Technology > Electronic computers. Computer science
T Technology > T Technology (General) > Information Technology > Cloud computing
Divisions: School of Computing > Master of Science in Cloud Computing
Depositing User: Ciara O'Brien
Date Deposited: 10 Apr 2025 11:05
Last Modified: 10 Apr 2025 11:05
URI: https://norma.ncirl.ie/id/eprint/7405
