NORMA eResearch @NCI Library

Novel Data-Distribution Technique for Hadoop in Heterogeneous Cloud Environments

Ubarhande, Vrushali, Popescu, Alina Madalina and González-Vélez, Horacio (2015) Novel Data-Distribution Technique for Hadoop in Heterogeneous Cloud Environments. In: Proceedings - 2015 9th International Conference on Complex, Intelligent, and Software Intensive Systems, CISIS 2015. IEEE, pp. 217-224. ISBN 9781479988709

Full text not available from this repository.
Official URL:


The Hadoop framework has been developed to process data-intensive MapReduce applications effectively. Hadoop users specify an application's computation logic in terms of a map function and a reduce function; such applications are commonly termed MapReduce applications. The Hadoop Distributed File System (HDFS) stores the application data on the Hadoop cluster nodes, called Data nodes, while the Name node acts as the control point for all Data nodes. Although HDFS is resilient, its current data-distribution methodologies are not necessarily efficient for heterogeneous distributed environments such as public clouds. This work contends that existing data-distribution techniques are not necessarily suitable, since Hadoop's performance typically degrades in heterogeneous environments whenever data distribution does not take the computing capability of each node into account. Data locality and its impact on Hadoop's performance are key factors, since they affect the Map phase when tasks are scheduled: a map task that runs on a node already holding its input block avoids transferring that block over the network. Task scheduling techniques in Hadoop should therefore arguably consider data locality to enhance performance, and various task scheduling techniques have been analysed for their data-locality awareness when scheduling applications. Other system factors also play a major role in achieving high performance in Hadoop data processing. The main contribution of this work is a novel methodology for placing data on Hadoop Data nodes according to their computing ratio. Two standard MapReduce applications, Word Count and Grep, have been executed, and a significant performance improvement has been observed with the proposed data-distribution technique.
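The abstract does not spell out how the computing ratio is measured or how it is turned into a placement. As an illustration only, the sketch below assumes that each Data node runs the same benchmark job, that a node's computing ratio is its relative speed (the inverse of its execution time, normalised to sum to 1), and that a file's blocks are then split across nodes in proportion to those ratios. The function names `computing_ratios` and `allocate_blocks` and the node names are hypothetical; replication and the actual HDFS placement mechanism are ignored.

```python
from fractions import Fraction
from math import floor


def computing_ratios(exec_times):
    """Hypothetical computing ratio: a node's share of the cluster's total speed,
    where speed is the inverse of the time it took to run a common benchmark.
    Fractions keep the arithmetic exact, so the ratios sum to exactly 1."""
    speeds = {node: Fraction(1) / Fraction(t) for node, t in exec_times.items()}
    total = sum(speeds.values())
    return {node: s / total for node, s in speeds.items()}


def allocate_blocks(num_blocks, ratios):
    """Split num_blocks across nodes in proportion to their ratios using the
    largest-remainder method, so the per-node counts always sum to num_blocks."""
    quotas = {node: num_blocks * r for node, r in ratios.items()}
    alloc = {node: floor(q) for node, q in quotas.items()}
    leftover = num_blocks - sum(alloc.values())
    # Give the remaining blocks to the nodes with the largest fractional remainders.
    by_remainder = sorted(quotas, key=lambda n: quotas[n] - alloc[n], reverse=True)
    for node in by_remainder[:leftover]:
        alloc[node] += 1
    return alloc


if __name__ == "__main__":
    # Assumed benchmark times in seconds: dn1 is twice as fast as dn2, four times dn3.
    ratios = computing_ratios({"dn1": 10, "dn2": 20, "dn3": 40})
    print(allocate_blocks(70, ratios))  # {'dn1': 40, 'dn2': 20, 'dn3': 10}
```

With these assumed timings the ratios are 4/7, 2/7 and 1/7, so the fastest node receives four times as many blocks as the slowest. The largest-remainder step matters when the block count does not divide evenly: for 10 blocks the split is 6, 3 and 1.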

Item Type: Book Section
Subjects: T Technology > T Technology (General) > Information Technology > Cloud computing
Divisions: School of Computing > Staff Research and Publications
Depositing User: Caoimhe Ní Mhaicín
Date Deposited: 07 Dec 2016 12:45
Last Modified: 07 Dec 2016 12:45