Chandanshive, Ashwini Rajaram (2017) Novel Algorithm with In-node Combiner for enhanced performance of MapReduce on Amazon EC2. Masters thesis, Dublin, National College of Ireland.
Abstract
The Hadoop framework is used to distribute large datasets over multiple commodity servers and to process them in parallel. A question that arises with any program is its efficiency and completion time. The MapReduce programming model follows the divide-and-conquer principle: map and reduce tasks consist of specific, well-defined phases for data processing. However, only the map and reduce functions are user-defined, so only their execution time can be predicted by the user. The remaining phases are generic, and their execution time depends entirely on the amount of data each phase processes and on the performance of the underlying Hadoop cluster. Optimizing I/O can therefore contribute to better performance. In this paper, we examine the I/O bottlenecks that the Hadoop framework faces and a possible solution to them. We introduce an approach that helps optimize I/O: combining at the node level. This design extends the traditional combiner by applying it at the node level, which reduces the number of intermediate results and, in turn, the network traffic between mappers and reducers.
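The effect of node-level combining described in the abstract can be illustrated with a small simulation. The following is only a minimal sketch in plain Python, not the thesis's algorithm and not Hadoop code: it assumes a word-count job, and the function names and the sample splits are hypothetical. It compares how many intermediate records would leave a node when each map task combines its own output versus when the output of all map tasks on the node is combined together.

```python
from collections import Counter


def map_words(split):
    """Map phase: emit (word, 1) for every word in an input split."""
    return [(word, 1) for word in split.split()]


def combine(pairs):
    """Sum the values for each key; used here at both combiner levels."""
    totals = Counter()
    for key, value in pairs:
        totals[key] += value
    return sorted(totals.items())


def per_task_combining(splits):
    """Traditional combiner: each map task combines only its own output."""
    records = []
    for split in splits:
        records.extend(combine(map_words(split)))
    return records


def node_level_combining(splits):
    """Node-level combining: merge the output of every map task on the node."""
    pairs = []
    for split in splits:
        pairs.extend(map_words(split))
    return combine(pairs)


# Hypothetical input: three splits handled by map tasks on the same node.
node_splits = ["a b a", "b c", "a c c"]

per_task = per_task_combining(node_splits)
per_node = node_level_combining(node_splits)

# Number of intermediate records that would be sent to the reducers.
print(len(per_task), len(per_node))  # prints "6 3"
```

In this toy input the final counts are the same in both cases, but the node-level variant sends three records instead of six, which is the kind of reduction in mapper-to-reducer traffic the abstract refers to.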
Item Type: | Thesis (Masters) |
---|---|
Subjects: | Q Science > QA Mathematics > Electronic computers. Computer science; T Technology > T Technology (General) > Information Technology > Electronic computers. Computer science; T Technology > T Technology (General) > Information Technology > Cloud computing |
Divisions: | School of Computing > Master of Science in Cloud Computing |
Depositing User: | Caoimhe Ní Mhaicín |
Date Deposited: | 21 Nov 2017 11:50 |
Last Modified: | 21 Nov 2017 11:50 |
URI: | https://norma.ncirl.ie/id/eprint/2874 |