NORMA eResearch @NCI Library

Migration of batch log-processing job from Apache Hadoop Map-Reduce to Apache Spark in the public cloud

Vasilyev, Leonid (2015) Migration of batch log-processing job from Apache Hadoop Map-Reduce to Apache Spark in the public cloud. Masters thesis, Dublin, National College of Ireland.

Abstract

With the increasing adoption of cloud-based infrastructure, the problem of efficient utilization of provisioned resources becomes more important, since even in a pay-as-you-go model computing resources are allocated and charged in a coarse-grained way (e.g. a whole virtual machine per hour). This problem is particularly significant in batch processing systems, where computational resources are organized into a cluster. Even a small optimization of the applications running on such systems can result in significant cost savings.

In this thesis we evaluate one such application, JournalProcessor, a batch log-processing job that aggregates and indexes logs containing metrics data. JournalProcessor is built using the Apache Hadoop MapReduce engine, which runs on top of the Apache Hadoop YARN cluster resource manager and is deployed onto the Amazon EC2 public cloud using Amazon Elastic MapReduce (EMR).

The research question of this thesis is: is it possible to migrate JournalProcessor from the Apache Hadoop MapReduce data processing engine to a more general, data-stream oriented system, Apache Spark? Our hypothesis is that migrating the application to Spark will increase the utilization of provisioned cluster resources and decrease the running time of the job.

Our contributions include a description of the application used to generate the workload (JournalProcessor), a generic methodology for migrating MapReduce applications to Spark, and a detailed evaluation of the metrics produced by the cluster.

Item Type: Thesis (Masters)
Subjects: T Technology > T Technology (General) > Information Technology > Cloud computing
Divisions: School of Computing > Master of Science in Cloud Computing
Depositing User: Caoimhe Ní Mhaicín
Date Deposited: 12 Oct 2015 12:20
Last Modified: 05 Feb 2016 10:19
URI: https://norma.ncirl.ie/id/eprint/2061
