Mrapid: An Efficient Short Job Optimizer On Hadoop

Keywords

Distributed Mode; Hadoop; MapReduce; Short Job; Uber Mode

Abstract

Data are being generated and collected at an accelerating pace. Hadoop has made analyzing large-scale data on commodity hardware much simpler for developers and analysts. Interestingly, it has been shown that most Hadoop jobs have small input sizes and do not run for a long time. For example, higher-level query languages, such as Hive and Pig, handle a complex query by breaking it into smaller ad hoc ones. Although Hadoop is designed for handling complex queries over large data sets, we found that it is highly inefficient at small data scales, even though a new Uber mode was introduced specifically to handle jobs with small input sizes. In this paper, we propose an optimized Hadoop extension called MRapid, which significantly speeds up the execution of short jobs. It is completely backward compatible with Hadoop and imposes negligible overhead. Our experiments on the Microsoft Azure public cloud show that MRapid can improve performance by up to 88% compared to the original Hadoop.

Publication Date

6-30-2017

Publication Title

Proceedings - 2017 IEEE 31st International Parallel and Distributed Processing Symposium, IPDPS 2017

Number of Pages

459-468

Document Type

Article; Proceedings Paper

Personal Identifier

scopus

DOI Link

https://doi.org/10.1109/IPDPS.2017.100

Scopus ID

85027726950 (Scopus)

Source API URL

https://api.elsevier.com/content/abstract/scopus_id/85027726950
