MRapid: An Efficient Short Job Optimizer On Hadoop
Keywords
Distributed Mode; Hadoop; MapReduce; Short Job; Uber Mode
Abstract
Data are being generated and collected at an accelerating pace. Hadoop has made analyzing large-scale data on commodity hardware much simpler for developers and analysts. Interestingly, it has been shown that most Hadoop jobs have small input sizes and do not run for a long time. For example, higher-level query languages, such as Hive and Pig, handle a complex query by breaking it into smaller ad hoc ones. Although Hadoop is designed for handling complex queries over large data sets, we found that it is highly inefficient on small-scale data, even though a new Uber mode was introduced specifically to handle jobs with small input sizes. In this paper, we propose an optimized Hadoop extension called MRapid, which significantly speeds up the execution of short jobs. It is completely backward compatible with Hadoop and imposes negligible overhead. Our experiments on the Microsoft Azure public cloud show that MRapid can improve performance by up to 88% compared to the original Hadoop.
Publication Date
6-30-2017
Publication Title
Proceedings - 2017 IEEE 31st International Parallel and Distributed Processing Symposium, IPDPS 2017
Pages
459-468
Document Type
Article; Proceedings Paper
Personal Identifier
scopus
DOI Link
https://doi.org/10.1109/IPDPS.2017.100
Copyright Status
Unknown
Scopus ID
85027726950 (Scopus)
Source API URL
https://api.elsevier.com/content/abstract/scopus_id/85027726950
STARS Citation
Zhang, Hong; Huang, Hai; and Wang, Liqiang, "MRapid: An Efficient Short Job Optimizer On Hadoop" (2017). Scopus Export 2015-2019. 7405.
https://stars.library.ucf.edu/scopus2015/7405