Improving MapReduce Performance By Using A New Partitioner In Yarn

Keywords

Data skew; Data transmission amount; Hadoop; Heterogeneous; Parallel image processing; Load balance; MapReduce

Abstract

Data skew, cluster heterogeneity, and network traffic are three issues that significantly influence the performance of MapReduce applications. However, the Hash-Partitioner in native Hadoop does not consider them. This paper proposes a new partitioner in Yarn (Hadoop 2.6.0), namely PIY, which adopts an innovative parallel sampling method to obtain the distribution of the intermediate data. Based on this distribution, PIY first mitigates data skew in MapReduce applications. Second, PIY considers the heterogeneity of the computing resources to balance the load among Reducers. Third, PIY reduces the network traffic in the shuffle phase by trying to retain intermediate data on nodes that act as both mapper and reducer. Compared with native Hadoop and some other popular strategies, PIY can reduce the execution time by 35.62% and 50.65% in homogeneous and heterogeneous clusters, respectively. We also apply PIY to parallel image processing. Compared with several existing strategies, PIY can reduce the execution time by 11.2%.
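
The abstract does not give PIY's algorithm, so the following is only an illustrative sketch of the general idea it describes: estimate key frequencies from a sample, then assign keys to reducers in proportion to each reducer's computing capacity, preferring the reducer that already holds the key's data when the choice is otherwise equal. The function name `assign_keys`, the `capacities` and `local_bytes` parameters, and the greedy largest-key-first rule are all assumptions made for illustration and are not taken from the paper.

```python
from collections import Counter


def assign_keys(key_sizes, capacities, local_bytes=None):
    """Greedy key-to-reducer assignment (hypothetical, not PIY itself).

    key_sizes:   dict mapping key -> estimated amount of intermediate data
    capacities:  list of relative reducer speeds (higher means faster)
    local_bytes: optional dict key -> {reducer index: data already on that node}
    """
    local_bytes = local_bytes or {}
    load = [0.0] * len(capacities)
    assignment = {}
    # Place the heaviest keys first so skewed keys are balanced early.
    for key, size in sorted(key_sizes.items(), key=lambda kv: (-kv[1], kv[0])):
        def cost(r):
            # Projected finish time on reducer r, scaled by its capacity.
            finish = (load[r] + size) / capacities[r]
            # On ties, prefer the reducer with more of this key stored locally.
            local = local_bytes.get(key, {}).get(r, 0)
            return (finish, -local, r)

        best = min(range(len(capacities)), key=cost)
        assignment[key] = best
        load[best] += size
    return assignment, load


# Estimate the key distribution from a small sample of intermediate keys.
sample = ["a"] * 6 + ["b"] * 2 + ["c", "d"]
estimated = Counter(sample)

# Reducer 0 is assumed to be twice as fast as reducer 1.
assignment, load = assign_keys(estimated, capacities=[2.0, 1.0])
print(assignment)
print(load)
```

In this toy run the faster reducer receives the skewed key "a" plus one small key, so its projected finish time (7 / 2.0 = 3.5) stays close to that of the slower reducer (3 / 1.0 = 3.0). A hash partitioner, by contrast, ignores both key sizes and reducer speeds.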

Publication Date

1-1-2017

Publication Title

Proceedings - DMSVLSS 2017: 23rd International Conference on Distributed Multimedia Systems, Visual Languages and Sentient Systems

Number of Pages

24-33

Document Type

Article; Proceedings Paper

Personal Identifier

scopus

DOI Link

https://doi.org/10.18293/DMSVLSS2017-002

Scopus ID

85029592551 (Scopus)

Source API URL

https://api.elsevier.com/content/abstract/scopus_id/85029592551
