Improving Mapreduce Performance By Using A New Partitioner In Yarn
Keywords
Data skew; Data transmission amount; Hadoop; Heterogeneous parallel image processing; Load balance; MapReduce
Abstract
Data skew, cluster heterogeneity, and network traffic are three issues that significantly influence the performance of MapReduce applications. However, the HashPartitioner in native Hadoop does not consider them. This paper proposes a new partitioner in Yarn (Hadoop 2.6.0), named PIY, which adopts an innovative parallel sampling method to estimate the distribution of the intermediate data. Based on this estimate, PIY first mitigates data skew in MapReduce applications. Second, PIY considers the heterogeneity of computing resources to balance the load among Reducers. Third, PIY reduces network traffic in the shuffle phase by trying to retain intermediate data on nodes that act as both mapper and reducer. Compared with native Hadoop and several other popular strategies, PIY reduces execution time by 35.62% in a homogeneous cluster and by 50.65% in a heterogeneous cluster. We also apply PIY to parallel image processing, where it reduces execution time by 11.2% compared with several existing strategies.
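The abstract contrasts PIY with Hadoop's default HashPartitioner, which sends each record to reducer `(key.hashCode() & Integer.MAX_VALUE) % numReduceTasks` regardless of how many records share a key or how fast each reducer's node is. The Python sketch below illustrates why that matters and what a sampling-based, capacity-aware alternative might look like. It is not PIY's algorithm, which the abstract does not detail: the sampler (`estimate_key_counts`), the greedy longest-processing-time heuristic (`capacity_aware_assignment`), and the per-reducer capacity weights are hypothetical choices made for illustration, and PIY's data-locality optimization is not modeled.

```python
import random
from collections import Counter


def java_string_hash(s: str) -> int:
    """Mimic Java's String.hashCode() (32-bit signed overflow).
    Assumes BMP-only strings, where Python ord() matches Java's UTF-16 char."""
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    # Reinterpret the unsigned 32-bit value as a signed Java int.
    return h - (1 << 32) if h >= (1 << 31) else h


def hash_partition(key: str, num_reducers: int) -> int:
    """Same formula as Hadoop's default HashPartitioner."""
    return (java_string_hash(key) & 0x7FFFFFFF) % num_reducers


def estimate_key_counts(records, sample_rate, seed=0):
    """Hypothetical sampler: keep each record with probability sample_rate,
    then scale the sampled per-key counts back up to estimate the full counts."""
    rng = random.Random(seed)
    sample = Counter(k for k in records if rng.random() < sample_rate)
    return {k: c / sample_rate for k, c in sample.items()}


def capacity_aware_assignment(key_counts, capacities):
    """Hypothetical greedy heuristic: place the heaviest keys first, each on the
    reducer whose load/capacity ratio would be smallest after taking the key.
    A key is never split, because all values of a key go to one reducer."""
    loads = [0.0] * len(capacities)
    assignment = {}
    for key, count in sorted(key_counts.items(), key=lambda kv: (-kv[1], kv[0])):
        best = min(range(len(capacities)),
                   key=lambda r: (loads[r] + count) / capacities[r])
        assignment[key] = best
        loads[best] += count
    return assignment, loads


if __name__ == "__main__":
    # Skewed toy input: one hot key with 600 records, 20 keys with 20 records each.
    records = ["hot"] * 600 + [f"k{i}" for i in range(20) for _ in range(20)]
    capacities = [1.0, 1.0, 2.0]  # hypothetical: reducer 2 runs twice as fast
    hash_loads = [0] * len(capacities)
    for k in records:
        hash_loads[hash_partition(k, len(capacities))] += 1
    counts = estimate_key_counts(records, sample_rate=1.0)
    _, greedy_loads = capacity_aware_assignment(counts, capacities)
    print("hash   normalized loads:", [l / c for l, c in zip(hash_loads, capacities)])
    print("greedy normalized loads:", [l / c for l, c in zip(greedy_loads, capacities)])
```

With exact counts (`sample_rate=1.0`), the greedy assignment places the hot key on the faster reducer and splits the small keys evenly across the other two, giving normalized loads of 200, 200 and 300; no key-level partitioner can do better here, since the hot key alone needs 600 / 2 = 300. Hash partitioning ignores both the skew and the capacities, so its slowest reducer is never faster than that and is usually slower. A `sample_rate` below 1 trades estimation accuracy for a cheaper sampling pass.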
Publication Date
1-1-2017
Publication Title
Proceedings - DMSVLSS 2017: 23rd International Conference on Distributed Multimedia Systems, Visual Languages and Sentient Systems
Pages
24-33
Document Type
Article; Proceedings Paper
Personal Identifier
scopus
DOI Link
https://doi.org/10.18293/DMSVLSS2017-002
Copyright Status
Unknown
Scopus ID
85029592551 (Scopus)
Source API URL
https://api.elsevier.com/content/abstract/scopus_id/85029592551
STARS Citation
Lu, Wei; Chen, Lei; Yuan, Haitao; Xing, Weiwei; and Wang, Liqiang, "Improving Mapreduce Performance By Using A New Partitioner In Yarn" (2017). Scopus Export 2015-2019. 7096.
https://stars.library.ucf.edu/scopus2015/7096