Improving Mapreduce Performance By Using A New Partitioner In Yarn

Keywords

Data skew; Data transmission amount; Hadoop; Heterogeneous parallel image processing; Load balance; MapReduce

Abstract

Data skew, cluster heterogeneity, and network traffic are three issues that significantly affect the performance of MapReduce applications. However, the Hash-Partitioner in native Hadoop considers none of them. This paper proposes a new partitioner in Yarn (Hadoop 2.6.0), named PIY, which adopts an innovative parallel sampling method to obtain the distribution of the intermediate data. Based on this distribution, PIY first mitigates data skew in MapReduce applications. Second, PIY accounts for the heterogeneity of computing resources to balance the load among reducers. Third, PIY reduces network traffic in the shuffle phase by trying to retain intermediate data on nodes that act as both mapper and reducer. Compared with native Hadoop and some other popular strategies, PIY can reduce the execution time by 35.62% and 50.65% in homogeneous and heterogeneous clusters, respectively. We also apply PIY to parallel image processing, where, compared with several existing strategies, it can reduce the execution time by 11.2%.
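To make the load-balancing idea concrete, here is a minimal sketch of how sampled key frequencies and per-reducer capacities could drive key placement. It is not the PIY algorithm, which this record does not specify. It is a generic greedy heuristic, and the function name `assign_keys`, the capacity weights, and the sample counts are all hypothetical. It also ignores the data-locality aspect described in the abstract.

```python
from collections import Counter


def assign_keys(sampled_counts, capacities):
    """Greedily map each key to a reducer (illustrative only, not PIY).

    sampled_counts: dict of key -> estimated number of intermediate records,
                    as a sampling step might produce (hypothetical input).
    capacities:     list of relative reducer speeds (hypothetical weights).
    """
    loads = [0] * len(capacities)
    assignment = {}
    # Place the heaviest keys first so skewed keys are handled early.
    ordered = sorted(sampled_counts.items(), key=lambda kv: (-kv[1], str(kv[0])))
    for key, count in ordered:
        # Choose the reducer whose capacity-normalized load is lowest
        # after receiving this key.
        target = min(
            range(len(capacities)),
            key=lambda r: (loads[r] + count) / capacities[r],
        )
        assignment[key] = target
        loads[target] += count
    return assignment, loads


# One heavily skewed key ("a") and a reducer 0 assumed twice as fast as reducer 1.
sample = Counter({"a": 60, "b": 20, "c": 10, "d": 10})
assignment, loads = assign_keys(sample, capacities=[2.0, 1.0])
print(assignment, loads)
```

Under these assumed inputs the faster reducer receives more records, whereas a plain hash partitioner would place keys without regard to their frequency or to node speed.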

Publication Date

1-1-2017

Publication Title

Proceedings - DMSVLSS 2017: 23rd International Conference on Distributed Multimedia Systems, Visual Languages and Sentient Systems

Number of Pages

24-33

Document Type

Article; Proceedings Paper

Personal Identifier

scopus

DOI Link

https://doi.org/10.18293/DMSVLSS2017-002

Copyright Status

Unknown

Scopus ID

85029592551 (Scopus)

Source API URL

https://api.elsevier.com/content/abstract/scopus_id/85029592551

STARS Citation

Lu, Wei; Chen, Lei; Yuan, Haitao; Xing, Weiwei; and Wang, Liqiang, "Improving Mapreduce Performance By Using A New Partitioner In Yarn" (2017). Scopus Export 2015-2019. 7096.
https://stars.library.ucf.edu/scopus2015/7096
