MRSIM: Mitigating Reducer Skew in MapReduce

Abstract

MapReduce has emerged as a popular programming model for data-intensive computing. Its popularity stems from its simple design, which makes it easy for programmers to use, and from framework implementations such as Hadoop, which have been adopted by large business and technology companies. One significant issue in practical MapReduce applications is data skew: an imbalance in the amount of data assigned to each task. Skew causes some tasks to take much longer to finish than others and can significantly degrade performance. Existing solutions for data skew on the reduce side add overhead: users must either write a custom partitioner for each specific application or run an additional sampling pass before the map function begins. To mitigate data skew on the reduce side, which this paper calls reducer skew, we propose MRSIM, a load-balancing strategy based on load statistics. To obtain the distribution of the input data for the reduce stage, MRSIM computes statistics while the data is being prepared, making full use of the shuffle stage in MapReduce. To balance the load across the entire cluster, MRSIM reallocates reduce tasks from heavily loaded nodes to idle ones according to the data distribution. In addition, by introducing a load feedback mechanism, MRSIM further improves cluster performance when running complex applications. We evaluated MRSIM on YARN (Hadoop 2.2.0). The experimental results show that MRSIM substantially outperforms the default strategy in native Hadoop, with the improvement in execution time reaching 17%.
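
The reallocation idea in the abstract can be illustrated with a minimal sketch. It is not the algorithm from the paper, whose details are not given here. It assumes that per-partition sizes have already been gathered during the shuffle stage, and it uses a generic greedy heuristic (largest partition first, placed on the least-loaded node). The names assign_partitions, node_loads, and the sample sizes are hypothetical.

```python
import heapq


def assign_partitions(partition_sizes, num_nodes):
    """Greedy sketch: place the largest partition on the least-loaded node.

    partition_sizes: dict mapping partition id -> size (assumed to come
    from statistics collected during shuffle; hypothetical input).
    Returns a dict mapping partition id -> node index.
    """
    # Min-heap of (current load, node index), all nodes start idle.
    heap = [(0, node) for node in range(num_nodes)]
    heapq.heapify(heap)
    assignment = {}
    # Visit partitions from largest to smallest.
    by_size = sorted(partition_sizes.items(), key=lambda kv: kv[1], reverse=True)
    for pid, size in by_size:
        load, node = heapq.heappop(heap)
        assignment[pid] = node
        heapq.heappush(heap, (load + size, node))
    return assignment


def node_loads(partition_sizes, assignment, num_nodes):
    """Total data assigned to each node under a given assignment."""
    loads = [0] * num_nodes
    for pid, node in assignment.items():
        loads[node] += partition_sizes[pid]
    return loads


# Hypothetical skewed partition sizes (arbitrary units).
sizes = {0: 900, 1: 100, 2: 120, 3: 80, 4: 400, 5: 200}

# Baseline: spread partitions over nodes by id, ignoring their sizes.
round_robin = {pid: pid % 3 for pid in sizes}
balanced = assign_partitions(sizes, 3)

print("round robin:", node_loads(sizes, round_robin, 3))
print("greedy     :", node_loads(sizes, balanced, 3))
```

In this toy input, one partition dominates, so no assignment can bring the maximum node load below that partition's size; the greedy placement only keeps the other nodes from adding to the heaviest one.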

Publication Date

5-16-2017

Publication Title

Proceedings - 31st IEEE International Conference on Advanced Information Networking and Applications Workshops, WAINA 2017

Number of Pages

379-384

Document Type

Article; Proceedings Paper

Personal Identifier

scopus

DOI Link

https://doi.org/10.1109/WAINA.2017.94

Scopus ID

85021433807 (Scopus)

Source API URL

https://api.elsevier.com/content/abstract/scopus_id/85021433807
