MRSIM: Mitigating Reducer Skew in MapReduce
Abstract
MapReduce has emerged as a popular programming model in the field of data-intensive computing. This is due to its simple design, which makes it easy for programmers to use, and to framework implementations such as Hadoop, which have been adopted by large business and technology companies. One significant issue in practical MapReduce applications is data skew: an imbalance in the amount of data assigned to each task. Skew causes some tasks to take much longer to finish than others and can significantly degrade performance. Existing solutions for reduce-side data skew add overhead: users must either write a custom partitioner for the specific application or run an additional sampling process before the map function begins. To mitigate reduce-side data skew, which is called reducer skew in this paper, we propose MRSIM, a load balancing strategy based on load statistics. To obtain the input data distribution of the reduce stage, MRSIM computes the statistics while the data is being prepared, making full use of the shuffle stage in MapReduce. To balance the load across the entire cluster, MRSIM reallocates reduce tasks from heavily loaded nodes to idle ones according to the data distribution. In addition, by introducing a load feedback mechanism, MRSIM further improves the cluster's performance when running complex applications. We evaluated MRSIM on YARN (Hadoop 2.2.0). The experimental results show that MRSIM greatly outperforms the default strategy in native Hadoop, with the improvement in execution time reaching 17%.
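The abstract does not give MRSIM's algorithm, so the following Python sketch is only a generic illustration of the underlying idea: per-partition record counts gathered before the reduce stage can drive the placement of reduce work. It is not the authors' method. The function names, the modulo partitioner, and the greedy largest-first heuristic are all assumptions made for illustration.

```python
import heapq
from collections import Counter


def partition_sizes(keys, num_partitions):
    """Count records per partition under a simple modulo partitioner.

    Assumption: keys are non-negative integers; real frameworks hash the key.
    """
    counts = Counter(key % num_partitions for key in keys)
    return [counts.get(p, 0) for p in range(num_partitions)]


def round_robin_loads(sizes, num_nodes):
    """Baseline: place partition p on node p % num_nodes, ignoring its size."""
    loads = [0] * num_nodes
    for part, size in enumerate(sizes):
        loads[part % num_nodes] += size
    return loads


def greedy_loads(sizes, num_nodes):
    """Statistics-aware placement (hypothetical stand-in, not MRSIM itself).

    Partitions are taken largest first and each one goes to the node that
    currently carries the least load.
    """
    heap = [(0, node) for node in range(num_nodes)]
    heapq.heapify(heap)
    loads = [0] * num_nodes
    for part in sorted(range(len(sizes)), key=lambda p: -sizes[p]):
        load, node = heapq.heappop(heap)
        loads[node] = load + sizes[part]
        heapq.heappush(heap, (loads[node], node))
    return loads
```

With a skewed input in which one key dominates, the size-blind baseline tends to leave one node with most of the records, while the statistics-aware placement evens out the per-node totals. The slowest node bounds the reduce stage, so a lower maximum load is the quantity of interest.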
Publication Date
5-16-2017
Publication Title
Proceedings - 31st IEEE International Conference on Advanced Information Networking and Applications Workshops, WAINA 2017
Number of Pages
379-384
Document Type
Article; Proceedings Paper
Personal Identifier
scopus
DOI Link
https://doi.org/10.1109/WAINA.2017.94
Copyright Status
Unknown
Scopus ID
85021433807 (Scopus)
Source API URL
https://api.elsevier.com/content/abstract/scopus_id/85021433807
STARS Citation
Chen, Lei; Lu, Wei; Che, Xiaoping; Xing, Weiwei; and Wang, Liqiang, "Mrsim: Mitigating Reducer Skew In Mapreduce" (2017). Scopus Export 2015-2019. 7186.
https://stars.library.ucf.edu/scopus2015/7186