Optimizing MapReduce Partitioner Using Naive Bayes Classifier

Abstract

Data locality and reduce-side data skew are two essential issues in MapReduce. Improving data locality decreases network traffic by placing reduce tasks on the nodes where their input data is located, while data skew causes load imbalance among reducer nodes. Partitioning is an important MapReduce operation because it determines the destinations of map output and can significantly affect the amount of data shuffled. An effective partitioner can therefore improve MapReduce performance by increasing data locality and decreasing reduce-side data skew. Previous studies that address both issues have ignored the fact that, for different types of jobs, the relative priority given to data locality and to reduce-side data skew can affect execution time differently. In this paper, we propose BAPM, a novel partitioner that optimizes data locality and data skew by leveraging a naive Bayes classifier with job type and bandwidth as classification attributes. Our experiments, performed on a Hadoop cluster with 31 nodes, show that BAPM speeds up MapReduce computing performance by up to 19.26% compared to native Hadoop.
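
The abstract does not give BAPM's algorithm, so the following is only a minimal sketch of the general idea it names: a categorical naive Bayes classifier that takes job type and bandwidth as attributes and predicts which concern a partitioner might prioritize. The class name, the attribute values ("shuffle_heavy", "reduce_heavy", "low", "high"), the labels ("locality", "skew"), the training rows, and the use of Laplace smoothing are all illustrative assumptions, not details from the paper.

```python
from collections import Counter, defaultdict
import math


class CategoricalNaiveBayes:
    """Naive Bayes over categorical attributes with Laplace smoothing.

    Hypothetical helper for illustration; not the BAPM implementation.
    """

    def __init__(self, alpha=1.0):
        self.alpha = alpha                          # smoothing strength (assumed)
        self.class_counts = Counter()               # label -> number of rows
        self.feature_counts = defaultdict(Counter)  # (label, column) -> value counts
        self.feature_values = defaultdict(set)      # column -> distinct values seen

    def fit(self, rows, labels):
        for row, label in zip(rows, labels):
            self.class_counts[label] += 1
            for i, value in enumerate(row):
                self.feature_counts[(label, i)][value] += 1
                self.feature_values[i].add(value)
        return self

    def predict(self, row):
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            # log P(label) + sum over attributes of log P(value | label)
            score = math.log(count / total)
            for i, value in enumerate(row):
                seen = self.feature_counts[(label, i)][value]
                k = len(self.feature_values[i])
                score += math.log((seen + self.alpha) / (count + self.alpha * k))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Made-up training data: (job type, bandwidth level) -> priority to favor.
rows = [
    ("shuffle_heavy", "low"), ("shuffle_heavy", "low"), ("shuffle_heavy", "high"),
    ("reduce_heavy", "high"), ("reduce_heavy", "high"), ("reduce_heavy", "low"),
]
labels = ["locality", "locality", "locality", "skew", "skew", "skew"]

model = CategoricalNaiveBayes().fit(rows, labels)
priority = model.predict(("shuffle_heavy", "low"))
```

In a sketch like this, a partitioner could consult `priority` before assigning map output to reducers, weighting locality-aware placement or load balancing accordingly; how BAPM actually combines the two is not stated in the abstract.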

Publication Date

3-29-2018

Publication Title

Proceedings - 2017 IEEE 15th International Conference on Dependable, Autonomic and Secure Computing, 2017 IEEE 15th International Conference on Pervasive Intelligence and Computing, 2017 IEEE 3rd International Conference on Big Data Intelligence and Computing and 2017 IEEE Cyber Science and Technology Congress, DASC-PICom-DataCom-CyberSciTec 2017

Volume

2018-January

Number of Pages

812-819

Document Type

Article; Proceedings Paper

Personal Identifier

scopus

DOI Link

https://doi.org/10.1109/DASC-PICom-DataCom-CyberSciTec.2017.138

Scopus ID

85048068322 (Scopus)

Source API URL

https://api.elsevier.com/content/abstract/scopus_id/85048068322
