Optimizing MapReduce Partitioner Using Naive Bayes Classifier
Abstract
Data locality and data skew on the reduce side are two essential issues in MapReduce. Improving data locality decreases network traffic by placing reduce tasks on the nodes where their input data is located. Data skew leads to load imbalance among reducer nodes. Partitioning is an important operation in MapReduce because it determines the destinations of map output and can significantly affect the amount of data shuffled. An effective partitioner can therefore improve MapReduce performance by increasing data locality and decreasing data skew on the reduce side. Previous studies that consider these two issues have ignored the fact that, for different types of jobs, the relative priority given to data locality and to reduce-side data skew may affect execution time differently. In this paper, we propose a novel partitioner based on a naive Bayes classifier, named BAPM, which improves performance by optimizing data locality and data skew, using job type and bandwidth as classification attributes. Our experiments are performed on a Hadoop cluster with 31 nodes, and the results show that BAPM speeds up MapReduce by up to 19.26% compared to native Hadoop.
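The classification idea in the abstract can be illustrated with a minimal sketch. It is not the authors' BAPM implementation: the abstract names only job type and bandwidth as attributes, so the attribute values (`shuffle_heavy`, `compute_heavy`, `low`, `high`), the two class labels (`locality`, `skew`), and the training samples below are all hypothetical. The sketch trains a categorical naive Bayes model with Laplace smoothing and predicts which concern a partitioner might prioritize for a given job.

```python
from collections import Counter, defaultdict
import math

# Hypothetical training data: ((job_type, bandwidth_level), priority_label).
# None of these values come from the paper; they only illustrate the technique.
SAMPLES = [
    (("shuffle_heavy", "low"), "locality"),
    (("shuffle_heavy", "low"), "locality"),
    (("shuffle_heavy", "high"), "locality"),
    (("compute_heavy", "high"), "skew"),
    (("compute_heavy", "high"), "skew"),
    (("compute_heavy", "low"), "skew"),
]


def train(samples):
    """Count labels and per-label attribute values for categorical naive Bayes."""
    label_counts = Counter(label for _, label in samples)
    # feature_counts[label][attribute_index][value] -> number of occurrences
    feature_counts = defaultdict(lambda: defaultdict(Counter))
    values = defaultdict(set)  # distinct values seen per attribute index
    for features, label in samples:
        for i, value in enumerate(features):
            feature_counts[label][i][value] += 1
            values[i].add(value)
    return label_counts, feature_counts, values


def predict(model, features):
    """Return the label with the highest log-posterior for the given attributes."""
    label_counts, feature_counts, values = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, n in label_counts.items():
        score = math.log(n / total)  # log prior
        for i, value in enumerate(features):
            count = feature_counts[label][i][value]
            # Laplace (add-one) smoothing avoids zero probabilities.
            score += math.log((count + 1) / (n + len(values[i])))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train(SAMPLES)
print(predict(model, ("shuffle_heavy", "low")))
print(predict(model, ("compute_heavy", "high")))
```

In a real partitioner the predicted label would then select a partitioning strategy; how BAPM does this is described in the paper itself, not in this record.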
Publication Date
3-29-2018
Publication Title
Proceedings - 2017 IEEE 15th International Conference on Dependable, Autonomic and Secure Computing, 2017 IEEE 15th International Conference on Pervasive Intelligence and Computing, 2017 IEEE 3rd International Conference on Big Data Intelligence and Computing and 2017 IEEE Cyber Science and Technology Congress, DASC-PICom-DataCom-CyberSciTec 2017
Volume
2018-January
Pages
812-819
Document Type
Article; Proceedings Paper
Personal Identifier
scopus
DOI Link
https://doi.org/10.1109/DASC-PICom-DataCom-CyberSciTec.2017.138
Copyright Status
Unknown
Scopus ID
85048068322 (Scopus)
Source API URL
https://api.elsevier.com/content/abstract/scopus_id/85048068322
STARS Citation
Chen, Lei; Lu, Wei; Wang, Liqiang; Bao, Ergude; and Xing, Weiwei, "Optimizing MapReduce Partitioner Using Naive Bayes Classifier" (2018). Scopus Export 2015-2019. 10566.
https://stars.library.ucf.edu/scopus2015/10566