Title

DataNet: A Data Distribution-Aware Method for Sub-Dataset Analysis on Distributed File Systems

Abstract

In this paper, we study the problem of sub-dataset analysis over distributed file systems, e.g., the Hadoop file system. Our experiments show that the sub-datasets' distribution over HDFS blocks can often cause the corresponding analysis to suffer from a seriously imbalanced parallel execution. This is because the locality of individual sub-datasets is hidden by the Hadoop file system, and the content clustering of sub-datasets results in some computational nodes carrying out much more workload than others. We conduct a comprehensive analysis of how the imbalanced computing patterns occur and their sensitivity to the size of a cluster. We then propose a novel method, referred to as DataNet, to optimize sub-dataset analysis over distributed storage systems. DataNet aims to achieve distribution-aware and workload-balanced computing and consists of the following three parts. Firstly, we propose an efficient algorithm with linear complexity to obtain the meta-data of sub-dataset distributions. Secondly, we design an elastic storage structure called ElasticMap, based on the HashMap and BloomFilter techniques, to store the meta-data. Thirdly, we employ a distribution-aware algorithm for sub-dataset applications to achieve workload balance in parallel execution. Our proposed method can benefit different sub-dataset analyses with various computational requirements. Experiments are conducted on PRObE's Marmot 128-node cluster testbed, and the results show the performance benefits of DataNet.
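The three parts named in the abstract can be illustrated with a small Python sketch. It is not the authors' implementation: the abstract only states that ElasticMap builds on HashMap and BloomFilter techniques, so the split used here (exact sizes in a dictionary for large sub-datasets, membership only in a Bloom filter for small ones), the `threshold` and `small_size_guess` parameters, the greedy scheduler, and all function names are assumptions made for illustration. The scheduler also ignores replica locality, which a real HDFS scheduler would have to respect.

```python
import hashlib
from collections import Counter


class BloomFilter:
    """Tiny Bloom filter: may give false positives, never false negatives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, key):
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def __contains__(self, key):
        return all((self.bits >> pos) & 1 for pos in self._positions(key))


def build_block_metadata(records, threshold):
    """Single linear pass over one block's (sub_dataset_id, size) records.

    Assumption: sub-datasets at or above `threshold` bytes keep an exact
    size; smaller ones are only remembered as "present" in a Bloom filter.
    """
    sizes = Counter()
    for sub_id, size in records:
        sizes[sub_id] += size
    exact, approx = {}, BloomFilter()
    for sub_id, total in sizes.items():
        if total >= threshold:
            exact[sub_id] = total
        else:
            approx.add(sub_id)
    return exact, approx


def estimate_workload(metadata, sub_id, small_size_guess):
    """Estimated bytes of `sub_id` in a block, from its metadata."""
    exact, approx = metadata
    if sub_id in exact:
        return exact[sub_id]
    return small_size_guess if sub_id in approx else 0


def assign_blocks(block_workloads, num_nodes):
    """Greedy balancing: heaviest block first, to the least-loaded node."""
    loads = [0] * num_nodes
    assignment = {}
    for block_id, work in sorted(block_workloads.items(), key=lambda kv: -kv[1]):
        node = loads.index(min(loads))
        assignment[block_id] = node
        loads[node] += work
    return assignment, loads
```

To use the sketch, one would build the metadata once per block, estimate the per-block workload for the sub-dataset being analyzed, and pass those estimates to `assign_blocks`. The Bloom filter trades exactness for space: a small sub-dataset costs a few bits instead of a dictionary entry, at the price of occasional false positives in the workload estimate.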

Publication Date

7-18-2016

Publication Title

Proceedings - 2016 IEEE 30th International Parallel and Distributed Processing Symposium, IPDPS 2016

Number of Pages

504-513

Document Type

Article; Proceedings Paper

Personal Identifier

scopus

DOI Link

https://doi.org/10.1109/IPDPS.2016.33

Scopus ID

84983247120 (Scopus)

Source API URL

https://api.elsevier.com/content/abstract/scopus_id/84983247120
