Opass: Analysis And Optimization Of Parallel Data Access On Distributed File Systems
Keywords
Bipartite Matching; Distributed File Systems; Parallel Data Access
Abstract
In this paper, we study parallel data access on distributed file systems, e.g., the Hadoop file system. Our experiments show that parallel data read requests are often served remotely and in an imbalanced fashion. This results in serious disk access and data transfer contention on certain cluster/storage nodes. We conduct a complete analysis of how remote and imbalanced read patterns occur and how they are affected by the size of the cluster. We then propose a novel method to Optimize Parallel Data Access on Distributed File Systems, referred to as Opass. The goal of Opass is to reduce remote parallel data accesses and achieve a better balance of data read requests across cluster nodes. To achieve this goal, we represent the data read requests that parallel applications issue to cluster nodes as a graph data structure in which edge weights encode the demands of data locality and load capacity. We then propose new matching-based algorithms that match processes to data based on the configuration of the graph, so as to compute the maximum degree of data locality and balanced access. Our proposed method can benefit parallel data-intensive analysis with various parallel data access strategies. Experiments are conducted on PRObE's Marmot 128-node cluster testbed, and the results from both benchmark and well-known parallel applications show the performance benefits and scalability of Opass.
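The matching idea in the abstract can be illustrated with a minimal sketch. It is not the Opass algorithm itself: it assumes one reader process per node, replaces the weighted graph with an unweighted bipartite graph plus a per-node quota, and uses a standard augmenting-path matching. The names `assign_chunks`, `replicas`, and `nodes` are hypothetical and chosen only for illustration.

```python
import math

def assign_chunks(replicas, nodes):
    """Assign each chunk to one node, preferring nodes that hold a replica.

    replicas: dict mapping chunk id -> set of nodes storing a copy (assumed input).
    nodes:    list of node ids, one reader process per node (assumption).
    Returns a dict mapping chunk id -> node chosen to read it.
    """
    # Balanced load: no node reads more than ceil(chunks / nodes) chunks.
    quota = math.ceil(len(replicas) / len(nodes))
    assigned = {n: [] for n in nodes}   # node -> chunks it will read
    owner = {}                          # chunk -> node

    def place_locally(chunk, seen):
        # Augmenting-path search: try every node that stores the chunk.
        for node in sorted(replicas[chunk]):
            if node in seen or node not in assigned:
                continue
            seen.add(node)
            if len(assigned[node]) < quota:
                assigned[node].append(chunk)
                owner[chunk] = node
                return True
            # Node is full: try to move one of its chunks to another local node.
            for other in list(assigned[node]):
                if place_locally(other, seen):
                    assigned[node].remove(other)
                    assigned[node].append(chunk)
                    owner[chunk] = node
                    return True
        return False

    # Chunks that cannot be read locally within the quota become remote reads.
    remote = [c for c in replicas if not place_locally(c, set())]
    for chunk in remote:
        node = min(nodes, key=lambda n: len(assigned[n]))
        assigned[node].append(chunk)
        owner[chunk] = node
    return owner

# Hypothetical layout: six chunks spread unevenly over three nodes.
layout = {
    "c0": {"n0"}, "c1": {"n0", "n1"}, "c2": {"n0"},
    "c3": {"n1", "n2"}, "c4": {"n1", "n2"}, "c5": {"n2"},
}
plan = assign_chunks(layout, ["n0", "n1", "n2"])
local_reads = sum(plan[c] in layout[c] for c in layout)
print(plan, local_reads)
```

A greedy "read from the first replica" policy would overload `n0` in this layout, whereas the matching reshuffles earlier choices so that every node stays within its quota. Handling edge weights, multiple processes per node, or dynamic access strategies would need a richer formulation than this sketch.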
Publication Date
7-17-2015
Publication Title
Proceedings - 2015 IEEE 29th International Parallel and Distributed Processing Symposium, IPDPS 2015
Number of Pages
623-632
Document Type
Article; Proceedings Paper
Personal Identifier
scopus
DOI Link
https://doi.org/10.1109/IPDPS.2015.55
Copyright Status
Unknown
Scopus ID
84971475227 (Scopus)
Source API URL
https://api.elsevier.com/content/abstract/scopus_id/84971475227
STARS Citation
Yin, Jiangling; Wang, Jun; Zhou, Jian; Lukasiewicz, Tyler; and Huang, Dan, "Opass: Analysis And Optimization Of Parallel Data Access On Distributed File Systems" (2015). Scopus Export 2015-2019. 2028.
https://stars.library.ucf.edu/scopus2015/2028