G-SD: Achieving Fast Reverse Lookup Using Scalable Declustering Layout in Large-Scale File Systems
Keywords
Distributed file system; Data layout; Group-based shifted declustering; Parallel file system
Abstract
With the increasing popularity of cloud computing, current data centers store petabytes of data. This requires thousands or tens of thousands of storage nodes at a single site. Node failure in these data centers is the norm rather than a rare event. As a result, data reliability is a great concern. To achieve high reliability, data recovery or node reconstruction is a must. Although extensive research has investigated how to sustain high performance and high reliability in case of node failure at large scale, the reverse lookup problem, namely finding the list of objects stored on a failed node, is not well addressed. As the first step of failure recovery, this process has a direct impact on data recovery and node reconstruction. Existing solutions use metadata traversal or data distribution reversing methods for reverse lookup, which are either time consuming or expensive, whereas deterministic block placement schemes can achieve fast and efficient reverse lookup easily. However, these schemes are designed for centralized, small-scale storage architectures such as RAID. Because they lack scalability, they cannot be directly applied in large-scale storage systems. In this paper, we propose Group-Shifted Declustering (G-SD), a deterministic data layout for multi-way replication. G-SD addresses the scalability issue of our previous Shifted Declustering layout and supports fast and efficient reverse lookup. Our mathematical proofs demonstrate that G-SD is a scalable layout that maintains a high level of data availability. We implement a prototype of G-SD and its reverse lookup function on two open source file systems: Ceph and HDFS. Large-scale experiments on the Marmot cluster demonstrate that the average speed of G-SD reverse lookup is more than $5\times$ faster than that of existing schemes.
Publication Date
10-1-2018
Publication Title
IEEE Transactions on Cloud Computing
Volume
6
Issue
4
Pages
1017-1030
Document Type
Article
Personal Identifier
scopus
DOI Link
https://doi.org/10.1109/TCC.2016.2586050
Copyright Status
Unknown
Scopus ID
85058265505 (Scopus)
Source API URL
https://api.elsevier.com/content/abstract/scopus_id/85058265505
STARS Citation
Wang, Jun; Han, Dezhi; Zhang, Junyao; and Yin, Jiangling, "G-SD: Achieving Fast Reverse Lookup Using Scalable Declustering Layout in Large-Scale File Systems" (2018). Scopus Export 2015-2019. 8504.
https://stars.library.ucf.edu/scopus2015/8504