With the increasing popularity of cloud computing and "Big Data" applications, current data centers are often required to manage petabytes or exabytes of data. Storing this much data requires thousands or tens of thousands of storage nodes at a single site, which poses three major challenges for storage system designers: (1) Reliability: node failure in these data centers is a normal occurrence rather than a rare event, which makes data reliability a great concern. (2) Energy efficiency: a data center can consume up to 100 times more energy than a standard office building, and more than 10% of this energy consumption can be attributed to storage systems. Reducing the energy consumption of the storage system is therefore key to reducing the overall consumption of the data center. (3) Scalability: as the size of data continues to grow, maintaining the scalability of the storage system is essential. That is, expansion of the storage system should complete efficiently and without limitations on the total number of storage nodes or on performance. This thesis proposes three ways to improve these three key features in current large-scale storage systems.
First, we define the problem of "reverse lookup", namely finding the list of objects (blocks) stored on a failed node. As the first step of failure recovery, this process directly affects the recovery/reconstruction time. Existing solutions perform reverse lookup by metadata traversal or by reversing the data distribution, which is either time-consuming or expensive, whereas a deterministic block placement can achieve fast and efficient reverse lookup. However, existing deterministic placement solutions are designed for centralized, small-scale storage architectures such as RAID. Because they lack scalability, they cannot be applied directly to large-scale storage systems. In this thesis, we propose Group-Shifted Declustering (G-SD), a deterministic data layout for multi-way replication. G-SD addresses the scalability issue of our previous Shifted Declustering layout and supports fast and efficient reverse lookup.
Second, we define the problem of how to balance performance, energy, and recovery in degradation mode for an energy-efficient storage system. Extensive research has addressed trading performance for energy efficiency in normal mode. When a node fails, however, the system enters degradation mode, in which node reconstruction is initiated. This process requires a number of disks to be spun up and consumes a substantial amount of I/O bandwidth, which compromises not only energy efficiency but also performance. We find that current energy-proportional solutions cannot answer this question accurately, because they do not consider the I/O bandwidth contention between recovery and performance. This thesis presents PERP, a mathematical model that minimizes the energy consumption of a storage system with respect to performance and recovery. PERP answers this question by providing the accurate number of nodes and the assigned recovery bandwidth for each time frame.
Third, current distributed file systems such as the Google File System (GFS) and the Hadoop Distributed File System (HDFS) employ a pseudo-random method for replica distribution and a centralized lookup table (block map) to record all replica locations. This lookup table requires a large amount of memory and consumes a considerable amount of CPU and network resources on the metadata server. With the booming size of "Big Data", the metadata server becomes a scalability and performance bottleneck. While current approaches such as HDFS Federation attempt to extend scalability "horizontally" by allowing multiple metadata servers, we believe a more promising option is to scale up each metadata server "vertically". We propose Deister, a novel block management scheme built on top of a deterministic declustering distribution method, Intersected Shifted Declustering (ISD). As a result, both replica distribution and location lookup can be achieved without a centralized lookup table.
Doctor of Philosophy (Ph.D.)
College of Engineering and Computer Science
Doctoral Dissertation (Campus-only Access)
Zhang, Junyao, "Research on Improving Reliability, Energy Efficiency and Scalability in Distributed and Parallel File Systems" (2015). Electronic Theses and Dissertations. 5010.