Title

Improving Metadata Management for Small Files in HDFS

Abstract

Scientific applications are adapting HDFS/MapReduce to perform large-scale data analytics. One of the major challenges is that an overabundance of small files is common in these applications, and HDFS manages all of its files through a single server, the Namenode. It is anticipated that small files can significantly impact the performance of the Namenode. In this work we propose a mechanism to store small files in HDFS efficiently and to improve the space utilization for metadata. Our scheme is based on the assumption that each client is assigned a quota in the file system, for both space and number of files. In our approach, we utilize the archiving method 'harballing', provided by Hadoop, to make better use of the HDFS. We provide new job functionality that allows in-job archival of directories and files, so that running MapReduce programs may complete without being killed by the JobTracker due to quota policies. This approach leads to better performance of metadata operations and more efficient usage of the HDFS. Our analysis results show that we can reduce the metadata footprint in main memory by a factor of 42. © 2009 IEEE.

Publication Date

12-21-2009

Publication Title

Proceedings - IEEE International Conference on Cluster Computing, ICCC

Number of Pages

-

Document Type

Article; Proceedings Paper

Personal Identifier

scopus

DOI Link

https://doi.org/10.1109/CLUSTR.2009.5289133

Scopus ID

72049093234 (Scopus)

Source API URL

https://api.elsevier.com/content/abstract/scopus_id/72049093234
