Title
MRAP: A Novel MapReduce-based Framework to Support HPC Analytics Applications with Access Patterns
Keywords
HPC analytics applications; HPC data access patterns; MapReduce
Abstract
Due to the explosive growth in the size of scientific data sets, data-intensive computing is an emerging trend in computational science. Many application scientists are looking to integrate data-intensive computing into computation-intensive High Performance Computing (HPC) facilities, particularly for data analytics. We have observed several scientific applications that must migrate their data from an HPC storage system to a data-intensive one. Because there is a gap between the data semantics of HPC storage and those of data-intensive systems, the data must be further refined and reorganized once migrated. This reorganization requires at least two complete scans through the data set and then at least one MapReduce program to prepare the data before analyzing it. Running multiple MapReduce phases causes significant overhead for the application in the form of excessive I/O operations: every MapReduce application that must be run to complete the desired data analysis performs a distributed read and write operation on the file system. Our contribution is to extend MapReduce to eliminate the multiple scans and to reduce the number of pre-processing MapReduce programs. We have added expressiveness to the MapReduce language so that users can specify the logical semantics of their data, such that 1) the data can be analyzed without running multiple data pre-processing MapReduce programs, and 2) the data can be simultaneously reorganized as it is migrated to the data-intensive file system. Using our augmented MapReduce system, MapReduce with Access Patterns (MRAP), we have demonstrated up to 33% throughput improvement in one real application, and up to 70% in an I/O kernel of another application. Copyright 2010 ACM.
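The idea of letting users state the logical layout of their data, so that a map step reads the records of interest directly instead of waiting on a separate pre-processing job, might be sketched as follows. This is an illustrative toy only and not the MRAP API: the `StridedPattern` class, its fields, and the `select`, `map_block`, and `run` helpers are hypothetical names, and an in-memory list stands in for a file on a distributed file system.

```python
from dataclasses import dataclass
from functools import reduce
from typing import Iterator, List


@dataclass(frozen=True)
class StridedPattern:
    """Hypothetical description of a strided access pattern (not MRAP's API)."""
    offset: int     # index of the first element of interest
    block_len: int  # number of contiguous elements per block
    stride: int     # distance between the starts of consecutive blocks
    count: int      # number of blocks to read


def select(data: List[int], p: StridedPattern) -> Iterator[List[int]]:
    """Yield each block named by the pattern in a single pass over `data`."""
    for i in range(p.count):
        start = p.offset + i * p.stride
        yield data[start:start + p.block_len]


def map_block(block: List[int]) -> int:
    """Map step: turn one block into a partial sum."""
    return sum(block)


def run(data: List[int], p: StridedPattern) -> int:
    """Map over the selected blocks, then reduce the partial results."""
    return reduce(lambda a, b: a + b, map(map_block, select(data, p)), 0)


# A 4x4 row-major array flattened to 16 values; read columns 1 and 2 of every row.
flat = list(range(16))
pattern = StridedPattern(offset=1, block_len=2, stride=4, count=4)
total = run(flat, pattern)
```

Here the pattern carries the layout knowledge that would otherwise require a separate reorganization pass, so the map step consumes the non-contiguous blocks as they are read.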
Publication Date
12-16-2010
Publication Title
HPDC 2010 - Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing
Page Range
107-118
Document Type
Article; Proceedings Paper
Personal Identifier
scopus
DOI Link
https://doi.org/10.1145/1851476.1851490
Copyright Status
Unknown
Scopus ID
78649986403 (Scopus)
Source API URL
https://api.elsevier.com/content/abstract/scopus_id/78649986403
STARS Citation
Sehrish, Saba; Mackey, Grant; Wang, Jun; and Bent, John, "MRAP: A Novel MapReduce-based Framework to Support HPC Analytics Applications with Access Patterns" (2010). Scopus Export 2010-2014. 379.
https://stars.library.ucf.edu/scopus2010/379