SideIO: A Side I/O System Framework for Hybrid Scientific Workflow

Keywords

Data migration; Data-intensive; HDFS; HPC; MPI; Scientific workflow

Abstract

Recent years have seen an increasing number of hybrid scientific applications. These often consist of one HPC simulation program together with its corresponding data analytics programs. Unfortunately, current computing platform settings do not accommodate this emerging workflow well, especially write-once-read-many workflows. This is mainly because HPC simulation programs store output data on a dedicated storage cluster equipped with a Parallel File System (PFS). To perform analytics on the data generated by a simulation, the data has to be migrated from the storage cluster to the compute cluster. This migration can introduce severe delays, especially as data sizes keep growing. To solve the data migration problem in small- and medium-sized HPC clusters, we propose to construct a side I/O path, named SideIO, that explicitly directs analysis data to data-intensive file systems (DIFS), which co-locate computation with data. In contrast, checkpoint data, which may never be read back, is written to the dedicated PFS to maximize I/O throughput. SideIO has three components: an I/O splitter that separates simulation outputs into different storage systems (PFS or DIFS); an I/O middleware component that allows original HPC simulation programs to execute I/O operations directly on DIFS without any porting effort; and an I/O scheduler that dynamically smooths out both disk write and read traffic for both simulation and analysis programs. By experimenting with two real-world scientific workflows on a 46-node SideIO prototype, we found that SideIO achieves read/write I/O performance comparable to a PFS in small- and medium-sized HPC clusters. More importantly, since SideIO completely avoids the most expensive data movement overhead, it achieves up to 3x speedups for hybrid scientific workflow applications compared with current solutions.
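The routing idea behind the I/O splitter can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not say how the splitter classifies outputs, so the `kind` tag and the names `OutputRecord`, `Target`, `route`, and `split` are hypothetical. The sketch simply assumes each simulation output is labeled as analysis data (read many times later, so placed on DIFS next to the compute nodes) or as something else such as a checkpoint (placed on the PFS for write throughput).

```python
from dataclasses import dataclass
from enum import Enum


class Target(Enum):
    PFS = "pfs"    # dedicated parallel file system on the storage cluster
    DIFS = "difs"  # data-intensive file system co-located with compute (e.g. HDFS)


@dataclass
class OutputRecord:
    name: str
    kind: str        # hypothetical tag: "analysis", "checkpoint", ...
    size_bytes: int


def route(record: OutputRecord) -> Target:
    """Send analysis data to DIFS; everything else (e.g. checkpoints) goes to PFS."""
    if record.kind == "analysis":
        return Target.DIFS
    return Target.PFS


def split(records):
    """Group output names by the storage system each one is routed to."""
    plan = {Target.PFS: [], Target.DIFS: []}
    for record in records:
        plan[route(record)].append(record.name)
    return plan


# Usage: one checkpoint file and one analysis file from the same time step.
outputs = [
    OutputRecord("step0100.ckpt", "checkpoint", 4 << 30),
    OutputRecord("step0100.vel", "analysis", 512 << 20),
]
plan = split(outputs)
print(plan[Target.DIFS])  # ['step0100.vel']
```

Defaulting unknown outputs to the PFS is a conservative choice in this sketch: only data explicitly marked for later analysis takes the side path, so nothing is moved off the PFS by accident.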

Publication Date

10-1-2017

Publication Title

Journal of Parallel and Distributed Computing

Volume

108

Pages

45-58

Document Type

Article

Personal Identifier

scopus

DOI Link

https://doi.org/10.1016/j.jpdc.2016.07.001

Scopus ID

84994360868 (Scopus)

Source API URL

https://api.elsevier.com/content/abstract/scopus_id/84994360868
