Title
Overlapped Checkpointing With Hardware Assist
Abstract
We present a new approach to handling the demanding I/O workload incurred by checkpoint writes in High Performance Computing (HPC). Prior efforts to improve checkpoint performance have been bounded by hard drive and network limitations. Our research overcomes these limitations by (1) writing checkpoint data to a high-speed, non-volatile buffer and (2) asynchronously writing this data to permanent storage while computation resumes. Because these buffers reside on I/O nodes, they isolate the compute nodes from the storage servers, which removes the hard drive from the critical data path. This solution is feasible because the cost of high-capacity, non-volatile storage technologies has declined across the industry. Testing was conducted using a standardized HPC benchmark on a test bed cluster at Los Alamos National Laboratory. Results show a definitive speedup for select workloads compared with writing directly to a typical global parallel file system, the Panasas ActiveScale File System. © 2009 IEEE.
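The two-step scheme in the abstract (stage the checkpoint in a fast buffer, then drain it to permanent storage in the background while computation continues) can be illustrated with a minimal single-process sketch. This is not the authors' implementation, which the abstract does not describe. The class `OverlappedCheckpointer`, its methods, the bounded in-memory queue standing in for the non-volatile I/O-node buffer, and the local directory standing in for the parallel file system are all hypothetical choices made for illustration.

```python
import os
import queue
import tempfile
import threading


class OverlappedCheckpointer:
    """Hypothetical sketch: stage checkpoints in a fast buffer, drain them asynchronously."""

    def __init__(self, storage_dir: str, max_buffered: int = 4) -> None:
        self.storage_dir = storage_dir
        # Assumption: a bounded queue stands in for the limited-capacity
        # non-volatile buffer on the I/O node.
        self._buffer: queue.Queue = queue.Queue(maxsize=max_buffered)
        self._drainer = threading.Thread(target=self._drain, daemon=True)
        self._drainer.start()

    def checkpoint(self, step: int, state: bytes) -> None:
        # Critical path: snapshot the state into the buffer and return at once.
        # This blocks only when the buffer is full (the drain has fallen behind).
        self._buffer.put((step, bytes(state)))

    def _drain(self) -> None:
        # Background path: write buffered checkpoints to permanent storage
        # (here a local directory, standing in for the parallel file system).
        while True:
            item = self._buffer.get()
            if item is None:  # sentinel from close()
                return
            step, data = item
            path = os.path.join(self.storage_dir, f"ckpt_{step:06d}.bin")
            with open(path, "wb") as f:
                f.write(data)
                f.flush()
                os.fsync(f.fileno())

    def close(self) -> None:
        # Wait until every buffered checkpoint has reached storage.
        self._buffer.put(None)
        self._drainer.join()


def run(steps: int, storage_dir: str) -> list[str]:
    ckpt = OverlappedCheckpointer(storage_dir)
    state = bytearray(1024)
    for step in range(steps):
        # "Compute": mutate the application state in place.
        state[step % len(state)] = (state[step % len(state)] + 1) % 256
        ckpt.checkpoint(step, state)  # returns without waiting for the disk write
    ckpt.close()
    return sorted(os.listdir(storage_dir))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print(run(3, d))
```

The `bytes(state)` copy matters: because computation resumes immediately and keeps mutating `state`, each checkpoint must be an independent snapshot. The bounded buffer also shows the trade-off the abstract implies: the overlap only helps while the buffer can absorb a checkpoint faster than storage drains it.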
Publication Date
12-21-2009
Publication Title
Proceedings - IEEE International Conference on Cluster Computing, ICCC
Number of Pages
-
Document Type
Article; Proceedings Paper
Personal Identifier
scopus
DOI Link
https://doi.org/10.1109/CLUSTR.2009.5289154
Copyright Status
Unknown
Scopus ID
72049111261 (Scopus)
Source API URL
https://api.elsevier.com/content/abstract/scopus_id/72049111261
STARS Citation
Mitchell, Christopher; Nunez, James; and Wang, Jun, "Overlapped Checkpointing With Hardware Assist" (2009). Scopus Export 2000s. 11278.
https://stars.library.ucf.edu/scopus2000/11278