"Overlapped Checkpointing With Hardware Assist" by Christopher Mitchell, James Nunez et al.

Scopus Export 2000s

Title

Overlapped Checkpointing With Hardware Assist

Creator

Christopher Mitchell, University of Central Florida
James Nunez, Los Alamos National Laboratory
Jun Wang, University of Central Florida

Abstract

We present a new approach to handling the demanding I/O workload incurred during checkpoint writes in High Performance Computing (HPC). Prior efforts to improve performance have been bound by constraints such as hard drive limitations and the network. Our research overcomes these limitations by providing a method to (1) write checkpoint data to a high-speed, non-volatile buffer and (2) asynchronously write this data to permanent storage while computation resumes. This removes the hard drive from the critical data path because our I/O-node-based buffers isolate the compute nodes from the storage servers. The solution is feasible because of industry declines in the cost of high-capacity, non-volatile storage technologies. Testing was conducted using a standardized HPC benchmark on a test bed cluster at Los Alamos National Laboratory. Results show a definitive speedup factor for select workloads over writing directly to a typical global parallel file system, the Panasas ActiveScale File System. © 2009 IEEE.
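
The two-step method described in the abstract can be illustrated with a minimal software sketch. This is not the authors' implementation, which relies on hardware buffers at the I/O nodes; it only mimics the idea with two directories and a background thread. The function names `checkpoint_overlapped` and `run_demo`, the `.partial` suffix, and the file name are hypothetical and chosen for illustration.

```python
import os
import shutil
import tempfile
import threading


def checkpoint_overlapped(data: bytes, buffer_dir: str, storage_dir: str,
                          name: str) -> threading.Thread:
    """Write `data` to a fast buffer, then drain it to storage in the background.

    Assumption: `buffer_dir` stands in for the high-speed non-volatile buffer
    and `storage_dir` for the permanent parallel file system.
    """
    buffer_path = os.path.join(buffer_dir, name)
    storage_path = os.path.join(storage_dir, name)

    # Step 1 (blocking): write the checkpoint to the fast buffer and flush it.
    with open(buffer_path, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())

    # Step 2 (asynchronous): copy the buffered checkpoint to permanent storage.
    def drain() -> None:
        tmp_path = storage_path + ".partial"
        shutil.copyfile(buffer_path, tmp_path)
        # Rename within the same directory so readers never see a partial file.
        os.replace(tmp_path, storage_path)
        os.remove(buffer_path)

    worker = threading.Thread(target=drain)
    worker.start()
    return worker  # the caller resumes computation and joins later


def run_demo() -> bytes:
    """Checkpoint a small payload and return what landed in storage."""
    with tempfile.TemporaryDirectory() as buf, \
            tempfile.TemporaryDirectory() as store:
        state = b"step-100-state"
        worker = checkpoint_overlapped(state, buf, store, "ckpt_100.bin")
        # Computation would resume here while the drain runs.
        worker.join()
        with open(os.path.join(store, "ckpt_100.bin"), "rb") as f:
            return f.read()
```

In this sketch the application blocks only for the write to the buffer; the slower copy to permanent storage overlaps with whatever work follows the call, and the caller joins the returned thread before reusing or discarding the buffer.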

Publication Date

12-21-2009

Publication Title

Proceedings - IEEE International Conference on Cluster Computing, ICCC

Number of Pages

Document Type

Article; Proceedings Paper

Personal Identifier

scopus

DOI Link

https://doi.org/10.1109/CLUSTR.2009.5289154

Copyright Status

Unknown

Scopus ID

72049111261 (Scopus)

Source API URL

https://api.elsevier.com/content/abstract/scopus_id/72049111261

STARS Citation

Mitchell, Christopher; Nunez, James; and Wang, Jun, "Overlapped Checkpointing With Hardware Assist" (2009). Scopus Export 2000s. 11278.
https://stars.library.ucf.edu/scopus2000/11278
