Title

ShmCaffe: A Distributed Deep Learning Platform with Shared Memory Buffer for HPC Architecture

Keywords

Deep learning; Distributed deep learning; Shared memory; ShmCaffe; Soft memory box

Abstract

One of the reasons behind the tremendous recent success of deep learning theory and applications is the advance of distributed and parallel high performance computing (HPC). This paper proposes a new distributed deep learning platform, named ShmCaffe, which uses remote shared memory to reduce the communication overhead of sharing massive deep neural network training parameters. ShmCaffe is designed on top of Soft Memory Box (SMB), a virtual shared memory framework. In the SMB framework, remote shared memory serves as a shared buffer for asynchronous massive parameter sharing among many distributed deep learning processes. Moreover, a hybrid method that combines asynchronous and synchronous parameter sharing is also discussed to improve scalability. As a result, ShmCaffe is 10.1 times faster than Caffe and 2.8 times faster than Caffe-MPI for deep neural network training when Inception-v1 is trained with 16 GPUs. We verify the convergence of Inception-v1 model training using ShmCaffe-A and ShmCaffe-H while varying the number of workers. Furthermore, we evaluate the scalability of ShmCaffe by analyzing the computation and communication times per iteration of deep learning training for four convolutional neural network (CNN) models.
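The abstract describes workers sharing parameters asynchronously through a shared memory buffer rather than through synchronized all-reduce exchanges. The sketch below is only an illustration of that idea, not the SMB API: the real framework exposes remote shared memory across nodes, whereas here a local shared-memory segment (hypothetical name smb_param_buffer, hypothetical parameter count) stands in for it.

```python
# Minimal sketch of asynchronous parameter sharing through a shared buffer.
# Assumptions: local shared memory stands in for SMB's remote shared memory;
# buffer name, parameter count, and update rule are illustrative only.
import numpy as np
from multiprocessing import shared_memory

PARAM_COUNT = 1_000_000          # hypothetical model size (float32 parameters)
SHM_NAME = "smb_param_buffer"    # hypothetical shared buffer name

def create_buffer():
    """Create and zero-initialize the shared parameter buffer (done once)."""
    shm = shared_memory.SharedMemory(name=SHM_NAME, create=True,
                                     size=PARAM_COUNT * 4)
    params = np.ndarray((PARAM_COUNT,), dtype=np.float32, buffer=shm.buf)
    params[:] = 0.0
    return shm

def worker_step(local_grad, lr=0.01):
    """Each worker attaches to the shared buffer, applies its local gradient
    in place, and detaches, without waiting on a global synchronization
    barrier (the asynchronous sharing pattern the abstract refers to)."""
    shm = shared_memory.SharedMemory(name=SHM_NAME)
    params = np.ndarray((PARAM_COUNT,), dtype=np.float32, buffer=shm.buf)
    params -= lr * local_grad
    shm.close()
```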

Publication Date

7-19-2018

Publication Title

Proceedings - International Conference on Distributed Computing Systems

Volume

2018-July

Number of Pages

1118-1128

Document Type

Article; Proceedings Paper

Personal Identifier

scopus

DOI Link

https://doi.org/10.1109/ICDCS.2018.00111

Scopus ID

85050977445 (Scopus)

Source API URL

https://api.elsevier.com/content/abstract/scopus_id/85050977445

