Accelerating Low Bit-Width Deep Convolution Neural Network In MRAM
Keywords
In-memory computing; Magnetic Random Access Memory; Neural network acceleration
Abstract
Deep Convolution Neural Networks (CNNs) have achieved outstanding performance in image recognition over large-scale datasets. However, the pursuit of higher inference accuracy leads to CNN architectures with deeper layers and denser connections, which inevitably makes their hardware implementation demand ever more memory and computational resources. This can be interpreted as a 'CNN power and memory wall'. Recent research efforts have significantly reduced both model size and computational complexity by using low bit-width weights, activations, and gradients, while keeping reasonably good accuracy. In this work, we present different emerging nonvolatile Magnetic Random Access Memory (MRAM) designs that could be leveraged to implement a 'bit-wise in-memory convolution engine', which can simultaneously store network parameters and compute low bit-width convolutions. Such a computing model leverages the 'in-memory computing' concept to accelerate CNN inference and reduce convolution energy consumption, owing to its intrinsic logic-in-memory design and reduced data communication.
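For the fully binarized case, the bit-wise convolution mentioned in the abstract typically reduces to XNOR plus popcount operations over bit-encoded weights and activations. The sketch below is an illustration of that arithmetic only, not the paper's MRAM circuit or its exact encoding; the {0, 1} bit mapping and the helper names are assumptions made for this example.

```python
# Minimal sketch (assumed encoding): weights/activations are mapped to {-1, +1}
# and stored as bits with -1 -> 0 and +1 -> 1. The +/-1 dot product then equals
#   dot = 2 * popcount(XNOR(a, w)) - n
# which is the bit-wise kernel an in-memory convolution engine can evaluate.
import numpy as np

def binarize(x):
    """Map real values to {-1, +1}, then encode as {0, 1} bits."""
    signs = np.where(x >= 0, 1, -1)
    return ((signs + 1) // 2).astype(np.uint8)   # -1 -> 0, +1 -> 1

def bitwise_dot(a_bits, w_bits):
    """Dot product of two +/-1 vectors computed via XNOR + popcount on bit encodings."""
    n = a_bits.size
    xnor = np.logical_not(np.logical_xor(a_bits, w_bits))
    return 2 * int(np.count_nonzero(xnor)) - n

# Usage example: one 3x3 convolution window against one 3x3 binarized kernel.
rng = np.random.default_rng(0)
window = rng.standard_normal((3, 3))
kernel = rng.standard_normal((3, 3))

a_bits = binarize(window).ravel()
w_bits = binarize(kernel).ravel()

# The bit-wise result matches the dot product of the +/-1 sign vectors.
ref = int(np.where(window >= 0, 1, -1).ravel() @ np.where(kernel >= 0, 1, -1).ravel())
assert bitwise_dot(a_bits, w_bits) == ref
print(bitwise_dot(a_bits, w_bits))
```

In hardware, the XNOR and popcount steps are what the paper proposes to fold into the MRAM array itself, so the parameters never leave the memory during convolution.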
Publication Date
8-7-2018
Publication Title
Proceedings of IEEE Computer Society Annual Symposium on VLSI, ISVLSI
Volume
2018-July
Number of Pages
533-538
Document Type
Article; Proceedings Paper
Personal Identifier
scopus
DOI Link
https://doi.org/10.1109/ISVLSI.2018.00103
Copyright Status
Unknown
Socpus ID
85052125131 (Scopus)
Source API URL
https://api.elsevier.com/content/abstract/scopus_id/85052125131
STARS Citation
He, Zhezhi; Angizi, Shaahin; and Fan, Deliang, "Accelerating Low Bit-Width Deep Convolution Neural Network In MRAM" (2018). Scopus Export 2015-2019. 10117.
https://stars.library.ucf.edu/scopus2015/10117