Accelerating Low Bit-Width Deep Convolution Neural Network in MRAM

Keywords

In-memory computing; Magnetic Random Access Memory; Neural network acceleration

Abstract

Deep Convolution Neural Networks (CNNs) have achieved outstanding performance in image recognition over large-scale datasets. However, the pursuit of higher inference accuracy leads to CNN architectures with deeper layers and denser connections, which inevitably makes their hardware implementations demand ever more memory and computational resources. This can be interpreted as the 'CNN power and memory wall'. Recent research efforts have significantly reduced both model size and computational complexity by using low bit-width weights, activations, and gradients, while keeping reasonably good accuracy. In this work, we present different emerging nonvolatile Magnetic Random Access Memory (MRAM) designs that could be leveraged to implement a 'bit-wise in-memory convolution engine', which simultaneously stores network parameters and computes low bit-width convolutions. This new computing model leverages the 'in-memory computing' concept to accelerate CNN inference and reduce convolution energy consumption, owing to its intrinsic logic-in-memory design and the reduction of data communication.
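As a plain illustration (not part of the published record, and not the paper's MRAM implementation), the low bit-width, bit-wise convolution referred to in the abstract is commonly expressed in software with an XNOR-popcount formulation for 1-bit weights and activations. The Python/NumPy sketch below is an assumption: the function names binarize, xnor_popcount_dot, and binary_conv2d are hypothetical and only demonstrate the general technique.

import numpy as np

def binarize(x):
    # Map real values to {+1, -1}; a common 1-bit quantization (sign function).
    return np.where(x >= 0, 1, -1).astype(np.int8)

def xnor_popcount_dot(a_bits, w_bits):
    # For {+1, -1} vectors encoded as bits (True -> +1, False -> -1), the dot
    # product equals 2 * popcount(XNOR(a, w)) - N.
    n = a_bits.size
    xnor_matches = np.count_nonzero(a_bits == w_bits)  # software stand-in for an XNOR-popcount unit
    return 2 * xnor_matches - n

def binary_conv2d(activation, weight):
    # Naive 2-D 'valid' convolution with 1-bit activations and weights.
    # activation: (H, W), weight: (kh, kw), both holding values in {+1, -1}.
    kh, kw = weight.shape
    H, W = activation.shape
    out = np.zeros((H - kh + 1, W - kw + 1), dtype=np.int32)
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = activation[i:i + kh, j:j + kw]
            out[i, j] = xnor_popcount_dot((patch > 0).ravel(), (weight > 0).ravel())
    return out

# Toy usage: 1-bit quantize a random activation map and kernel, then convolve.
rng = np.random.default_rng(0)
a = binarize(rng.standard_normal((6, 6)))
w = binarize(rng.standard_normal((3, 3)))
print(binary_conv2d(a, w))

In hardware, the XNOR and popcount stages would map onto the in-memory bit-wise logic described in the abstract; the sketch above only models the arithmetic, not the MRAM array.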

Publication Date

8-7-2018

Publication Title

Proceedings of IEEE Computer Society Annual Symposium on VLSI, ISVLSI

Volume

2018-July

Number of Pages

533-538

Document Type

Article; Proceedings Paper

Personal Identifier

scopus

DOI Link

https://doi.org/10.1109/ISVLSI.2018.00103

Scopus ID

85052125131 (Scopus)

Source API URL

https://api.elsevier.com/content/abstract/scopus_id/85052125131
