Title

Scalable FPGA Accelerator for Deep Convolutional Neural Networks with Stochastic Streaming

Keywords

Convolutional neural network; FPGA; stochastic computing

Abstract

FPGA-based heterogeneous computing platforms, owing to their extreme logic reconfigurability, have emerged as strong contenders for the computing fabric of modern AI. As a result, various FPGA-based accelerators for deep CNNs, the key driver of modern AI, have been proposed for their high performance, reconfigurability, and fast development cycles. The general consensus among researchers is that, although an FPGA-based accelerator can achieve much higher energy efficiency, its raw computing performance lags behind that of GPUs with similar logic density. In this paper, we develop an alternative methodology for efficiently implementing CNNs on FPGAs so that they outperform GPUs in both power consumption and performance. Our key idea is a scalable hardware architecture and circuit design for large-scale CNNs that leverages a stochastic computing principle. Specifically, there are three major performance advantages. First, all key components of our deep CNN are designed and implemented to compute stochastically, thus achieving excellent computing performance and energy efficiency. Second, because our proposed CNN architecture enables stream-mode computing, each stage can process partial results from the preceding stages and therefore incurs no unnecessary latency from data dependencies. Finally, our FPGA-based deep CNN provides superior hardware scalability compared with conventional FPGA implementations by reducing the bandwidth requirement between layers. The results show that our proposed CNN architecture significantly outperforms all previous FPGA-based deep CNN implementations. It achieves 1.58x more GOPS, 6.42x more GOPS/Slice, and 10.92x more GOPS/W than the state-of-the-art CNN architecture. The top-5 accuracy of the stochastic VGG-16 CNN is 86.77 percent at a frame rate of 18.91 fps.
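
The abstract does not spell out the number encoding the accelerator uses, so the following is only a minimal software sketch of the general stochastic computing idea, assuming the common unipolar encoding (a value in [0, 1] is represented by the fraction of 1s in a random bitstream). Under that assumption, multiplying two independent streams reduces to a bitwise AND, and a consumer can form an estimate from the bits seen so far, which loosely mirrors the stream-mode processing of partial results described above. All function names are hypothetical and are not taken from the paper.

```python
import random


def to_stream(p, length, rng):
    """Encode p in [0, 1] as a unipolar bitstream: each bit is 1 with probability p."""
    return [1 if rng.random() < p else 0 for _ in range(length)]


def stream_value(bits):
    """Decode a unipolar bitstream as the fraction of 1s."""
    return sum(bits) / len(bits)


def sc_multiply(a_bits, b_bits):
    """Multiply two independent unipolar streams with a bitwise AND."""
    return [a & b for a, b in zip(a_bits, b_bits)]


def running_estimates(bits):
    """Yield the decoded value after each bit, as a consumer of partial results would see it."""
    ones = 0
    for count, bit in enumerate(bits, start=1):
        ones += bit
        yield ones / count


# Assumed example operands (0.5 and 0.25) and stream length; a fixed seed keeps runs repeatable.
rng = random.Random(0)
a = to_stream(0.5, 4096, rng)
b = to_stream(0.25, 4096, rng)
product = sc_multiply(a, b)
estimate = stream_value(product)      # approximates 0.5 * 0.25
partial = list(running_estimates(product))
```

In this sketch the estimate converges toward the exact product as more bits arrive, so accuracy trades off against stream length; how the actual hardware generates, correlates, and decodes its streams is not stated in this record.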

Publication Date

10-1-2018

Publication Title

IEEE Transactions on Multi-Scale Computing Systems

Volume

4

Issue

4

Pages

888-899

Document Type

Article

Personal Identifier

scopus

DOI Link

https://doi.org/10.1109/TMSCS.2018.2886266

Scopus ID

85058669544 (Scopus)

Source API URL

https://api.elsevier.com/content/abstract/scopus_id/85058669544
