Scalable FPGA Accelerator For Deep Convolutional Neural Networks With Stochastic Streaming
Keywords
Convolutional neural network; FPGA; stochastic computing
Abstract
FPGA-based heterogeneous computing platforms, owing to their extreme logic reconfigurability, have emerged as strong contenders for the computing fabric of modern AI. As a result, various FPGA-based accelerators for deep CNNs, the key driver of modern AI, have been proposed for their high performance, reconfigurability, and fast development cycles. The general consensus among researchers is that, although FPGA-based accelerators can achieve much higher energy efficiency, their raw computing performance lags behind that of GPUs with similar logic density. In this paper, we develop an alternative methodology to efficiently implement CNNs on FPGAs that outperform GPUs in both power consumption and performance. Our key idea is a scalable hardware architecture and circuit design for large-scale CNNs that leverages a stochastic computing principle. Specifically, there are three major performance advantages. First, all key components of our deep CNN are designed and implemented to compute stochastically, thus achieving excellent computing performance and energy efficiency. Second, because our proposed CNN architecture enables stream-mode computing, each of its stages can process even partial results from the preceding stages and therefore incurs no unnecessary latency due to data dependencies. Finally, our FPGA-based deep CNN provides superior hardware scalability compared with conventional FPGA implementations by reducing the bandwidth requirement between layers. The results show that our proposed CNN architecture significantly outperforms all previous FPGA-based deep CNN implementations. It achieves 1.58x more GOPS, 6.42x more GOPS/Slice, and 10.92x more GOPS/W than the state-of-the-art CNN architecture. The top-5 accuracy of the stochastic VGG-16 CNN is 86.77 percent at a frame rate of 18.91 fps.
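To make the stochastic computing principle mentioned in the abstract more concrete, the following is a minimal Python sketch of the textbook unipolar scheme, in which a value in [0, 1] is encoded as a random bitstream and multiplication reduces to a bitwise AND. It is an illustration under stated assumptions, not the authors' hardware design: the function names (to_bitstream, sc_multiply, partial_estimate), the stream length, and the input values are all hypothetical. The partial_estimate helper hints at why stream-mode processing is possible: a usable estimate exists after any prefix of the stream.

```python
import random

def to_bitstream(p, length, rng):
    """Encode a value p in [0, 1] as a bitstream where each bit is 1 with probability p."""
    return [1 if rng.random() < p else 0 for _ in range(length)]

def from_bitstream(bits):
    """Decode a bitstream back to a value: the fraction of bits that are 1."""
    return sum(bits) / len(bits)

def sc_multiply(a_bits, b_bits):
    """Unipolar stochastic multiplication: AND of two independent bitstreams."""
    return [a & b for a, b in zip(a_bits, b_bits)]

def partial_estimate(bits, n):
    """Estimate the encoded value from only the first n bits (a partial result)."""
    return from_bitstream(bits[:n])

# Hypothetical inputs and stream length, chosen only for illustration.
rng = random.Random(0)
length = 4096
a_bits = to_bitstream(0.5, length, rng)
b_bits = to_bitstream(0.25, length, rng)

product_bits = sc_multiply(a_bits, b_bits)
product = from_bitstream(product_bits)        # close to 0.5 * 0.25 = 0.125
early = partial_estimate(product_bits, 256)   # rougher estimate from a prefix
```

The estimate from the full stream approaches the exact product as the stream grows, while the prefix estimate is available earlier at lower precision. That trade-off between latency and accuracy is a general property of this encoding; how the paper's architecture exploits it is not specified in this record.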
Publication Date
10-1-2018
Publication Title
IEEE Transactions on Multi-Scale Computing Systems
Volume
4
Issue
4
Number of Pages
888-899
Document Type
Article
Personal Identifier
scopus
DOI Link
https://doi.org/10.1109/TMSCS.2018.2886266
Copyright Status
Unknown
Scopus ID
85058669544 (Scopus)
Source API URL
https://api.elsevier.com/content/abstract/scopus_id/85058669544
STARS Citation
Alawad, Mohammed and Lin, Mingjie, "Scalable FPGA Accelerator For Deep Convolutional Neural Networks With Stochastic Streaming" (2018). Scopus Export 2015-2019. 10404.
https://stars.library.ucf.edu/scopus2015/10404