Astro: Synthesizing Application-Specific Reconfigurable Hardware Traces To Exploit Memory-Level Parallelism
Keywords
Application-specific; FPGA; HLS; Memory structure; Memory-level parallelism; Reconfigurable
Abstract
Emerging integrated CPU + FPGA hybrid platforms, such as the Extensible Processing Platform architecture from Xilinx [1], offer an unprecedented opportunity to achieve both multifunctionality and real-time responsiveness for memory-intensive embedded applications. However, how to cost-effectively synthesize application-specific hardware constructs that fully exploit memory-level parallelism remains a key challenge. To address this problem, we propose a new FPGA-based embedded computer architecture, ASTRO (Application-Specific Hardware Traces with Reconfigurable Optimization). Our main contribution is an integrated methodology for constructing an application-specific memory access network capable of extracting the maximum amount of memory-level parallelism on a per-application basis. In particular, our proposed ASTRO architecture can (1) perform dynamic memory analysis to maximally extract the target application's instruction-, loop-, and memory-level parallelism for performance enhancement, (2) synthesize highly efficient accelerators that enable parallelized memory accesses, and therefore (3) accomplish effective data orchestration by utilizing the capabilities of modern FPGA devices: abundant distributed block RAMs and reprogrammability. To empirically validate our ASTRO methodology, we have implemented a baseline embedded processor platform, a conventional CPU + accelerator with a single centralized memory, and a prototype ASTRO machine based on Xilinx MicroBlaze technology. Our experimental results show that, on average across 10 benchmark applications from SPEC2006 and MiBench [2], the ASTRO machine achieves an 8.6 times speedup compared to the baseline embedded processor platform and a 1.7 times speedup compared to a conventional CPU + accelerator platform. More interestingly, the ASTRO platform achieves a reduction of more than 40% in energy-delay product compared to a conventional CPU + accelerator with a centralized memory.
Publication Date
10-29-2015
Publication Title
Microprocessors and Microsystems
Volume
39
Issue
7
Pages
553-564
Document Type
Article
Personal Identifier
scopus
DOI Link
https://doi.org/10.1016/j.micpro.2015.03.005
Copyright Status
Unknown
Scopus ID
84940461834 (Scopus)
Source API URL
https://api.elsevier.com/content/abstract/scopus_id/84940461834
STARS Citation
Lin, Mingjie; Chen, Shaoyi; Demara, Ronald F.; and Wawrzynek, John, "Astro: Synthesizing Application-Specific Reconfigurable Hardware Traces To Exploit Memory-Level Parallelism" (2015). Scopus Export 2015-2019. 1004.
https://stars.library.ucf.edu/scopus2015/1004