Astro: Synthesizing Application-Specific Reconfigurable Hardware Traces To Exploit Memory-Level Parallelism

Keywords

Application-specific; FPGA; HLS; Memory structure; Memory-level parallelism; Reconfigurable

Abstract

Emerging integrated CPU + FPGA hybrid platforms, such as the Extensible Processing Platform architecture from Xilinx [1], offer an unprecedented opportunity to achieve both multifunctionality and real-time responsiveness for memory-intensive embedded applications. However, how to cost-effectively synthesize application-specific hardware constructs that fully exploit memory-level parallelism remains a key challenge. To address this problem, we propose a new FPGA-based embedded computer architecture, ASTRO (Application-Specific Hardware Traces with Reconfigurable Optimization). Our main contribution is an integrated methodology for constructing an application-specific memory access network capable of extracting the maximum amount of memory-level parallelism on a per-application basis. In particular, our proposed ASTRO architecture can (1) perform dynamic memory analysis to maximally extract the target application's instruction-, loop-, and memory-level parallelism for performance enhancement, (2) synthesize highly efficient accelerators that enable parallelized memory accesses, and therefore (3) accomplish effective data orchestration by utilizing the capabilities of modern FPGA devices: abundant distributed block RAMs and reprogrammability. To empirically validate our ASTRO methodology, we have implemented a baseline embedded processor platform, a conventional CPU + accelerator with a centralized single memory, and a prototype ASTRO machine based on Xilinx MicroBlaze technology. Our experimental results show that, on average across 10 benchmark applications from SPEC2006 and MiBench [2], the ASTRO machine achieves an 8.6 times speedup over the baseline embedded processor platform and a 1.7 times speedup over a conventional CPU + accelerator platform. More interestingly, the ASTRO platform reduces the energy-delay product by more than 40% compared to a conventional CPU + accelerator with a centralized memory.

Publication Date

10-29-2015

Publication Title

Microprocessors and Microsystems

Volume

39

Issue

7

Page Range

553-564

Document Type

Article

Personal Identifier

scopus

DOI Link

https://doi.org/10.1016/j.micpro.2015.03.005

Scopus ID

84940461834 (Scopus)

Source API URL

https://api.elsevier.com/content/abstract/scopus_id/84940461834

