Extracting Data Parallelism In Non-Stencil Kernel Computing By Optimally Coloring Folded Memory Conflict Graph

Keywords

Graph Coloring; Graph Folding; Memory Conflict Reduction

Abstract

Irregular memory access patterns in non-stencil kernel computing render the well-known hyperplane- [1], lattice- [2], or tessellation-based [3] HLS techniques ineffective. We develop an elegant yet effective technique that synthesizes a memory-optimal architecture from high-level software code in order to maximize application-specific data parallelism. Our basic idea is to exploit graph structures embedded in the data access pattern and computation structure in order to perform memory banking that maximizes parallel memory accesses while conserving both hardware and energy. Specifically, we priority color a weighted conflict graph generated from folding the fundamental conflict graph to maximize memory conflict reduction. Most interestingly, our graph-based methodology enables a straightforward tradeoff between the number of memory banks and the number of memory conflicts. We empirically test our methodology with Vivado HLx 2015.4 on a standard Kintex-7 device for six benchmark computing kernels by measuring conflict reduction. In particular, our approach requires only 9.56% LUT, 3.2% FF, 2.5% BRAM, and 11.33% DSP of the total available hardware resources to obtain a mapping function that achieves a 90% conflict reduction on a modified forward Gaussian elimination kernel with 4 simultaneous memory accesses.
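The banking idea in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes the weighted conflict graph has already been produced by the folding step (which is not shown), and it uses a simple greedy heuristic rather than an optimal coloring. The names `color_conflict_graph`, `residual_conflict`, and the example weights are hypothetical and chosen only for illustration. Each node stands for a memory location, each edge weight for how often two locations are accessed simultaneously, and each color for a memory bank.

```python
from collections import defaultdict


def color_conflict_graph(weights, num_banks):
    """Greedily assign each node a bank (color), heaviest nodes first.

    weights: dict mapping (u, v) pairs to a conflict weight (assumed given;
             the folding step that would produce it is not modeled here).
    num_banks: number of memory banks (colors) available.
    """
    adj = defaultdict(dict)
    for (u, v), w in weights.items():
        adj[u][v] = w
        adj[v][u] = w

    # Priority order: largest total incident conflict weight first,
    # ties broken by node name so the result is deterministic.
    order = sorted(adj, key=lambda n: (-sum(adj[n].values()), n))

    bank = {}
    for node in order:
        # Conflict weight this node would incur in each candidate bank.
        cost = [0] * num_banks
        for nbr, w in adj[node].items():
            if nbr in bank:
                cost[bank[nbr]] += w
        # Pick the cheapest bank, lowest index on ties.
        bank[node] = min(range(num_banks), key=lambda b: (cost[b], b))
    return bank


def residual_conflict(weights, bank):
    """Total weight of edges whose endpoints share a bank."""
    return sum(w for (u, v), w in weights.items() if bank[u] == bank[v])


# Hypothetical weighted conflict graph over four memory locations.
example_weights = {("a", "b"): 5, ("b", "c"): 3, ("a", "c"): 1, ("a", "d"): 2}

for banks in (1, 2, 3):
    assignment = color_conflict_graph(example_weights, banks)
    print(banks, residual_conflict(example_weights, assignment))
```

Running the loop with different bank counts shows the tradeoff the abstract describes: allowing more banks leaves less conflict weight unresolved, at the cost of more hardware.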

Publication Date

6-24-2018

Publication Title

Proceedings - Design Automation Conference

Volume

Part F137710

Document Type

Article; Proceedings Paper

Personal Identifier

scopus

DOI Link

https://doi.org/10.1145/3195970.3196088

Scopus ID

85053695340 (Scopus)

Source API URL

https://api.elsevier.com/content/abstract/scopus_id/85053695340
