Title

A GPGPU Compiler for Memory Optimization and Parallelism Management

Authors

Y. Yang; P. Xiang; J. F. Kong; H. Y. Zhou

Abbreviated Journal Title

ACM SIGPLAN Not.

Keywords

Performance; Experimentation; Languages; GPGPU; Compiler; Computer Science, Software Engineering

Abstract

This paper presents a novel optimizing compiler for general purpose computation on graphics processing units (GPGPU). It addresses two major challenges of developing high performance GPGPU programs: effective utilization of the GPU memory hierarchy and judicious management of parallelism. The input to our compiler is a naive GPU kernel function, which is functionally correct but written without any consideration for performance optimization. The compiler analyzes the code, identifies its memory access patterns, and generates both the optimized kernel and the kernel invocation parameters. Our optimization process includes vectorization and memory coalescing for memory bandwidth enhancement, tiling and unrolling for data reuse and parallelism management, and thread block remapping or address-offset insertion for partition-camping elimination. Experiments on a set of scientific and media processing algorithms show that our optimized code achieves very high performance: it is either superior or very close to the highly fine-tuned NVIDIA CUBLAS 2.2 library, with speedups of up to 128 times over the naive versions. Another distinguishing feature of our compiler is the understandability of the optimized code, which is useful for performance analysis and algorithm refinement.
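As a rough illustration of the tiling idea mentioned in the abstract, the following sketch contrasts a naive matrix multiplication with a tiled one. It is a CPU-side analogy written in Python, not the compiler's input or output: the function names `matmul_naive` and `matmul_tiled` and the `tile` parameter are hypothetical, and GPU-specific steps such as memory coalescing, shared-memory staging, and partition-camping elimination are not modeled here.

```python
def matmul_naive(a, b):
    """Straightforward triple loop: correct, with no attention to data reuse."""
    n, k, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            for p in range(k):
                c[i][j] += a[i][p] * b[p][j]
    return c


def matmul_tiled(a, b, tile=2):
    """Same computation, restructured into tile x tile blocks.

    Each block of `a` and `b` is reused across a whole block of `c`
    before moving on, which is the data-reuse pattern that tiling targets.
    The `tile` size is an assumed parameter chosen for illustration.
    """
    n, k, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    for i0 in range(0, n, tile):          # block row of c
        for j0 in range(0, m, tile):      # block column of c
            for p0 in range(0, k, tile):  # block along the reduction axis
                for i in range(i0, min(i0 + tile, n)):
                    for j in range(j0, min(j0 + tile, m)):
                        acc = c[i][j]
                        for p in range(p0, min(p0 + tile, k)):
                            acc += a[i][p] * b[p][j]
                        c[i][j] = acc
    return c


if __name__ == "__main__":
    a = [[1, 2, 3], [4, 5, 6]]
    b = [[7, 8], [9, 10], [11, 12]]
    print(matmul_naive(a, b))
    print(matmul_tiled(a, b, tile=2))
```

Both functions accumulate each output element in the same order, so they return the same result; only the loop structure, and therefore the order in which input elements are touched, differs.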

Journal Title

ACM SIGPLAN Notices

Volume

45

Issue/Number

6

Publication Date

1-1-2010

Document Type

Article; Proceedings Paper

Language

English

First Page

86

Last Page

97

WOS Identifier

WOS:000279357500008

ISSN

0362-1340
