Title

Combining Local And Global History For High Performance Data Prefetching

Abstract

In this paper, we present the design of a high-performance prefetcher that exploits various localities in both local cache-miss streams (misses generated by the same instruction) and the global cache-miss address stream (misses from different instructions). Besides the stride and context localities exploited in previous work, we identify new data localities and incorporate novel prefetching algorithms into our design. We also study the largely overlooked importance of eliminating redundant prefetches: we use logic to remove local redundant prefetches (those issued by the same instruction), and a Bloom filter or miss status handling registers (MSHRs) to remove global redundant prefetches (those issued across all instructions). We evaluate three design points of the proposed architecture, trading off performance for complexity and latency efficiency. Our experimental results on a set of SPEC 2006 benchmarks show that the proposed design significantly improves performance (by over 1.6X for our highest-performance design point) at a small hardware cost across various processor, cache, and memory bandwidth configurations.
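As a rough illustration of the global redundant-prefetch filtering mentioned in the abstract, the following is a minimal software sketch of a Bloom filter keyed on cache-block addresses. The abstract gives no sizing or hashing details, so the class name, the parameters (`num_bits`, `num_hashes`, `block_bytes`), and the use of SHA-256 are all assumptions made for illustration; a hardware design would likely use much simpler hash functions and some form of periodic reset.

```python
import hashlib


class PrefetchBloomFilter:
    """Illustrative Bloom filter over cache-block addresses (hypothetical names)."""

    def __init__(self, num_bits=1024, num_hashes=2, block_bytes=64):
        # All three parameters are assumed values, not taken from the paper.
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.block_bytes = block_bytes
        self.bits = bytearray(num_bits)  # one byte per bit, for clarity

    def _indexes(self, addr):
        # Map the byte address to its cache block, then derive one index per hash.
        block = addr // self.block_bytes
        for i in range(self.num_hashes):
            data = block.to_bytes(8, "little") + bytes([i])
            digest = hashlib.sha256(data).digest()
            yield int.from_bytes(digest[:8], "little") % self.num_bits

    def should_issue(self, addr):
        """Return True if the block looks new; record it in the filter either way."""
        idx = list(self._indexes(addr))
        seen = all(self.bits[i] for i in idx)
        for i in idx:
            self.bits[i] = 1
        return not seen


filt = PrefetchBloomFilter()
first = filt.should_issue(0x1000)       # block not seen yet
repeat = filt.should_issue(0x1000)      # same address again
same_block = filt.should_issue(0x1020)  # different byte, same 64-byte block
```

In this sketch a prefetch is dropped when every probed bit is already set, so a false positive can occasionally suppress a useful prefetch but a recorded block is never reissued until the filter is cleared.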

Publication Date

2-11-2011

Publication Title

Journal of Instruction-Level Parallelism

Volume

13

Number of Pages

-

Document Type

Article

Personal Identifier

scopus

Scopus ID

79551702603 (Scopus)

Source API URL

https://api.elsevier.com/content/abstract/scopus_id/79551702603
