Title
Combining Local And Global History For High Performance Data Prefetching
Abstract
In this paper, we present our design of a high performance prefetcher, which exploits various localities in both local cache-miss streams (misses generated from the same instruction) and the global cache-miss address stream (the misses from different instructions). Besides the stride and context localities that have been exploited in previous work, we identify new data localities and incorporate novel prefetching algorithms into our design. In this work, we also study the (largely overlooked) importance of eliminating redundant prefetches. We use logic to remove local (by the same instruction) redundant prefetches and we use a Bloom filter or miss status handling registers (MSHRs) to remove global (by all instructions) redundant prefetches. We evaluate three different design points of the proposed architecture, trading off performance for complexity and latency efficiency. Our experimental results based on a set of SPEC 2006 benchmarks show that the proposed design significantly improves the performance (over 1.6X for our highest performance design point) at a small hardware cost for various processor, cache and memory bandwidth configurations.
Publication Date
2-11-2011
Publication Title
Journal of Instruction-Level Parallelism
Volume
13
Number of Pages
-
Document Type
Article
Personal Identifier
scopus
Copyright Status
Unknown
Scopus ID
79551702603 (Scopus)
Source API URL
https://api.elsevier.com/content/abstract/scopus_id/79551702603
STARS Citation
Dimitrov, Martin and Zhou, Huiyang, "Combining Local And Global History For High Performance Data Prefetching" (2011). Scopus Export 2010-2014. 3314.
https://stars.library.ucf.edu/scopus2010/3314