A Preliminary Evaluation of Cache-Miss-Initiated Prefetching Techniques in Scalable Multiprocessors
- 1 May 1994
- report
- Published by Defense Technical Information Center (DTIC)
Abstract
Prefetching is an important technique for reducing the average latency of memory accesses in scalable cache-coherent multiprocessors. Aggressive prefetching can significantly reduce the number of cache misses, but may introduce bursty network and memory traffic, increase data sharing, and cause cache pollution. Given that we anticipate enormous increases in both network bandwidth and latency, we examine whether aggressive prefetching triggered by a miss (cache-miss-initiated prefetching) can substantially improve the running time of parallel programs. Using execution-driven simulation of parallel programs on a scalable cache-coherent machine, we study the performance of three cache-miss-initiated prefetching techniques: large cache blocks, sequential prefetching, and hybrid prefetching. Large cache blocks (which fetch multiple words within a single block) and sequential prefetching (which fetches multiple consecutive blocks) are well-known prefetching strategies. Hybrid prefetching is a novel technique combining hardware and software support for stride-directed prefetching. Our simulation results show that large cache blocks rarely provide significant performance improvements; the improvement in the miss rate is often too small (or nonexistent) to offset the corresponding increase in the miss penalty. Our results also show that sequential and hybrid prefetching perform better than prefetching via large cache blocks, and that hybrid prefetching performs at least as well as sequential prefetching.
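To make the distinction between the strategies concrete, here is a minimal sketch (not the paper's simulator; the cache parameters, function names, and access pattern are all hypothetical) of a direct-mapped cache in which each demand miss triggers prefetches: fetching the next few blocks at stride 1 corresponds to sequential prefetching, while fetching at a software-supplied stride corresponds to the stride-directed prefetching underlying the hybrid scheme.

```c
/* Sketch of cache-miss-initiated prefetching: on a demand miss,
 * bring in `degree` additional blocks spaced `stride` blocks apart.
 * stride == 1 models sequential prefetching; stride > 1 models
 * stride-directed (hybrid-style) prefetching, with the stride
 * assumed to be supplied by software. Parameters are illustrative. */
#include <stdio.h>

#define NUM_SETS    1024   /* direct-mapped cache: one block per set */
#define BLOCK_WORDS 8      /* words per cache block */

static long tags[NUM_SETS];
static long misses;

/* Returns 1 on miss; installs the block either way. */
static int access_block(long block)
{
    long set = block % NUM_SETS;
    if (tags[set] == block) return 0;
    tags[set] = block;
    return 1;
}

/* Demand access to one word; a miss triggers the prefetches. */
static void access_word(long addr, int degree, long stride)
{
    long block = addr / BLOCK_WORDS;
    if (access_block(block)) {
        misses++;
        for (int i = 1; i <= degree; i++)
            access_block(block + i * stride); /* prefetches, not counted as misses */
    }
}

static void reset_cache(void)
{
    misses = 0;
    for (long s = 0; s < NUM_SETS; s++) tags[s] = -1;
}

int main(void)
{
    /* Strided traversal, e.g. walking one column of a row-major matrix. */
    const long n = 1 << 16, row_words = 64;
    const long stride_blocks = row_words / BLOCK_WORDS;

    reset_cache();
    for (long i = 0; i < n; i++) access_word(i * row_words, 0, 0);
    printf("no prefetching:      %ld misses\n", misses);

    reset_cache();
    for (long i = 0; i < n; i++) access_word(i * row_words, 4, 1);
    printf("sequential (deg 4):  %ld misses\n", misses);

    reset_cache();
    for (long i = 0; i < n; i++) access_word(i * row_words, 4, stride_blocks);
    printf("stride-directed:     %ld misses\n", misses);
    return 0;
}
```

On this column-walk pattern, sequential prefetching fetches consecutive blocks the loop never touches and leaves the miss count unchanged, whereas stride-directed prefetching converts roughly four of every five demand misses into hits, which mirrors the intuition behind supplementing hardware prefetching with software stride information.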