An architecture for software-controlled data prefetching

Abstract
This paper describes an architecture and related compiler support for software-controlled data prefetching, a technique to hide memory latency in high-performance processors. At compile time, FETCH instructions are inserted into the instruction stream by the compiler, based on anticipated data references and detailed information about the memory system. At run time, a separate functional unit in the CPU, the fetch unit, interprets these instructions and initiates appropriate memory reads. Prefetched data is kept in a small, fully associative cache, called the fetchbuffer, to reduce contention with the conventional direct-mapped cache. We also introduce a prewriteback technique that can reduce the impact of stalls due to replacement writebacks in the cache. A detailed hardware model is presented and the required compiler support is developed. Simulations based on a MIPS processor model show that this technique can dramatically reduce on-chip cache miss ratios and average observed memory latency for scientific loops at only a slight cost in total memory traffic.
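
As a rough illustration of compiler-inserted prefetching in a scientific loop, the sketch below requests array elements a fixed distance ahead of their use. It is not the paper's FETCH instruction or fetch unit: the GCC/Clang builtin `__builtin_prefetch` stands in for FETCH, and the function name `sum_with_prefetch` and the constant `PREFETCH_DISTANCE` are hypothetical. In the scheme described above, the compiler would choose that distance from its information about the memory system.

```c
#include <stddef.h>

/* Hypothetical prefetch distance, in array elements. A compiler following
   the approach in the abstract would derive this from memory latency and
   the cost of one loop iteration; 16 is only a placeholder. */
#define PREFETCH_DISTANCE 16

/* Sum an array while requesting future elements ahead of their use. */
double sum_with_prefetch(const double *a, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        if (i + PREFETCH_DISTANCE < n) {
            /* Stand-in for a FETCH instruction (assumption): a GCC/Clang
               builtin hinting a read (0) with low temporal locality (0). */
            __builtin_prefetch(&a[i + PREFETCH_DISTANCE], 0, 0);
        }
        sum += a[i];
    }
    return sum;
}
```

The prefetch is only a hint, so the result is the same with or without it; only the observed memory latency may change.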
