A quantitative analysis of loop nest locality
- 1 September 1996
- proceedings article
- Published by Association for Computing Machinery (ACM)
- Vol. 31 (9), 94-104
- https://doi.org/10.1145/237090.237161
Abstract
This paper analyzes and quantifies the locality characteristics of numerical loop nests in order to suggest future directions for architecture and software cache optimizations. Since most programs spend the majority of their time in nests, the vast majority of cache optimization techniques target loop nests. In contrast, the locality characteristics that drive these optimizations are usually collected across the entire application rather than the nest level. Indeed, researchers have studied numerical codes for so long that a number of commonly held assertions have emerged on their locality characteristics. In light of these assertions, we use the Perfect Benchmarks to take a new look at measuring locality on numerical codes based on references, loop nests, and program locality properties. Our results show that several popular assertions are at best overstatements. For example, we find that temporal and spatial reuse have balanced roles within a loop nest and most reuse across nests and the entire program is temporal. These results are consistent with high hit rates, but go against the commonly held assumption that spatial reuse dominates. Another result contrary to popular assumption is that misses within a nest are overwhelmingly conflict misses rather than capacity misses. Capacity misses are a significant source of misses for the entire program, but mostly correspond to potential reuse between different loop nests. Our locality measurements reveal important differences between loop nests and programs; refute some popular assertions; and provide new insights for the compiler writer and the architect.
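The distinction the abstract draws between conflict and capacity misses can be illustrated with a small trace-driven sketch. This is not the paper's measurement methodology; it assumes the common three-C classification, in which a miss in the real cache counts as a conflict miss if a fully associative LRU cache of the same capacity would have hit, and as a capacity miss otherwise. The function name `classify_misses`, the parameters `cache_bytes` and `line_bytes`, and the direct-mapped 1 KB cache with 32-byte lines are all hypothetical choices made for illustration.

```python
from collections import OrderedDict


def classify_misses(addresses, cache_bytes=1024, line_bytes=32):
    """Classify each access to a direct-mapped cache (assumed geometry)
    as a hit, compulsory miss, capacity miss, or conflict miss."""
    num_lines = cache_bytes // line_bytes
    direct = {}          # set index -> line currently resident (direct-mapped)
    lru = OrderedDict()  # fully associative LRU cache of the same capacity
    seen = set()         # every line touched so far
    counts = {"hit": 0, "compulsory": 0, "capacity": 0, "conflict": 0}

    for addr in addresses:
        line = addr // line_bytes
        index = line % num_lines

        # Simulate the direct-mapped cache.
        dm_hit = direct.get(index) == line
        direct[index] = line

        # Simulate the fully associative LRU reference cache.
        fa_hit = line in lru
        if fa_hit:
            lru.move_to_end(line)
        else:
            lru[line] = True
            if len(lru) > num_lines:
                lru.popitem(last=False)  # evict least recently used line

        if dm_hit:
            counts["hit"] += 1
        elif line not in seen:
            counts["compulsory"] += 1   # first touch of this line
        elif fa_hit:
            counts["conflict"] += 1     # only the mapping caused the miss
        else:
            counts["capacity"] += 1     # working set exceeds the cache
        seen.add(line)

    return counts


if __name__ == "__main__":
    # Two small arrays exactly one cache size apart, accessed alternately
    # inside one loop: they keep evicting each other from the same set.
    within_nest = [base + 8 * i for i in range(8) for base in (0, 1024)]
    print(classify_misses(within_nest))

    # Two sweeps over data twice the cache size: reuse exists between the
    # sweeps, but the data cannot stay resident.
    across_nests = [32 * line for _ in range(2) for line in range(64)]
    print(classify_misses(across_nests))
```

Under these assumptions the first trace yields only compulsory and conflict misses, while the second yields only compulsory and capacity misses, which mirrors the kind of within-nest versus between-nest contrast the abstract describes.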