Precise miss analysis for program transformations with caches of arbitrary associativity
- 1 October 1998
- journal article
- Published by Association for Computing Machinery (ACM) in ACM SIGPLAN Notices
- Vol. 33 (11) , 228-239
- https://doi.org/10.1145/291006.291051
Abstract
Analyzing and optimizing program memory performance is a pressing problem in high-performance computer architectures. Currently, software solutions addressing the processor-memory performance gap include compiler-or programmer-applied optimizations like data structure padding, matrix blocking, and other program transformations. Compiler optimization can be effective, but the lack of precise analysis and optimization frameworks makes it impossible to confidently make optimal, rather than heuristic-based, program transformations. Imprecision is most problematic in situations where hard-to-predict cache conflicts foil heuristic approaches. Furthermore, the lack of a general framework for compiler memory performance analysis makes it impossible to understand the combined effects of several program transformations.The Cache Miss Equation (CME) framework discussed in this paper addresses these issues. We express memory reference and cache conflict behavior in terms of sets of equations. The mathematical precision of CMEs allows us to find true optimal solutions for transformations like blocking or padding. The generality of CMEs also allows us to reason about interactions between transformations applied in concert. Unlike our prior work, this framework applies to caches of arbitrary associativity. This paper also demonstrates the utility of CMEs by presenting precise algorithms for intra-variable padding, inter-variable padding, and selecting tile sizes. Our experiences with CMEs implemented in the SUIF system show that they are a unifying mathematical framework offering the generality and precision imperative for compiler optimizations on current high-performance architectures.This publication has 18 references indexed in Scilit:
- Cache miss equationsPublished by Association for Computing Machinery (ACM) ,1997
- A quantitative analysis of loop nest localityPublished by Association for Computing Machinery (ACM) ,1996
- Improving data locality with loop transformationsACM Transactions on Programming Languages and Systems, 1996
- Counting solutions to linear and nonlinear constraints through Ehrhart polynomialsPublished by Association for Computing Machinery (ACM) ,1996
- Tile size selection using cache organization and data layoutPublished by Association for Computing Machinery (ACM) ,1995
- Cache profiling and the SPEC benchmarks: a case studyComputer, 1994
- Loop Transformations for Restructuring Compilers: The FoundationsPublished by Springer Nature ,1993
- MemSpy: analyzing memory system bottlenecks in programsPublished by Association for Computing Machinery (ACM) ,1992
- The cache performance and optimizations of blocked algorithmsPublished by Association for Computing Machinery (ACM) ,1991
- Automatic translation of FORTRAN programs to vector formACM Transactions on Programming Languages and Systems, 1987