Precise miss analysis for program transformations with caches of arbitrary associativity

1 October 1998

journal article
Published by Association for Computing Machinery (ACM) in ACM SIGPLAN Notices

Vol. 33 (11) , 228-239
https://doi.org/10.1145/291006.291051

Abstract

Analyzing and optimizing program memory performance is a pressing problem in high-performance computer architectures. Currently, software solutions addressing the processor-memory performance gap include compiler-or programmer-applied optimizations like data structure padding, matrix blocking, and other program transformations. Compiler optimization can be effective, but the lack of precise analysis and optimization frameworks makes it impossible to confidently make optimal, rather than heuristic-based, program transformations. Imprecision is most problematic in situations where hard-to-predict cache conflicts foil heuristic approaches. Furthermore, the lack of a general framework for compiler memory performance analysis makes it impossible to understand the combined effects of several program transformations.The Cache Miss Equation (CME) framework discussed in this paper addresses these issues. We express memory reference and cache conflict behavior in terms of sets of equations. The mathematical precision of CMEs allows us to find true optimal solutions for transformations like blocking or padding. The generality of CMEs also allows us to reason about interactions between transformations applied in concert. Unlike our prior work, this framework applies to caches of arbitrary associativity. This paper also demonstrates the utility of CMEs by presenting precise algorithms for intra-variable padding, inter-variable padding, and selecting tile sizes. Our experiences with CMEs implemented in the SUIF system show that they are a unifying mathematical framework offering the generality and precision imperative for compiler optimizations on current high-performance architectures.

This publication has 18 references indexed in Scilit:

Cache miss equations
Published by Association for Computing Machinery (ACM) ,1997
A quantitative analysis of loop nest locality
Published by Association for Computing Machinery (ACM) ,1996
Improving data locality with loop transformations
ACM Transactions on Programming Languages and Systems, 1996
Counting solutions to linear and nonlinear constraints through Ehrhart polynomials
Published by Association for Computing Machinery (ACM) ,1996
Tile size selection using cache organization and data layout
Published by Association for Computing Machinery (ACM) ,1995
Cache profiling and the SPEC benchmarks: a case study
Computer, 1994
Loop Transformations for Restructuring Compilers: The Foundations
Published by Springer Nature ,1993
MemSpy: analyzing memory system bottlenecks in programs
Published by Association for Computing Machinery (ACM) ,1992
The cache performance and optimizations of blocked algorithms
Published by Association for Computing Machinery (ACM) ,1991
Automatic translation of FORTRAN programs to vector form
ACM Transactions on Programming Languages and Systems, 1987