Let's study whole-program cache behaviour analytically

23 April 2004

conference paper
Published by Institute of Electrical and Electronics Engineers (IEEE)

Vol. 5 (15300897), 175-186
https://doi.org/10.1109/hpca.2002.995708

Abstract

Based on a new characterisation of data reuse across multiple loop nests, we present a method, a prototype implementation and some experimental results for analysing the cache behaviour of whole programs with regular computations. Validation against cache simulation using real codes shows the efficiency and accuracy of our method. The largest program we have analysed, Applu from SPECfP95, has 3868 lines, 16 subroutines and 2565 references. In the case of a 32KB cache with a 32B line size, our method obtains the miss ratio with an absolute error of about 0.80% in about 128 seconds, while the simulator used runs for nearly 5 hours on a 933MHz Pentium III PC. Our method can be used to guide compiler locality optimisations and improve cache simulation performance.
