Let's study whole-program cache behaviour analytically
- 23 April 2004
- conference paper
- Published by Institute of Electrical and Electronics Engineers (IEEE)
- Vol. 5 (15300897), 175-186
- https://doi.org/10.1109/hpca.2002.995708
Abstract
Based on a new characterisation of data reuse across multiple loop nests, we present a method, a prototype implementation and some experimental results for analysing the cache behaviour of whole programs with regular computations. Validation against cache simulation using real codes shows the efficiency and accuracy of our method. The largest program we have analysed, Applu from SPECfP95, has 3868 lines, 16 subroutines and 2565 references. In the case of a 32KB cache with a 32B line size, our method obtains the miss ratio with an absolute error of about 0.80% in about 128 seconds, while the simulator used runs for nearly 5 hours on a 933MHz Pentium III PC. Our method can be used to guide compiler locality optimisations and improve cache simulation performance.