Improving cache locality by a combination of loop and data transformations
- 1 February 1999
- journal article
- Published by Institute of Electrical and Electronics Engineers (IEEE) in IEEE Transactions on Computers
- Vol. 48 (2) , 159-167
- https://doi.org/10.1109/12.752657
Abstract
Exploiting locality of reference is key to realizing high levels of performance on modern processors. This paper describes a compiler algorithm for optimizing cache locality in scientific codes on uniprocessor and multiprocessor machines. A distinctive characteristic of our algorithm is that it considers loop and data layout transformations in a unified framework. Our approach is very effective at reducing cache misses and can optimize some nests for which optimization techniques based on loop transformations alone are not successful. An important special case is one in which data layouts of some arrays are fixed and cannot be changed. We show how our algorithm can accommodate this case and demonstrate how it can be used to optimize multiple loop nests. Experiments on several benchmarks show that the techniques presented in this paper result in substantial improvement in cache performance.
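The claim that some nests cannot be fixed by loop transformations alone can be illustrated with a small sketch. The code below is not the paper's algorithm; it is a toy cost model, written under stated assumptions, for the nest `A[i][j] += B[j][i]`. The helper names (`row_major`, `col_major`, `count_misses`), the one-line-per-array cache, and the line size of 8 elements are all hypothetical choices made for illustration.

```python
def row_major(r, c, n):
    """Linear offset of element (r, c) in an n x n row-major array."""
    return r * n + c


def col_major(r, c, n):
    """Linear offset of element (r, c) in an n x n column-major array."""
    return c * n + r


def count_misses(n, order, layout_a, layout_b, line=8):
    """Count misses for the nest A[i][j] += B[j][i] in a toy model.

    Assumption: each array keeps exactly one cache line of `line`
    elements, so a miss occurs whenever an access touches a different
    line than the previous access to the same array.
    `order` is "ij" (i outer, j inner) or "ji" (interchanged).
    """
    last_line = {"A": None, "B": None}
    misses = 0
    for outer in range(n):
        for inner in range(n):
            i, j = (outer, inner) if order == "ij" else (inner, outer)
            accesses = (
                ("A", layout_a(i, j, n)),  # reference A[i][j]
                ("B", layout_b(j, i, n)),  # reference B[j][i]
            )
            for name, offset in accesses:
                current = offset // line
                if last_line[name] != current:
                    misses += 1
                    last_line[name] = current
    return misses


n = 64
original = count_misses(n, "ij", row_major, row_major)
interchanged = count_misses(n, "ji", row_major, row_major)
relayout = count_misses(n, "ij", row_major, col_major)
print(original, interchanged, relayout)
```

In this model, interchanging the loops only moves the large stride from `B` to `A`, so the miss count stays at 4608, whereas storing `B` in column-major order gives both references unit stride and reduces the count to 1024. Real caches have many lines and associativity, so these figures indicate the trend only.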