A layout-conscious iteration space transformation technique
- 1 December 2001
- journal article
- Published by Institute of Electrical and Electronics Engineers (IEEE) in IEEE Transactions on Computers
- Vol. 50 (12) , 1321-1335
- https://doi.org/10.1109/tc.2001.970571
Abstract
Exploiting locality of references has become extremely important in realizing the potential performance of modern machines with deep memory hierarchies. The data access patterns of programs and the memory layouts of the accessed data sets play a critical role in determining the performance of applications running on these machines. This paper presents a cache locality optimization technique that can optimize a loop nest even if the arrays referenced have different layouts in memory. Such a capability is required for a global locality optimization framework that applies both loop and data transformations to a sequence of loop nests for optimizing locality. Our method uses a single linear algebra framework to represent both data layouts and loop transformations. It computes a nonsingular loop transformation matrix such that, in a given loop nest, data locality is exploited in the innermost loops, where it is most useful. The inverse of a nonsingular transformation matrix is built column-by-column, starting from the rightmost column. In addition, our approach can work in those cases where the data layouts of a subset of the referenced arrays is unknown; this is a key step in optimizing a sequence of loop nests and whole programs for locality. Experimental results on an SGI/Cray Origin 2000 nonuniform memory access multiprocessor machine show that our technique reduces execution times by as much as 70 percent.Keywords
This publication has 27 references indexed in Scilit:
- Non-unimodular transformations of nested loopsPublished by Institute of Electrical and Electronics Engineers (IEEE) ,2003
- Automatic partitioning of data and computations on scalable shared memory multiprocessorsPublished by Institute of Electrical and Electronics Engineers (IEEE) ,2002
- Data transformations for eliminating conflict missesPublished by Association for Computing Machinery (ACM) ,1998
- Improving data locality with loop transformationsACM Transactions on Programming Languages and Systems, 1996
- Compiler cache optimizations for banded matrix problemsPublished by Association for Computing Machinery (ACM) ,1995
- Communication-Free Hyperplane Partitioning of Nested LoopsJournal of Parallel and Distributed Computing, 1993
- Demonstration of automatic data partitioning techniques for parallelizing compilers on multicomputersIEEE Transactions on Parallel and Distributed Systems, 1992
- A loop transformation theory and an algorithm to maximize parallelismIEEE Transactions on Parallel and Distributed Systems, 1991
- Compiling communication-efficient programs for massively parallel machinesIEEE Transactions on Parallel and Distributed Systems, 1991
- More iteration space tilingPublished by Association for Computing Machinery (ACM) ,1989