Access normalization
- 1 September 1992
- proceedings article
- Published by Association for Computing Machinery (ACM)
- Vol. 11 (4), 285-295
- https://doi.org/10.1145/143365.143541
Abstract
In scalable parallel machines, processors can make local memory accesses much faster than they can make remote memory accesses. In addition, when a number of remote accesses must be made, it is usually more efficient to use block transfers of data rather than to use many small messages. To run well on such machines, software must exploit these features. We believe it is too onerous for a programmer to do this by hand, so we have been exploring the use of restructuring compiler technology for this purpose. In this paper, we start with a language like FORTRAN-D with user-specified data distribution and develop a systematic loop transformation strategy called access normalization that restructures loop nests to exploit locality and block transfers. We demonstrate the power of our techniques using routines from the BLAS (Basic Linear Algebra Subprograms) library. An important feature of our approach is that we model loop transformations using invertible matrices and integer lattice theory, thereby generalizing Banerjee's framework of unimodular matrices (5).
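The last sentence of the abstract contrasts unimodular matrices with general invertible integer matrices. The following is a minimal sketch of that distinction for a 2-deep loop nest. It is not the paper's access normalization algorithm. All names here (`det2`, `transform`, `preimage`, `in_lattice`, and the three example matrices) are hypothetical and chosen only for illustration. A unimodular matrix (determinant ±1) maps the integer iteration points onto all of the integer lattice, whereas an invertible matrix with another determinant maps them onto a proper sublattice, so some integer points in the new index space correspond to no original iteration.

```python
from fractions import Fraction
from itertools import product

def det2(m):
    """Determinant of a 2x2 integer matrix given as nested tuples."""
    (a, b), (c, d) = m
    return a * d - b * c

def transform(m, point):
    """Map an iteration vector (i, j) to new loop indices (u, v) = m * (i, j)."""
    (a, b), (c, d) = m
    i, j = point
    return (a * i + b * j, c * i + d * j)

def preimage(m, point):
    """Solve m * (i, j) = (u, v) exactly over the rationals."""
    (a, b), (c, d) = m
    det = det2(m)
    u, v = point
    # Inverse of a 2x2 matrix: (1/det) * [[d, -b], [-c, a]]
    return (Fraction(d * u - b * v, det), Fraction(-c * u + a * v, det))

def in_lattice(m, point):
    """True if (u, v) is the image of some integer iteration (i, j)."""
    return all(x.denominator == 1 for x in preimage(m, point))

# Illustrative matrices (assumptions, not taken from the paper).
INTERCHANGE = ((0, 1), (1, 0))  # unimodular, det = -1: swaps the two loops
SKEW = ((1, 1), (0, 1))         # unimodular, det = 1: skews the outer index
SCALED = ((2, 0), (0, 1))       # invertible but not unimodular, det = 2

if __name__ == "__main__":
    for name, m in [("interchange", INTERCHANGE), ("skew", SKEW), ("scaled", SCALED)]:
        # Transform a small 3 x 3 iteration space and test one target point.
        points = {transform(m, p) for p in product(range(3), repeat=2)}
        print(name, "det =", det2(m), "distinct images =", len(points),
              "(3, 7) reachable:", in_lattice(m, (3, 7)))
```

In this sketch the scaled matrix reaches only points with an even first coordinate, which is the kind of gap that a code generator for a non-unimodular transformation would have to account for, for example with a non-unit loop step.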
This publication has 21 references indexed in Scilit:
- A loop transformation theory and an algorithm to maximize parallelism. IEEE Transactions on Parallel and Distributed Systems, 1991
- Compiling communication-efficient programs for massively parallel machines. IEEE Transactions on Parallel and Distributed Systems, 1991
- Limits on interconnection network performance. IEEE Transactions on Parallel and Distributed Systems, 1991
- Compile-time techniques for data distribution in distributed memory machines. IEEE Transactions on Parallel and Distributed Systems, 1991
- Compiling global name-space parallel loops for distributed execution. IEEE Transactions on Parallel and Distributed Systems, 1991
- Data optimization: Allocation of arrays to reduce communication on SIMD machines. Journal of Parallel and Distributed Computing, 1990
- Strategies for cache and local memory management by global program transformation. Journal of Parallel and Distributed Computing, 1988
- Automatic translation of FORTRAN programs to vector form. ACM Transactions on Programming Languages and Systems, 1987
- Advanced compiler optimizations for supercomputers. Communications of the ACM, 1986
- The parallel execution of DO loops. Communications of the ACM, 1974