Automatic partitioning of parallel loops and data arrays for distributed shared-memory multiprocessors

1 September 1995

journal article
Published by Institute of Electrical and Electronics Engineers (IEEE) in IEEE Transactions on Parallel and Distributed Systems

Vol. 6 (9) , 943-962
https://doi.org/10.1109/71.466632

Abstract

Presents a theoretical framework for automatically partitioning parallel loops to minimize cache coherency traffic on shared-memory multiprocessors. While several previous papers have looked at hyperplane partitioning of iteration spaces to reduce communication traffic, the problem of deriving the optimal tiling parameters for minimal communication in loops with general affine index expressions has remained open. Our paper solves this open problem by presenting a method for deriving an optimal hyperparallelepiped tiling of iteration spaces for minimal communication in multiprocessors with caches. We show that the same theoretical framework can also be used to determine optimal tiling parameters for both data and loop partitioning in distributed memory multicomputers. Our framework uses matrices to represent iteration and data space mappings and the notion of uniformly intersecting references to capture temporal locality in array references. We introduce the notion of data footprints to estimate the communication traffic between processors and use linear algebraic methods and lattice theory to compute precisely the size of data footprints. We have implemented this framework in a compiler for Alewife, a distributed shared-memory multiprocessor.

Keywords

This publication has 17 references indexed in Scilit:

On estimating and enhancing cache effectiveness
Published by Springer Nature ,2006
Hierarchical compilation of macro dataflow graphs for multiprocessors with local memory
IEEE Transactions on Parallel and Distributed Systems, 1994
Memory assignment for multiprocessor caches through grey coloring
Published by Springer Nature ,1994
Demonstration of automatic data partitioning techniques for parallelizing compilers on multicomputers
IEEE Transactions on Parallel and Distributed Systems, 1992
Lazy task creation: a technique for increasing the granularity of parallel programs
IEEE Transactions on Parallel and Distributed Systems, 1991
Compile-time partitioning of iterative parallel loops to reduce cache coherency traffic
IEEE Transactions on Parallel and Distributed Systems, 1991
The cache performance and optimizations of blocked algorithms
Published by Association for Computing Machinery (ACM) ,1991
M-structures: Extending a parallel, non-strict, functional language with state
Published by Springer Nature ,1991
Compile-time techniques for data distribution in distributed memory machines
IEEE Transactions on Parallel and Distributed Systems, 1991
Strategies for cache and local memory management by global program transformation
Journal of Parallel and Distributed Computing, 1988