Automatic data layout for distributed-memory machines
- 1 July 1998
- journal article
- Published by Association for Computing Machinery (ACM) in ACM Transactions on Programming Languages and Systems
- Vol. 20 (4), 869-916
- https://doi.org/10.1145/291891.291901
Abstract
The goal of languages like Fortran D or High Performance Fortran (HPF) is to provide a simple yet efficient machine-independent parallel programming model. After algorithm selection, the choice of data layout is the key intellectual challenge in writing an efficient program in such languages. The performance of a data layout depends on the target compilation system, the target machine, the problem size, and the number of available processors. This makes the choice of a good layout extremely difficult for most users of such languages. If languages such as HPF are to find general acceptance, the need for data layout selection support has to be addressed. We believe that the appropriate way to provide the needed support is through a tool that generates data layout specifications automatically. This article discusses the design and implementation of a data layout selection tool that generates HPF-style data layout specifications automatically. Because layout is done in a tool that is not embedded in the target compiler and hence will be run only a few times during the tuning phase of an application, it can use techniques such as integer programming that may be considered too computationally expensive for inclusion in production compilers. The proposed framework for automatic data layout selection builds and examines search spaces of candidate data layouts. A candidate layout is an efficient layout for some part of the program. After the generation of search spaces, a single candidate layout is selected for each program part, resulting in a data layout for the entire program. A good overall data layout may require the remapping of arrays between program parts. A performance estimator based on a compiler model, an execution model, and a machine model is needed to predict the execution time of each candidate layout and the costs of possible remappings between candidate data layouts. In the proposed framework, instances of NP-complete problems are solved during the construction of candidate layout search spaces and the final selection of candidate layouts from each search space. Rather than resorting to heuristics, the framework capitalizes on state-of-the-art 0-1 integer programming technology to compute optimal solutions of these NP-complete problems. A prototype data layout assistant tool based on our framework has been implemented as part of the D system currently under development at Rice University. The article reports preliminary experimental results. The results indicate that the framework is efficient and allows the generation of data layouts of high quality.
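The selection step summarized in the abstract can be made concrete with a small sketch. The 0-1 integer program below, written with the open-source PuLP modeller, picks one candidate layout per program phase so that the sum of estimated phase execution times and inter-phase remapping costs is minimized. The phase names, the HPF-style candidate distributions such as "(BLOCK,*)", and all cost figures are hypothetical placeholders for the values the paper's performance estimator would supply; the paper's actual formulation and cost models are considerably more elaborate.

```python
# A minimal sketch, not the paper's formulation: choose one candidate data
# layout per program phase via a 0-1 integer program, minimizing estimated
# execution time plus remapping costs. All names and numbers are made up.
from pulp import LpProblem, LpVariable, LpMinimize, LpBinary, lpSum, value

# Candidate layouts per phase with estimated execution times (hypothetical).
candidates = {
    "phase1": [("(BLOCK,*)", 10.0), ("(*,BLOCK)", 14.0)],
    "phase2": [("(BLOCK,*)", 20.0), ("(*,BLOCK)",  9.0)],
    "phase3": [("(BLOCK,*)",  6.0), ("(*,BLOCK)",  6.5)],
}
phases = list(candidates)

def remap_cost(src, dst):
    # Estimated cost of redistributing an array between two layouts (made up).
    return 0.0 if src == dst else 5.0

prob = LpProblem("data_layout_selection", LpMinimize)

# x[i, j] = 1 iff candidate j is selected for phase i.
x = {(i, j): LpVariable(f"x_{i}_{j}", cat=LpBinary)
     for i, p in enumerate(phases) for j in range(len(candidates[p]))}

# y[i, j, k] = 1 iff candidate j is selected for phase i and candidate k for
# phase i+1; it carries the remapping cost between consecutive phases.
y = {(i, j, k): LpVariable(f"y_{i}_{j}_{k}", cat=LpBinary)
     for i in range(len(phases) - 1)
     for j in range(len(candidates[phases[i]]))
     for k in range(len(candidates[phases[i + 1]]))}

# Objective: total estimated execution time plus total remapping cost.
prob += (lpSum(cost * x[i, j]
               for i, p in enumerate(phases)
               for j, (_, cost) in enumerate(candidates[p]))
         + lpSum(remap_cost(candidates[phases[i]][j][0],
                            candidates[phases[i + 1]][k][0]) * y[i, j, k]
                 for (i, j, k) in y))

# Exactly one candidate layout per phase.
for i, p in enumerate(phases):
    prob += lpSum(x[i, j] for j in range(len(candidates[p]))) == 1

# Force y to 1 when both endpoints are selected; the lower bound suffices
# because remapping costs are non-negative and the objective is minimized.
for (i, j, k) in y:
    prob += y[i, j, k] >= x[i, j] + x[i + 1, k] - 1

prob.solve()
for i, p in enumerate(phases):
    chosen = next(lay for j, (lay, _) in enumerate(candidates[p])
                  if value(x[i, j]) > 0.5)
    print(p, "->", chosen)
```

With these toy numbers the solver keeps "(*,BLOCK)" across all three phases, because the cheaper phase1 layout does not pay for the remapping it would trigger; exactly this trade-off between per-phase efficiency and remapping overhead is what the framework's performance estimator quantifies.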