A global communication optimization technique based on data-flow analysis and linear algebra
Open Access
- 1 November 1999
- journal article
- Published by Association for Computing Machinery (ACM) in ACM Transactions on Programming Languages and Systems
- Vol. 21 (6), 1251-1297
- https://doi.org/10.1145/330643.330647
Abstract
Reducing communication overhead is extremely important in distributed-memory message-passing architectures. In this article, we present a technique to improve communication that considers data access patterns of the entire program. Our approach is based on a combination of traditional data-flow analysis and a linear algebra framework, and it works on structured programs with conditional statements and nested loops but without arbitrary goto statements. The distinctive features of the solution are the accuracy in keeping communication set information, support for general alignments and distributions including block-cyclic distributions, and the ability to simulate some of the previous approaches with suitable modifications. We also show how optimizations such as message vectorization, message coalescing, and redundancy elimination are supported by our framework. Experimental results on several benchmarks show that our technique is effective in reducing the number of messages (an average of 32% reduction), the volume of the data communicated (an average of 37% reduction), and the execution time (an average of 26% reduction).
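The sketch below is an illustrative, simplified view of the kind of communication-set bookkeeping the abstract describes, not the paper's actual framework: array regions are modeled as plain Python sets of indices, whereas the paper uses a linear algebra representation that also handles general alignments and block-cyclic distributions. The function name `communication_sets` and the `required`/`available` sets are hypothetical names introduced here for illustration.

```python
# A minimal sketch, assuming remote array regions can be modeled as sets of
# global indices. Redundancy elimination is expressed as a simple data-flow
# style computation: at each step, only the data not already made local by
# earlier communication is fetched (Comm_i = Required_i - Available_i).

def communication_sets(required_per_step):
    """For each program step, return the set of remote elements that must
    actually be communicated, given what earlier steps already fetched."""
    available = set()                    # elements already resident locally
    schedule = []
    for step, required in enumerate(required_per_step):
        comm = required - available      # fetch only what is not yet local
        available |= comm                # fetched data remains available
        schedule.append((step, sorted(comm)))
    return schedule

if __name__ == "__main__":
    # Two loop nests: both read remote elements 0..7, the second also 8..11.
    steps = [set(range(0, 8)), set(range(0, 12))]
    for step, comm in communication_sets(steps):
        # Message vectorization would then send each `comm` set as a single
        # aggregate message hoisted outside the loop nest, rather than one
        # message per element.
        print(f"step {step}: communicate {len(comm)} elements -> {comm}")
```

Run as written, the second step communicates only the 4 elements not covered by the first, which is the effect that redundancy elimination aims for; the paper's framework computes analogous sets exactly, over whole programs with conditionals and nested loops.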