Optimal loop parallelization
- 1 June 1988
- proceedings article
- Published by Association for Computing Machinery (ACM)
- Vol. 23 (7), 308-317
- https://doi.org/10.1145/53990.54021
Abstract
Parallelizing compilers promise to exploit the parallelism available in a given program, particularly parallelism that is too low-level or irregular to be expressed by hand in an algorithm. However, existing parallelization techniques do not handle loops in a satisfactory manner. Fine-grain (instruction-level) parallelization, or compaction, captures irregular parallelism inside a loop body but does not exploit parallelism across loop iterations. Coarser methods, such as doacross [9], sacrifice irregular forms of parallelism in favor of pipelining iterations (software pipelining). Both of these approaches often yield suboptimal speedups even under the best conditions, when resources are plentiful and processors are synchronous. In this paper we present a new technique bridging the gap between fine- and coarse-grain loop parallelization, allowing the exploitation of parallelism both inside and across loop iterations. Furthermore, we show that, given a loop and a set of dependencies between its statements, the execution schedule obtained by our transformation is time optimal: no transformation of the loop based on the given data dependencies can yield a shorter running time for that loop.
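To make the contrast in the abstract concrete, the sketch below compares a toy loop with a hand-pipelined version of it. The loop, the array names, and the two-statement body are hypothetical and chosen only for illustration; the rewrite shows the general idea of overlapping statements from adjacent iterations when the dependencies allow it, not the paper's actual transformation or its optimality construction.

```c
#include <stdio.h>

#define N 8

int main(void) {
    double b[N], c[N], a[N];
    double s_seq = 0.0, s_pipe = 0.0;

    for (int i = 0; i < N; i++) { b[i] = i + 1.0; c[i] = 2.0; }

    /* Original loop: S1 has no loop-carried dependence; S2 carries a
     * dependence through s (S2 of iteration i needs S2 of iteration i-1). */
    for (int i = 0; i < N; i++) {
        a[i]  = b[i] * c[i];    /* S1 */
        s_seq = s_seq + a[i];   /* S2 */
    }

    /* Hand-pipelined version: because S1 of iteration i+1 is independent of
     * S2 of iteration i, the two statements in the steady-state body could
     * issue together on a machine with enough resources. */
    a[0] = b[0] * c[0];                 /* prologue: S1 of iteration 0   */
    for (int i = 0; i < N - 1; i++) {
        s_pipe = s_pipe + a[i];         /* S2 of iteration i             */
        a[i+1] = b[i+1] * c[i+1];       /* S1 of iteration i+1           */
    }
    s_pipe = s_pipe + a[N-1];           /* epilogue: S2 of last iteration */

    printf("sequential sum = %g, pipelined sum = %g\n", s_seq, s_pipe);
    return 0;
}
```

Any gain here comes purely from overlapping S1 and S2 across iterations; the paper's contribution, as stated in the abstract, is that such a schedule can be derived systematically from the dependence information and shown to be time optimal rather than constructed ad hoc.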
This publication has 5 references indexed in Scilit:
- A development environment for horizontal microcode, IEEE Transactions on Software Engineering, 1988
- A compilation technique for software pipelining of loops with conditional jumps, Published by Association for Computing Machinery (ACM), 1987
- Automatic loop interchange, Published by Association for Computing Machinery (ACM), 1984
- Conversion of control dependence to data dependence, Published by Association for Computing Machinery (ACM), 1983
- Dependence graphs and compiler optimizations, Published by Association for Computing Machinery (ACM), 1981