Exploiting the parallelism available in loops
- 1 February 1994
- journal article
- Published by Institute of Electrical and Electronics Engineers (IEEE) in Computer
- Vol. 27 (2), 13-26
- https://doi.org/10.1109/2.261915
Abstract
Because a loop's body often executes many times, loops provide a rich opportunity for exploiting parallelism. This article explains scheduling techniques and compares results on different architectures. Since parallel architectures differ in synchronization overhead, instruction scheduling constraints, memory latencies, and implementation details, determining the best approach for exploiting parallelism can be difficult. To indicate their performance potential, this article surveys several architectures and compilation techniques using a common notation and consistent terminology. First we develop the critical dependence ratio to determine a loop's maximum possible parallelism, given infinite hardware. Then we look at specific architectures and techniques.

Loops can provide a large portion of the parallelism available in an application program, since the iterations of a loop may be executed many times. To exploit this parallelism, however, one must look beyond a single basic block or a single iteration for independent operations. The choice of technique depends on the underlying architecture of the parallel machine and the characteristics of each individual loop.
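As a rough illustration (this sketch is not taken from the article itself), the two loops below show why the parallelism available across iterations depends on loop-carried dependences: the first loop's iterations are fully independent, while the second's are linked by a recurrence that serializes them unless the loop is restructured.

```c
#include <stddef.h>

/* Independent iterations: each c[i] depends only on a[i] and b[i], so with
 * enough hardware all n iterations could, in principle, run in parallel. */
void vector_add(double *c, const double *a, const double *b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        c[i] = a[i] + b[i];
}

/* Loop-carried dependence: s[i] needs s[i-1], so the additions form a chain
 * whose length bounds the achievable parallelism regardless of hardware. */
void prefix_sum(double *s, const double *a, size_t n)
{
    s[0] = a[0];
    for (size_t i = 1; i < n; i++)
        s[i] = s[i - 1] + a[i];
}
```

In the spirit of the article's infinite-hardware bound, a loop whose iterations are tied together by a dependence cycle can keep only roughly (work per iteration) / (length of the dependence cycle per iteration) iterations in flight at once; the exact formulation of the critical dependence ratio is given in the article itself.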