Exploiting the parallelism available in loops
- 1 February 1994
- journal article
- Published by Institute of Electrical and Electronics Engineers (IEEE) in Computer
- Vol. 27 (2), 13-26
- https://doi.org/10.1109/2.261915
Abstract
Because a loop's body often executes many times, loops provide a rich opportunity for exploiting parallelism. This article explains scheduling techniques and compares results on different architectures. Since parallel architectures differ in synchronization overhead, instruction scheduling constraints, memory latencies, and implementation details, determining the best approach for exploiting parallelism can be difficult. To indicate their performance potential, this article surveys several architectures and compilation techniques using a common notation and consistent terminology. First we develop the critical dependence ratio to determine a loop's maximum possible parallelism, given infinite hardware. Then we look at specific architectures and techniques.

Loops can provide a large portion of the parallelism available in an application program, since the iterations of a loop may be executed many times. To exploit this parallelism, however, one must look beyond a single basic block or a single iteration for independent operations. The choice of technique depends on the underlying architecture of the parallel machine and the characteristics of each individual loop.
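As a rough illustration (this sketch is not taken from the article itself), the two loops below show why the parallelism available across iterations depends on loop-carried dependences: the first loop's iterations are fully independent, while the second's are linked by a recurrence that serializes them unless the loop is restructured.

```c
#include <stddef.h>

/* Independent iterations: each c[i] depends only on a[i] and b[i], so with
 * enough hardware all n iterations could, in principle, run in parallel. */
void vector_add(double *c, const double *a, const double *b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        c[i] = a[i] + b[i];
}

/* Loop-carried dependence: s[i] needs s[i-1], so the additions form a chain
 * whose length bounds the achievable parallelism regardless of hardware. */
void prefix_sum(double *s, const double *a, size_t n)
{
    s[0] = a[0];
    for (size_t i = 1; i < n; i++)
        s[i] = s[i - 1] + a[i];
}
```

In the spirit of the article's infinite-hardware bound, a loop whose iterations are tied together by a dependence cycle can keep only roughly (work per iteration) / (length of the dependence cycle per iteration) iterations in flight at once; the exact formulation of the critical dependence ratio is given in the article itself.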