Parallelizing nonnumerical code with selective scheduling and software pipelining
Open Access
- 1 November 1997
- journal article
- research article
- Published by Association for Computing Machinery (ACM) in ACM Transactions on Programming Languages and Systems
- Vol. 19 (6) , 853-898
- https://doi.org/10.1145/267959.269966
Abstract
Instruction-level parallelism (ILP) in nonnumerical code is regarded as scarce and hard to exploit due to its irregularity. In this article, we introduce a new code-scheduling technique for irregular ILP called “selective scheduling” which can be used as a component for superscalar and VLIW compilers. Selective scheduling can compute a wide set of independent operations across all execution paths based on renaming and forward-substitution and can compute available operations across loop iterations if combined with software pipelining. This scheduling approach has better heuristics for determining the usefulness of moving one operation versus moving another and can successfully find useful code motions without resorting to branch profiling. The compile-time overhead of selective scheduling is low due to its incremental computation technique and its controlled code duplication. We parallelized the SPEC integer benchmarks and five AIX utilities without using branch probabilities. The experiments indicate that a fivefold speedup is achievable on realistic resources with a reasonable overhead in compilation time and code expansion and that a solid speedup increase is also obtainable on machines with fewer resources. These results improve previously known characteristics of irregular ILP.