Superword-Level Parallelism in the Presence of Control Flow
- 31 March 2005
- proceedings article
- Published by Institute of Electrical and Electronics Engineers (IEEE)
- p. 165-175
- https://doi.org/10.1109/cgo.2005.33
Abstract
In this paper, we describe how to extend the concept of superword-level parallelization (SLP), used for multimedia extension architectures, so that it can be applied in the presence of control flow constructs. Superword-level parallelization involves identifying scalar instructions in a large basic block that perform the same operation, and, if dependences do not prevent it, combining them into a superword operation on a multi-word object. A key insight is that we can use techniques related to optimizations for architectures supporting predicated execution, even for multimedia ISAs that do not provide hardware predication. We derive large basic blocks with predicated instructions to which SLP can be applied. We describe how to minimize overheads for superword predicates and re-introduce control flow for scalar operations. We discuss other extensions to SLP to address common features of real multimedia codes. We present automatically-generated performance results on 8 multimedia codes to demonstrate the power of this approach. We observe speedups ranging from 1.97X to 15.07X as compared to both sequential execution and SLP alone.
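The general idea of removing a branch so that isomorphic scalar statements can be packed into one superword operation can be illustrated with a small sketch in plain C. This is not the paper's algorithm or its generated code. The function names (`clamp_scalar`, `clamp_if_converted`, `select_lane`), the clamp example itself, and the four-lane width `LANES` are all hypothetical choices made for illustration. The inner lane loops stand in for single superword instructions, and the mask-based select emulates predication on an ISA without hardware predicates.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LANES 4  /* assumed superword width: four 32-bit lanes (hypothetical) */

/* Scalar loop with control flow: the branch splits the body into several
 * basic blocks, so a basic-block-level packer has little to combine. */
void clamp_scalar(const int32_t *in, int32_t *out, size_t n, int32_t limit) {
    for (size_t i = 0; i < n; i++) {
        if (in[i] > limit)
            out[i] = limit;
        else
            out[i] = in[i];
    }
}

/* Bitwise select: returns a where mask is all ones, b where mask is zero. */
static inline int32_t select_lane(int32_t mask, int32_t a, int32_t b) {
    return (a & mask) | (b & ~mask);
}

/* If-converted form: the branch becomes a per-lane predicate mask plus a
 * select, leaving one straight-line body. Each lane loop below models one
 * superword instruction (compare, then select) over LANES elements. */
void clamp_if_converted(const int32_t *in, int32_t *out, size_t n, int32_t limit) {
    assert(n % LANES == 0);  /* sketch only: no scalar cleanup loop */
    for (size_t i = 0; i < n; i += LANES) {
        int32_t pred[LANES];
        for (int l = 0; l < LANES; l++)  /* models a superword compare */
            pred[l] = -(int32_t)(in[i + l] > limit);  /* -1 (all ones) or 0 */
        for (int l = 0; l < LANES; l++)  /* models a superword select */
            out[i + l] = select_lane(pred[l], limit, in[i + l]);
    }
}
```

Both functions produce the same output for any input whose length is a multiple of `LANES`. The second does so without a data-dependent branch, which is the property that lets the four lanes be treated as one operation on a multi-word object.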