A compiler optimization algorithm for shared-memory multiprocessors
- 1 January 1998
- journal article
- Published by Institute of Electrical and Electronics Engineers (IEEE) in IEEE Transactions on Parallel and Distributed Systems
- Vol. 9 (8) , 769-787
- https://doi.org/10.1109/71.706049
Abstract
This paper presents a new compiler optimization algorithm that parallelizes applications for symmetric, shared-memory multiprocessors. The algorithm considers data locality, parallelism, and the granularity of parallelism. It uses dependence analysis and a simple cache model to drive its optimizations. It also optimizes across procedures by using interprocedural analysis and transformations. We validate the algorithm by hand-applying it to sequential versions of parallel, Fortran programs operating over dense matrices. The programs initially were hand-coded to target a variety of parallel machines using loop parallelism. We ignore the user's parallel loop directives, and use known and implemented dependence and interprocedural analysis to find parallelism. We then apply our new optimization algorithm to the resulting program. We compare the original parallel program to the hand-optimized program, and show that our algorithm improves three programs, matches four programs, and degrades one program in our test suite on a shared-memory, bus-based parallel machine with local caches. This experiment suggests existing dependence and interprocedural array analysis can automatically detect user parallelism, and demonstrates that user parallelized codes often benefit from our compiler optimizations, providing evidence that we need both parallel algorithms and compiler optimizations to effectively utilize parallel machines.Keywords
This publication has 32 references indexed in Scilit:
- On the automatic parallelization of the Perfect Benchmarks(R)IEEE Transactions on Parallel and Distributed Systems, 1998
- A new algorithm for global optimization for parallelism and localityPublished by Springer Nature ,1995
- Analysis and transformation in an interactive parallel programming toolConcurrency: Practice and Experience, 1993
- Restructuring Fortran programs for CedarConcurrency: Practice and Experience, 1993
- The ParaScope parallel programming environmentProceedings of the IEEE, 1993
- Algorithm 711: BTN: software for parallel unconstrained optimizationACM Transactions on Mathematical Software, 1992
- A loop transformation theory and an algorithm to maximize parallelismIEEE Transactions on Parallel and Distributed Systems, 1991
- An implementation of interprocedural bounded regular section analysisIEEE Transactions on Parallel and Distributed Systems, 1991
- Automatic translation of FORTRAN programs to vector formACM Transactions on Programming Languages and Systems, 1987
- Interprocedural dependence analysis and parallelizationACM SIGPLAN Notices, 1986