Combined selection of tile sizes and unroll factors using iterative compilation
- 7 November 2002
- conference paper
- Published by Institute of Electrical and Electronics Engineers (IEEE)
- Vol. 1615 (1089795X), 237-246
- https://doi.org/10.1109/pact.2000.888348
Abstract
Loop tiling and unrolling are two important program transformations to exploit locality and expose instruction-level parallelism, respectively. In this paper, we address the problem of how to select tile sizes and unroll factors simultaneously. We approach this problem in an architecturally adaptive manner by means of iterative compilation, where we generate many versions of a program and decide upon the best by actually executing them and measuring their execution time. We evaluate several iterative strategies. We compare the levels of optimization obtained by iterative compilation to several well-known static techniques and show that we outperform each of them on a range of benchmarks across a variety of architectures. Finally, we show how to quantitatively trade off the number of profiles needed against the level of optimization that can be reached.
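The loop described in the abstract (generate versions, run them, keep the fastest) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the matrix-vector kernel, the use of Python source generation in place of a real compiler, the helper names `make_variant`, `time_variant`, and `iterative_search`, and the random sampling of candidates under a fixed profile `budget`. The abstract does not say which search strategies the authors evaluate.

```python
import itertools
import random
import textwrap
import time


def make_variant(tile, unroll):
    """Generate and compile one version of a matrix-vector kernel.

    Hypothetical helper: the column loop is tiled by `tile` and the
    innermost loop body is written out `unroll` times.
    """
    terms = " + ".join(f"a[base + j + {u}] * x[j + {u}]" for u in range(unroll))
    src = textwrap.dedent(f"""
        def kernel(a, x, n):
            y = [0.0] * n
            for jj in range(0, n, {tile}):        # tile loop over columns
                j_end = min(jj + {tile}, n)
                for i in range(n):
                    base = i * n
                    acc = 0.0
                    j = jj
                    while j + {unroll} <= j_end:  # unrolled main loop
                        acc += {terms}
                        j += {unroll}
                    while j < j_end:              # remainder loop
                        acc += a[base + j] * x[j]
                        j += 1
                    y[i] += acc
            return y
    """)
    namespace = {}
    exec(src, namespace)
    return namespace["kernel"]


def time_variant(kernel, a, x, n, repeats=3):
    """Run the kernel `repeats` times and return the best wall-clock time."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        kernel(a, x, n)
        best = min(best, time.perf_counter() - start)
    return best


def iterative_search(n, tiles, unrolls, budget=None, seed=0):
    """Time up to `budget` (tile, unroll) pairs and return the fastest one."""
    rng = random.Random(seed)
    a = [rng.random() for _ in range(n * n)]
    x = [rng.random() for _ in range(n)]
    candidates = list(itertools.product(tiles, unrolls))
    rng.shuffle(candidates)  # assumed strategy: uniform random sampling
    if budget is not None:
        candidates = candidates[:budget]  # fewer profiles, possibly worse result
    results = {}
    for tile, unroll in candidates:
        kernel = make_variant(tile, unroll)
        results[(tile, unroll)] = time_variant(kernel, a, x, n)
    best = min(results, key=results.get)
    return best, results


if __name__ == "__main__":
    best, results = iterative_search(64, [8, 16, 32, 64], [1, 2, 4], budget=6)
    print("fastest (tile, unroll):", best, "time:", results[best])
```

Raising or lowering `budget` is one simple way to explore the trade-off the abstract mentions between the number of profiles and the quality of the best version found. Timings from an interpreted language say little about cache or instruction-level behaviour, so the sketch shows only the shape of the search, not realistic speedups.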
This publication has 10 references indexed in Scilit:
- Combining optimization for cache and instruction-level parallelism. Published by Institute of Electrical and Electronics Engineers (IEEE), 2002
- Calpa: a tool for automating selective dynamic compilation. Published by Institute of Electrical and Electronics Engineers (IEEE), 2002
- Combined selection of tile sizes and unroll factors using iterative compilation. Published by Institute of Electrical and Electronics Engineers (IEEE), 2002
- A tile selection algorithm for data locality and cache interference. Published by Association for Computing Machinery (ACM), 1999
- A Comparison of Compiler Tiling Algorithms. Published by Springer Nature, 1999
- Automatically Tuned Linear Algebra Software. Published by Institute of Electrical and Electronics Engineers (IEEE), 1998
- Combining Loop Transformations Considering Caches and Scheduling. International Journal of Parallel Programming, 1998
- Optimizing matrix multiply using PHiPAC. Published by Association for Computing Machinery (ACM), 1997
- Tile size selection using cache organization and data layout. Published by Association for Computing Machinery (ACM), 1995
- The cache performance and optimizations of blocked algorithms. Published by Association for Computing Machinery (ACM), 1991