Combined selection of tile sizes and unroll factors using iterative compilation

7 November 2002

conference paper
Published by Institute of Electrical and Electronics Engineers (IEEE)

Vol. 1615 (1089795X), 237-246
https://doi.org/10.1109/pact.2000.888348

Abstract

Loop tiling and unrolling are two important program transformations: tiling exploits locality and unrolling exposes instruction-level parallelism. In this paper, we address the problem of how to select tile sizes and unroll factors simultaneously. We approach this problem in an architecturally adaptive manner by means of iterative compilation, where we generate many versions of a program and decide upon the best by actually executing them and measuring their execution time. We evaluate several iterative strategies. We compare the levels of optimization obtained by iterative compilation to several well-known static techniques and show that we outperform each of them on a range of benchmarks across a variety of architectures. Finally, we show how to quantitatively trade off the number of profiles needed against the level of optimization that can be reached.
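
The generate, execute, and measure loop described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the names `iterative_search`, `time_once`, and `toy_cost` are hypothetical, random sampling stands in for whichever search strategies the paper evaluates, and `toy_cost` is a made-up cost surface used only so the example runs without a compiler. The `budget` parameter corresponds to the number of profiles (timed executions) one is willing to spend.

```python
import itertools
import random
import time


def time_once(run, tile, unroll):
    """Execute one program version and return its wall-clock time in seconds.

    `run` is assumed to be a callable that executes the version built with
    the given tile size and unroll factor.
    """
    start = time.perf_counter()
    run(tile, unroll)
    return time.perf_counter() - start


def iterative_search(tiles, unrolls, measure, budget, seed=0):
    """Measure up to `budget` (tile, unroll) pairs and keep the cheapest.

    Random sampling without replacement is an assumed stand-in strategy,
    not the strategy used in the paper.
    """
    space = list(itertools.product(tiles, unrolls))
    rng = random.Random(seed)
    trials = rng.sample(space, min(budget, len(space)))
    best, best_cost = None, float("inf")
    for tile, unroll in trials:
        cost = measure(tile, unroll)  # one "profile" of one version
        if cost < best_cost:
            best, best_cost = (tile, unroll), cost
    return best, best_cost, len(trials)


def toy_cost(tile, unroll):
    """Hypothetical cost surface with its minimum at tile=32, unroll=4."""
    return abs(tile - 32) / 32 + abs(unroll - 4) / 4 + 1.0


if __name__ == "__main__":
    tiles = [8, 16, 32, 64, 128]
    unrolls = [1, 2, 4, 8]
    # A budget equal to the size of the space makes the search exhaustive.
    print(iterative_search(tiles, unrolls, toy_cost, budget=20))
```

In a real setting, `measure` would compile the program with the chosen tile size and unroll factor and time it, for example by wrapping the build-and-run step in `time_once`. Lowering `budget` reduces the number of profiles at the risk of missing the best configuration, which is the trade-off the abstract refers to.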

This publication has 10 references indexed in Scilit:

Combining optimization for cache and instruction-level parallelism
Published by Institute of Electrical and Electronics Engineers (IEEE), 2002
Calpa: a tool for automating selective dynamic compilation
Published by Institute of Electrical and Electronics Engineers (IEEE), 2002
Combined selection of tile sizes and unroll factors using iterative compilation
Published by Institute of Electrical and Electronics Engineers (IEEE), 2002
A tile selection algorithm for data locality and cache interference
Published by Association for Computing Machinery (ACM), 1999
A Comparison of Compiler Tiling Algorithms
Published by Springer Nature, 1999
Automatically Tuned Linear Algebra Software
Published by Institute of Electrical and Electronics Engineers (IEEE), 1998
Combining Loop Transformations Considering Caches and Scheduling
International Journal of Parallel Programming, 1998
Optimizing matrix multiply using PHiPAC
Published by Association for Computing Machinery (ACM), 1997
Tile size selection using cache organization and data layout
Published by Association for Computing Machinery (ACM), 1995
The cache performance and optimizations of blocked algorithms
Published by Association for Computing Machinery (ACM), 1991