Automatic tiling of iterative stencil loops
- 1 November 2004
- journal article
- Published by Association for Computing Machinery (ACM) in ACM Transactions on Programming Languages and Systems
- Vol. 26 (6) , 975-1028
- https://doi.org/10.1145/1034774.1034777
Abstract
Iterative stencil loops are used in scientific programs to implement relaxation methods for numerical simulation and signal processing. Such loops iteratively modify the same array elements over different time steps, which presents opportunities for the compiler to improve the temporal data locality through loop tiling. This article presents a compiler framework for automatic tiling of iterative stencil loops, with the objective of improving the cache performance. The article first presents a technique which allows loop tiling to satisfy data dependences in spite of the difficulty created by imperfectly nested inner loops. It does so by skewing the inner loops over the time steps and by applying a uniform skew factor to all loops at the same nesting level. Based on a memory cost analysis, the article shows that the skew factor must be minimized at every loop level in order to minimize cache misses. A graph-theoretical algorithm, which takes polynomial time, is presented to determine the minimum skew factor. Furthermore, the memory-cost analysis derives the tile size which minimizes capacity misses. Given the tile size, an efficient and general array-padding scheme is applied to remove conflict misses. Experiments were conducted on 16 test programs and preliminary results showed an average speedup of 1.58 and a maximum speedup of 5.06 across those test programs.Keywords
This publication has 12 references indexed in Scilit:
- Static tiling for heterogeneous computing platformsParallel Computing, 1999
- Augmenting loop tiling with data alignment for improved cache performanceIEEE Transactions on Computers, 1999
- Nonlinear and symbolic data dependence testingIEEE Transactions on Parallel and Distributed Systems, 1998
- Interprocedural analysis for loop scheduling and data allocationParallel Computing, 1998
- Quantifying the Multi-Level Nature of Tiling InteractionsInternational Journal of Parallel Programming, 1998
- Fusion of loops for parallelism and localityIEEE Transactions on Parallel and Distributed Systems, 1997
- Exploiting monotone convergence functions in parallel programsPublished by Springer Nature ,1997
- Software pipeliningACM Computing Surveys, 1995
- A practical algorithm for exact array dependence analysisCommunications of the ACM, 1992
- Automatic translation of FORTRAN programs to vector formACM Transactions on Programming Languages and Systems, 1987