Trade-offs in loop transformations
- 1 March 2009
- journal article
- Published by Association for Computing Machinery (ACM) in ACM Transactions on Design Automation of Electronic Systems
- Vol. 14 (2) , 1-30
- https://doi.org/10.1145/1497561.1497565
Abstract
Nowadays, multimedia systems deal with huge amounts of memory accesses and large memory footprints. To alleviate the impact of these accesses and reduce the memory footprint, high-level memory exploration and optimization techniques have been proposed. These techniques try to more efficiently utilize the memory hierarchy. An important step in these optimization techniques are loop transformations (LT). They have a crucial effect on later data memory footprint optimization steps and code generation. However, the state-of-the-art work has focused only on individual objectives. The main one in literature involves improving the locality of data accesses, and thus reducing the data memory footprint. It does not consider the trade-offs in the LT step in relation to successive optimization steps. Therefore, it is not globally efficient in mapping the application on the target platform. In this article we will discuss several trade-offs during the loop transformations. To our knowledge, we are the first ones considering these global trade-offs. Previous work always gave mostly one solution, having the best locality and thus the optimized memory footprint, even though some research in two-dimensional trade-offs in this area exists as well. We start from this state-of-the-art solution with minimal footprint. We show that by sacrificing the footprint, we can obtain gains in data reuse (crucial for energy reduction) and reduce the control-flow complexity. We demonstrate our approach on a real-life application, namely the QSDPCM video coder. At the end, we show that considering trade-offs for this application leads to 16% energy reduction in a two-layer memory subsystem and 10% cycle reduction on the ARM platform.Keywords
This publication has 26 references indexed in Scilit:
- Array recovery and high-level transformations for DSP applicationsACM Transactions on Embedded Computing Systems, 2003
- A layout-conscious iteration space transformation techniqueIEEE Transactions on Computers, 2001
- Energy-aware runtime scheduling for embedded-multiprocessor SOCsIEEE Design & Test of Computers, 2001
- Affine-by-Statement Scheduling of Uniform and Affine Loop Nests over Parametric DomainsJournal of Parallel and Distributed Computing, 1995
- Portable video-on-demand in wireless communicationProceedings of the IEEE, 1995
- Automatic program parallelizationProceedings of the IEEE, 1993
- A practical algorithm for exact array dependence analysisCommunications of the ACM, 1992
- Architecture-driven synthesis techniques for VLSI implementation of DSP algorithmsProceedings of the IEEE, 1990
- Compiler optimizations for enhancing parallelism and their impact on architecture designIEEE Transactions on Computers, 1988
- Automatic translation of FORTRAN programs to vector formACM Transactions on Programming Languages and Systems, 1987