Polyhedral-based data reuse optimization for configurable computing

11 February 2013

proceedings article
Published by Association for Computing Machinery (ACM)

p. 29-38
https://doi.org/10.1145/2435264.2435273

Abstract

Many applications, such as medical imaging, generate intensive data traffic between the FPGA and off-chip memory. Execution time can be improved significantly by using on-chip (scratchpad) memories effectively, combined with careful software-based data reuse and communication scheduling techniques. We present a fully automated C-to-FPGA framework to address this problem. Our framework effectively implements data reuse through aggressive loop transformation-based program restructuring. In addition, our proposed framework automatically implements critical optimizations for performance such as task-level parallelization, loop pipelining, and data prefetching. We leverage the power and expressiveness of the polyhedral compilation model to develop a multi-objective optimization system for off-chip communications management. Our technique can satisfy hardware resource constraints (scratchpad size) while still aggressively exploiting data reuse. Our approach can also be used to reduce the on-chip buffer size subject to a bandwidth constraint. We also implement a fast design space exploration technique for effective optimization of program performance using the Xilinx high-level synthesis tool.
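To make the idea of scratchpad-based data reuse concrete, the following is a minimal hand-written sketch in C. It is not the paper's framework or its generated code. It compares a 3-point stencil that fetches every operand from off-chip memory with a tiled version that copies each tile (plus a one-element halo) into a small local buffer once and reuses it. All names (`load_offchip`, `stencil_naive`, `stencil_tiled`, `offchip_reads`) and the sizes `N` and `TILE` are assumptions chosen for illustration, and off-chip traffic is only simulated with a counter.

```c
#include <assert.h>

#define N    1024  /* problem size (hypothetical) */
#define TILE 64    /* tile size, assumed to fit the on-chip buffer */
#define HALO 1     /* a 3-point stencil needs one neighbor per side */

/* Counts simulated off-chip reads; reset it before each measurement. */
static long offchip_reads = 0;

/* Stand-in for an off-chip memory access (assumption: one read per call). */
static float load_offchip(const float *mem, int i)
{
    offchip_reads++;
    return mem[i];
}

/* Baseline: each of the three stencil operands is read from off-chip memory. */
void stencil_naive(const float *in, float *out)
{
    for (int i = 1; i < N - 1; i++)
        out[i] = (load_offchip(in, i - 1) + load_offchip(in, i)
                  + load_offchip(in, i + 1)) / 3.0f;
}

/* Tiled: copy a tile plus its halo into a local buffer once, then compute
 * entirely from that buffer, so each input element is read about once. */
void stencil_tiled(const float *in, float *out)
{
    float buf[TILE + 2 * HALO];  /* models the scratchpad */

    for (int t = 1; t < N - 1; t += TILE) {
        int end = (t + TILE < N - 1) ? t + TILE : N - 1;
        int len = end - t;
        assert(len > 0 && len <= TILE);

        /* Communication phase: one off-chip read per buffered element. */
        for (int j = 0; j < len + 2 * HALO; j++)
            buf[j] = load_offchip(in, t - HALO + j);

        /* Computation phase: all operands come from the local buffer. */
        for (int j = 0; j < len; j++)
            out[t + j] = (buf[j] + buf[j + 1] + buf[j + 2]) / 3.0f;
    }
}
```

With these sizes, the baseline performs 3 × (N − 2) simulated reads, while the tiled version performs (N − 2) reads plus two halo elements per tile. A larger `TILE` lowers the halo overhead but requires a larger buffer, which is the kind of buffer-size versus bandwidth trade-off the abstract describes.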
