Polyhedral-based data reuse optimization for configurable computing
- 11 February 2013
- proceedings article
- Published by Association for Computing Machinery (ACM)
Abstract
Many applications, such as medical imaging, generate intensive data traffic between the FPGA and off-chip memory. Significant improvements in execution time can be achieved with effective utilization of on-chip (scratchpad) memories, combined with careful software-based data reuse and communication scheduling techniques. We present a fully automated C-to-FPGA framework to address this problem. Our framework effectively implements data reuse through aggressive loop transformation-based program restructuring. In addition, our proposed framework automatically implements critical optimizations for performance such as task-level parallelization, loop pipelining, and data prefetching. We leverage the power and expressiveness of the polyhedral compilation model to develop a multi-objective optimization system for off-chip communications management. Our technique can satisfy hardware resource constraints (scratchpad size) while still aggressively exploiting data reuse. Our approach can also be used to reduce the on-chip buffer size subject to a bandwidth constraint. We also implement a fast design space exploration technique for effective optimization of program performance using the Xilinx high-level synthesis tool.
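To make the idea of scratchpad-based data reuse concrete, the following is a minimal hand-written sketch in C. It is not the framework described in the abstract and not its generated code. It only illustrates the general technique of tiling a loop and staging each tile in a small local buffer so that overlapping accesses are served on chip. The names `TILE`, `HALO`, `load_offchip`, `offchip_reads`, `stencil_naive`, and `stencil_tiled` are hypothetical, and the read counter merely simulates off-chip traffic.

```c
#include <assert.h>
#include <stddef.h>

#define TILE 64   /* hypothetical scratchpad tile size, in elements */
#define HALO 1    /* stencil radius: each output reads i-1, i, i+1 */

/* Simulated off-chip traffic counter (illustrative only). */
size_t offchip_reads = 0;

/* Hypothetical helper: every call models one off-chip read. */
float load_offchip(const float *mem, size_t i)
{
    offchip_reads++;
    return mem[i];
}

/* Baseline: each of the three stencil taps is fetched from off-chip memory. */
void stencil_naive(const float *in, float *out, size_t n)
{
    for (size_t i = HALO; i + HALO < n; i++)
        out[i] = (load_offchip(in, i - 1) + load_offchip(in, i)
                  + load_offchip(in, i + 1)) / 3.0f;
}

/* Tiled version: copy one tile plus its halo into a local buffer once,
 * then compute every output of the tile from that buffer. */
void stencil_tiled(const float *in, float *out, size_t n)
{
    float buf[TILE + 2 * HALO];   /* stands in for an on-chip scratchpad */

    assert(TILE >= 1);
    for (size_t t = HALO; t + HALO < n; t += TILE) {
        size_t end = t + TILE;
        if (end > n - HALO)
            end = n - HALO;
        size_t len = end - t;

        /* Communication phase: one off-chip read per buffered element. */
        for (size_t k = 0; k < len + 2 * HALO; k++)
            buf[k] = load_offchip(in, t - HALO + k);

        /* Computation phase: all reuse is served from the local buffer. */
        for (size_t k = 0; k < len; k++)
            out[t + k] = (buf[k] + buf[k + 1] + buf[k + 2]) / 3.0f;
    }
}
```

In this sketch a larger `TILE` lowers the off-chip read count, because the halo is re-fetched less often, but it needs a larger buffer. That is a small-scale version of the trade-off between scratchpad size and bandwidth that the abstract describes.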