Parallelizing applications into silicon
- 20 January 2003
- conference paper
- Published by Institute of Electrical and Electronics Engineers (IEEE)
Abstract
The next decade of computing will be dominated by embedded systems, information appliances and application-specific computers. To build these systems, designers will need high-level compilation and CAD tools that generate architectures that effectively meet the needs of each application. In this paper we present a novel compilation system that allows sequential programs, written in C or FORTRAN, to be compiled directly into custom silicon or reconfigurable architectures. This capability is also interesting because trends in computer architecture are moving towards more reconfigurable, hardware-like substrates, such as FPGA-based systems. Our system works by combining two resource-efficient computing disciplines: Small Memories and Virtual Wires. For a given application, the compiler first analyzes the memory access patterns of pointers and arrays in the program and constructs a partitioned memory system made up of many small memories. The computation is implemented by active computing elements that are spatially distributed within the memory array. A space-time scheduler assigns instructions to the computing elements in a way that maximizes locality and minimizes physical communication distance. It also generates an efficient static schedule for the interconnect. Finally, specialized hardware for the resulting schedule of memory accesses, wires, and computation is generated as a multi-process state machine in synthesizable Verilog. With this system, implemented as a set of SUIF compiler passes, we have successfully compiled programs into hardware and achieved specialization performance gains of up to an order of magnitude over a single general-purpose processor. We also achieve additional parallelization speedups similar to those obtainable using a tightly interconnected multiprocessor.