Parallelizing applications into silicon

20 January 2003

conference paper
Published by Institute of Electrical and Electronics Engineers (IEEE)

p. 70-80
https://doi.org/10.1109/fpga.1999.803669

Abstract

The next decade of computing will be dominated by embedded systems, information appliances and application-specific computers. To build these systems, designers will need high-level compilation and CAD tools that generate architectures that effectively meet the needs of each application. In this paper we present a novel compilation system that allows sequential programs, written in C or FORTRAN, to be compiled directly into custom silicon or reconfigurable architectures. This capability is also interesting because trends in computer architecture are moving towards more reconfigurable, hardware-like substrates, such as FPGA-based systems. Our system works by combining two resource-efficient computing disciplines: Small Memories and Virtual Wires. For a given application, the compiler first analyzes the memory access patterns of pointers and arrays in the program and constructs a partitioned memory system made up of many small memories. The computation is implemented by active computing elements that are spatially distributed within the memory array. A space-time scheduler assigns instructions to the computing elements in a way that maximizes locality and minimizes physical communication distance. It also generates an efficient static schedule for the interconnect. Finally, specialized hardware for the resulting schedule of memory accesses, wires, and computation is generated as a multi-process state machine in synthesizable Verilog. With this system, implemented as a set of SUIF compiler passes, we have successfully compiled programs into hardware and achieved specialization speedups of up to an order of magnitude over a single general-purpose processor. We also achieve additional parallelization speedups similar to those obtainable using a tightly-interconnected multiprocessor.
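
The space-time scheduling step described in the abstract can be illustrated with a toy example. The sketch below is not the paper's scheduler; it is a minimal greedy placement written for illustration. The names `space_time_schedule`, `deps`, and `grid` are hypothetical, and the cost model (every instruction takes one cycle, every hop between neighbouring computing elements takes one cycle) is an assumption. Each instruction is placed on the tile where it can start earliest, with ties broken in favour of the shortest total communication distance.

```python
from itertools import product


def manhattan(a, b):
    """Hop count between two tiles on a 2-D grid (assumed mesh interconnect)."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def space_time_schedule(deps, grid=(2, 2)):
    """Greedily assign each instruction a tile (space) and a start cycle (time).

    deps maps an instruction name to the list of instructions it depends on.
    Keys must already be in topological order (an assumption of this sketch).
    Returns {instruction: (tile, start_cycle)}.
    """
    tiles = list(product(range(grid[0]), range(grid[1])))
    free_at = {t: 0 for t in tiles}  # first cycle at which each tile is idle
    placed = {}
    for instr, preds in deps.items():
        best = None
        for tile in tiles:
            # An operand is available one cycle after its producer starts,
            # plus one cycle for every hop it has to travel.
            ready = max(
                (placed[p][1] + 1 + manhattan(placed[p][0], tile) for p in preds),
                default=0,
            )
            start = max(ready, free_at[tile])
            hops = sum(manhattan(placed[p][0], tile) for p in preds)
            key = (start, hops)  # earliest start first, then fewest hops
            if best is None or key < best[0]:
                best = (key, tile, start)
        _, tile, start = best
        placed[instr] = (tile, start)
        free_at[tile] = start + 1
    return placed


# Hypothetical dependence graph: c = a op b, d = f(c).
example = {"a": [], "b": [], "c": ["a", "b"], "d": ["c"]}
schedule = space_time_schedule(example)
for name, (tile, cycle) in schedule.items():
    print(f"{name}: tile {tile}, cycle {cycle}")
```

In this toy model the independent instructions `a` and `b` spread across neighbouring tiles to run in parallel, while the dependent chain `c`, `d` stays next to its operands so that little time is spent on the interconnect. A real compiler would also have to schedule the wires themselves and the accesses to the partitioned memories, which this sketch leaves out.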
