Abstract
Efficient implementation of systolic arrays and other parallel image processing architectures has been hindered by a lack of building blocks, or cells. This paper presents two high-speed floating-point DSP coprocessor cells for rapid computation of nonlinear functions. In pipeline mode, a new result is produced every two clock cycles for 32-bit floating-point arguments and every cycle for 24-bit fixed-point arguments. This represents an estimated three- to fourfold improvement over other hardware approaches and a 10- to 20-fold gain over software approaches. The underlying principle that makes the combined goals of high speed and multi-functionality possible is second-order interpolation of very small ROM tables, together with an innovation called "significance-based computation". Two chips, both fabricated in 2.0 micron CMOS technology, are presented: a 32-bit floating-point two-cycle chip for computing the square root and a 24-bit fixed-point one-cycle chip.
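
As a rough software illustration of second-order interpolation from a small table, the following Python sketch approximates the square root. It is not the chip's design: the table size (`TABLE_BITS`), the use of Taylor coefficients around each segment midpoint, and the names `sqrt_interp` and `sqrt_full` are all assumptions made for illustration, and "significance-based computation" is not modeled.

```python
import math

# Hypothetical parameters: 64 segments over the reduced range [1, 4).
TABLE_BITS = 6
SEGMENTS = 1 << TABLE_BITS
LO, HI = 1.0, 4.0
STEP = (HI - LO) / SEGMENTS


def _build_table():
    """Precompute (midpoint, c0, c1, c2) per segment, standing in for a ROM."""
    table = []
    for i in range(SEGMENTS):
        xm = LO + (i + 0.5) * STEP      # segment midpoint
        f = math.sqrt(xm)
        c0 = f                          # f(xm)
        c1 = 0.5 / f                    # f'(xm)
        c2 = -0.125 / (f * xm)          # f''(xm) / 2
        table.append((xm, c0, c1, c2))
    return table


TABLE = _build_table()


def sqrt_interp(x):
    """Second-order approximation of sqrt(x) for 1 <= x < 4."""
    i = min(int((x - LO) / STEP), SEGMENTS - 1)  # table index from leading bits
    xm, c0, c1, c2 = TABLE[i]
    d = x - xm
    return c0 + d * (c1 + d * c2)       # Horner form: two multiply-adds


def sqrt_full(x):
    """Extend to any positive float by reducing to an even exponent."""
    m, e = math.frexp(x)                # x = m * 2**e, with 0.5 <= m < 1
    m, e = m * 2.0, e - 1               # now 1 <= m < 2
    if e % 2:                           # make the exponent even
        m, e = m * 2.0, e - 1           # now 2 <= m < 4
    return math.ldexp(sqrt_interp(m), e // 2)
```

With these assumed parameters, the table holds only 64 coefficient triples, and each result costs one lookup plus two multiply-add steps; a larger table or wider coefficients would be needed to approach full 32-bit floating-point accuracy.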
