Accelerating pipelined integer and floating-point accumulations in configurable hardware with delayed addition techniques
- 1 March 2000
- journal article
- research article
- Published by Institute of Electrical and Electronics Engineers (IEEE) in IEEE Transactions on Computers
- Vol. 49 (3), 208-218
- https://doi.org/10.1109/12.841125
Abstract
The speed of arithmetic calculations in configurable hardware is limited by carry propagation, even with the dedicated carry hardware found in recent FPGAs. This paper proposes and evaluates an approach called delayed addition that reduces the carry-propagation bottleneck and improves the performance of arithmetic calculations. Our approach employs the idea used in Wallace trees: store results in an intermediate form and delay the addition until the end of a repeated calculation such as accumulation or dot-product. This effectively removes carry-propagation overhead from the calculation's critical path. We present both integer and floating-point designs that use our technique. Our pipelined integer multiply-accumulate (MAC) design is based on a fairly traditional multiplier design, augmented with delayed addition. This design achieves a 72 MHz clock rate on an XC4036xla-9 FPGA and a 170 MHz clock rate on an XCV300epq240-8 FPGA. Next, we present a 32-bit floating-point accumulator based on delayed addition. Here, delayed addition requires a novel alignment technique that decouples the incoming operands from the accumulated result. A conservative version of this design achieves a 40 MHz clock rate on an XC4036xla-9 FPGA and a 97 MHz clock rate on an XCV100epq240-8 FPGA. We also present a 32-bit floating-point accumulator design with compiler-managed overflow avoidance that achieves an 80 MHz clock rate on an XC4036xla-9 FPGA and a 150 MHz clock rate on an XCV100epq240-8 FPGA.
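The core idea of delayed addition for the integer case can be illustrated in software. The following is a minimal sketch, not the paper's hardware design: it assumes a 32-bit accumulator held as a separate sum word and carry word (the redundant form used in Wallace trees), and the function names `carry_save_add` and `delayed_accumulate` are hypothetical. Each step uses only bitwise operations, so no carry ripples across bit positions; the single carry-propagating addition happens once, after the loop.

```python
def carry_save_add(s, c, x, width=32):
    """One 3:2 compression step: fold x into the (sum, carry) pair.

    Invariant: (new_s + new_c) == (s + c + x) modulo 2**width.
    Every output bit depends only on the same or neighbouring input bit,
    so there is no carry chain in this step.
    """
    mask = (1 << width) - 1
    new_s = (s ^ c ^ x) & mask                            # per-bit sum
    new_c = (((s & c) | (s & x) | (c & x)) << 1) & mask   # per-bit majority, shifted
    return new_s, new_c


def delayed_accumulate(values, width=32):
    """Accumulate values modulo 2**width, delaying the real addition."""
    mask = (1 << width) - 1
    s, c = 0, 0
    for x in values:
        s, c = carry_save_add(s, c, x & mask, width)
    # The only carry-propagating addition, performed once at the end.
    return (s + c) & mask


if __name__ == "__main__":
    data = [7, 1000, 123456, 0xFFFFFFF0]
    print(delayed_accumulate(data) == sum(data) & 0xFFFFFFFF)
```

In hardware, the per-iteration step maps to a row of independent full adders, which is why the clock period no longer depends on the width of the carry chain. The floating-point designs described in the abstract additionally need operand alignment, which this sketch does not attempt to model.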