An 80-Tile Sub-100-W TeraFLOPS Processor in 65-nm CMOS
- 28 January 2008
- journal article
- Published by Institute of Electrical and Electronics Engineers (IEEE) in IEEE Journal of Solid-State Circuits
- Vol. 43 (1), 29-41
- https://doi.org/10.1109/jssc.2007.910957
Abstract
This paper describes an integrated network-on-chip architecture containing 80 tiles arranged as an 8×10 2-D array of floating-point cores and packet-switched routers, both designed to operate at 4 GHz. Each tile has two pipelined single-precision floating-point multiply accumulators (FPMAC) which feature a single-cycle accumulation loop for high throughput. The on-chip 2-D mesh network provides a bisection bandwidth of 2 Terabits/s. The 15-FO4 design employs mesochronous clocking, fine-grained clock gating, dynamic sleep transistors, and body-bias techniques. In a 65-nm eight-metal CMOS process, the 275 mm² custom design contains 100 M transistors. The fully functional first silicon achieves over 1.0 TFLOPS of performance on a range of benchmarks while dissipating 97 W at 4.27 GHz and 1.07 V supply.
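The figures in the abstract can be cross-checked with a short back-of-the-envelope calculation. The sketch below is illustrative only and is not taken from the paper: it assumes that each FPMAC retires one multiply and one add per cycle (counted as 2 FLOPs), and the function and constant names are hypothetical. Under that assumption, 80 tiles with two FPMACs each give a theoretical peak of about 1.28 TFLOPS at the 4 GHz design point, and the reported "over 1.0 TFLOPS" at 97 W implies at least roughly 10 GFLOPS/W.

```python
# Rough cross-check of the abstract's headline numbers.
# ASSUMPTION (not stated in the abstract): one FPMAC operation per cycle
# counts as 2 FLOPs (one multiply plus one add).

def peak_gflops(tiles, fpmacs_per_tile, flops_per_fpmac_cycle, clock_ghz):
    """Theoretical peak throughput in GFLOPS for a tiled FPMAC array."""
    return tiles * fpmacs_per_tile * flops_per_fpmac_cycle * clock_ghz

TILES = 80                 # from the abstract
FPMACS_PER_TILE = 2        # from the abstract
FLOPS_PER_FPMAC_CYCLE = 2  # assumption: multiply + accumulate

# Peak at the 4 GHz design target and at the measured 4.27 GHz operating point.
peak_at_design = peak_gflops(TILES, FPMACS_PER_TILE, FLOPS_PER_FPMAC_CYCLE, 4.0)
peak_at_measured = peak_gflops(TILES, FPMACS_PER_TILE, FLOPS_PER_FPMAC_CYCLE, 4.27)

# Lower bound on efficiency: the abstract reports "over 1.0 TFLOPS" at 97 W.
efficiency_lower_bound = 1000.0 / 97.0  # GFLOPS per watt

print(f"Peak at 4.00 GHz: {peak_at_design:.1f} GFLOPS")
print(f"Peak at 4.27 GHz: {peak_at_measured:.1f} GFLOPS")
print(f"Efficiency (lower bound): {efficiency_lower_bound:.2f} GFLOPS/W")
```

The gap between the theoretical peak and the achieved "over 1.0 TFLOPS" is expected, since sustained throughput on real benchmarks depends on how well each workload keeps both FPMACs in every tile busy.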
This publication has 13 references indexed in Scilit:
- A 256-Kb Dual-$V_{\rm CC}$ SRAM Building Block in 65-nm CMOS Process With Actively Clamped Sleep Transistor. IEEE Journal of Solid-State Circuits, 2006
- A 6.2-GFlops Floating-Point Multiply-Accumulator With Conditional Normalization. IEEE Journal of Solid-State Circuits, 2006
- A Six-Port 57GB/s Double-Pumped Nonblocking Router Core. Published by Institute of Electrical and Electronics Engineers (IEEE), 2005
- A 65nm logic technology featuring 35nm gate lengths, enhanced channel strain, 8 Cu interconnect layers, low-k ILD and 0.57 μm² SRAM cell. Published by Institute of Electrical and Electronics Engineers (IEEE), 2005
- Dynamic sleep transistor and body bias for active leakage power control of microprocessors. IEEE Journal of Solid-State Circuits, 2003
- Exploiting ILP, TLP, and DLP with the polymorphous TRIPS architecture. Published by Association for Computing Machinery (ACM), 2003
- Semi-dynamic and dynamic flip-flops with embedded logic. Published by Institute of Electrical and Electronics Engineers (IEEE), 2002
- The Raw microprocessor: a computational fabric for software circuits and general-purpose programs. IEEE Micro, 2002
- A six-port 30-GB/s nonblocking router component using point-to-point simultaneous bidirectional signaling for high-bandwidth interconnects. IEEE Journal of Solid-State Circuits, 2001
- An algorithm for the machine calculation of complex Fourier series. Mathematics of Computation, 1965