The Design and Implementation of the Massively Parallel Processor Based on the Matrix Architecture
- 26 December 2006
- journal article
- Published by Institute of Electrical and Electronics Engineers (IEEE) in IEEE Journal of Solid-State Circuits
- Vol. 42 (1), 183-192
- https://doi.org/10.1109/jssc.2006.886545
Abstract
This paper describes the design and implementation of a massively parallel processor based on the matrix architecture, which is suitable for portable multimedia applications. The proposed architecture achieves 40 GOPS for consecutive fixed-point 16-bit additions at a 200 MHz clock frequency while dissipating only 250 mW. In addition, 1 Mbit of SRAM for data registers and 2048 2-bit-grained processing elements connected by a flexible switching network are integrated in a small area of 3.1 mm² in a 90 nm CMOS low-standby technology. The design techniques and architectures described in this paper are attractive for realizing area-efficient, energy-efficient, and high-performance multimedia processors.
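The headline figures in the abstract can be combined into a few derived efficiency numbers. The sketch below is plain arithmetic on the quoted values, not data from the paper itself. It assumes that the 40 GOPS, 250 mW, and 3.1 mm² figures all describe the same operating point; the constant and function names are illustrative.

```python
# Figures quoted in the abstract.
PEAK_OPS_PER_S = 40e9   # 40 GOPS, consecutive fixed-point 16-bit additions
CLOCK_HZ = 200e6        # 200 MHz clock frequency
POWER_W = 0.250         # 250 mW power dissipation
AREA_MM2 = 3.1          # 3.1 mm^2 in 90 nm CMOS


def derived_metrics():
    """Return efficiency figures derived from the abstract's numbers.

    Assumption (not stated in the source): throughput, power, and area
    all refer to the same configuration and operating point.
    """
    gops = PEAK_OPS_PER_S / 1e9
    return {
        # Energy efficiency: throughput per watt.
        "gops_per_watt": gops / POWER_W,
        # Area efficiency: throughput per square millimetre.
        "gops_per_mm2": gops / AREA_MM2,
        # Average number of 16-bit additions completed per clock cycle.
        "adds_per_cycle": PEAK_OPS_PER_S / CLOCK_HZ,
        # Average energy per 16-bit addition, in picojoules.
        "pj_per_add": POWER_W / PEAK_OPS_PER_S * 1e12,
    }


if __name__ == "__main__":
    for name, value in derived_metrics().items():
        print(f"{name}: {value:.2f}")
```

Under these assumptions the quoted figures correspond to roughly 160 GOPS/W, about 12.9 GOPS/mm², 200 additions per clock cycle, and about 6.25 pJ per 16-bit addition.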
This publication has 8 references indexed in Scilit:
- A 40GOPS 250mW massively parallel processor based on matrix architecture. Published by Institute of Electrical and Electronics Engineers (IEEE), 2006
- A 51.2GOPS 1.0GB/s-DMA single-chip multi-processor integrating quadruple 8-way VLIW processors. Published by Institute of Electrical and Electronics Engineers (IEEE), 2005
- Creating the BlueGene/L supercomputer from low-power SoC ASICs. Published by Institute of Electrical and Electronics Engineers (IEEE), 2005
- The design and implementation of a first-generation CELL processor. Published by Institute of Electrical and Electronics Engineers (IEEE), 2005
- A 600MIPS 120mW 70μA leakage triple-CPU mobile application processor chip. Published by Institute of Electrical and Electronics Engineers (IEEE), 2005
- A streaming processing unit for a CELL processor. Published by Institute of Electrical and Electronics Engineers (IEEE), 2005
- A 51.2-GOPS scalable video recognition processor for intelligent cruise control based on a linear array of 128 four-way VLIW processing elements. IEEE Journal of Solid-State Circuits, 2003
- The implementation of the Cm* multi-microprocessor. Published by Association for Computing Machinery (ACM), 1977