A 240 G-ops/s Mobile Coprocessor for Deep Neural Networks
- 1 June 2014
- conference paper
- Published by Institute of Electrical and Electronics Engineers (IEEE)
- No. 21607508, p. 696-701
- https://doi.org/10.1109/cvprw.2014.106
Abstract
Deep networks are state-of-the-art models used for understanding the content of images, videos, audio and raw input data. Current computing systems are not able to run deep network models in real-time with low power consumption. In this paper we present nn-X: a scalable, low-power coprocessor for enabling real-time execution of deep neural networks. nn-X is implemented on programmable logic devices and comprises an array of configurable processing elements called collections. These collections perform the most common operations in deep networks: convolution, subsampling and non-linear functions. The nn-X system includes 4 high-speed direct memory access interfaces to DDR3 memory and two ARM Cortex-A9 processors. Each port is capable of a sustained throughput of 950 MB/s in full duplex. nn-X is able to achieve a peak performance of 227 G-ops/s, a measured performance in deep learning applications of up to 200 G-ops/s while consuming less than 4 watts of power. This translates to a performance per power improvement of 10 to 100 times that of conventional mobile and desktop processors.
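The figures quoted in the abstract can be turned into a rough back-of-envelope check. The sketch below is illustrative only and is not taken from the paper: the function names are hypothetical, the 4 W value is used as an upper bound (so the efficiency numbers are lower bounds), and counting "full duplex" as both directions running at once is an assumption. The implied baseline range additionally assumes the 10 to 100 times claim is relative to the measured figure, which the abstract does not state.

```python
# Figures quoted in the abstract.
PEAK_GOPS = 227.0       # peak performance, G-ops/s
MEASURED_GOPS = 200.0   # measured in deep learning applications, G-ops/s
POWER_W = 4.0           # upper bound: "less than 4 watts"
DMA_PORTS = 4           # DMA interfaces to DDR3
PORT_MB_S = 950.0       # sustained throughput per port, per direction


def gops_per_watt(gops: float, watts: float) -> float:
    """Performance per power in G-ops/s per watt (hypothetical helper)."""
    return gops / watts


def aggregate_bandwidth_mb_s(ports: int, per_port_mb_s: float,
                             full_duplex: bool = True) -> float:
    """Total DMA bandwidth in MB/s.

    Assumption: full duplex means both directions are active at once,
    so each port contributes twice its one-way throughput.
    """
    directions = 2 if full_duplex else 1
    return ports * per_port_mb_s * directions


# Lower bounds on efficiency, since actual power is below 4 W.
peak_eff = gops_per_watt(PEAK_GOPS, POWER_W)
measured_eff = gops_per_watt(MEASURED_GOPS, POWER_W)

one_way_bw = aggregate_bandwidth_mb_s(DMA_PORTS, PORT_MB_S, full_duplex=False)
duplex_bw = aggregate_bandwidth_mb_s(DMA_PORTS, PORT_MB_S, full_duplex=True)

# Assumption: the 10x to 100x claim is measured against the 200 G-ops/s figure.
implied_baseline_high = measured_eff / 10    # baseline for a 10x improvement
implied_baseline_low = measured_eff / 100    # baseline for a 100x improvement

print(f"peak efficiency     >= {peak_eff:.2f} G-ops/s/W")
print(f"measured efficiency >= {measured_eff:.2f} G-ops/s/W")
print(f"DMA bandwidth: {one_way_bw:.0f} MB/s one way, {duplex_bw:.0f} MB/s duplex")
print(f"implied baseline: {implied_baseline_low:.1f} to "
      f"{implied_baseline_high:.1f} G-ops/s/W")
```

Under these assumptions the measured efficiency is at least 50 G-ops/s per watt, which would place the conventional processors being compared against at roughly 0.5 to 5 G-ops/s per watt.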
This publication has 13 references indexed in Scilit:
- DianNao. Association for Computing Machinery (ACM), 2014
- GKLEE. Association for Computing Machinery (ACM), 2012
- NeuFlow: A runtime reconfigurable dataflow processor for vision. Institute of Electrical and Electronics Engineers (IEEE), 2011
- Acoustic Modeling Using Deep Belief Networks. IEEE Transactions on Audio, Speech, and Language Processing, 2011
- Accelerating GPU Kernels for Dense Linear Algebra. Springer Nature, 2011
- A 32×32 Pixel Convolution Processor Chip for Address Event Vision Sensors With 155 ns Event Latency and 20 Meps Throughput. IEEE Transactions on Circuits and Systems I: Regular Papers, 2010
- A dynamically configurable coprocessor for convolutional neural networks. Association for Computing Machinery (ACM), 2010
- A Massively Parallel Coprocessor for Convolutional Neural Networks. Institute of Electrical and Electronics Engineers (IEEE), 2009
- Robust Object Recognition with Cortex-Like Mechanisms. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2007
- Finite precision error analysis of neural network hardware implementations. IEEE Transactions on Computers, 1993