A 240 G-ops/s Mobile Coprocessor for Deep Neural Networks

1 June 2014

conference paper
Published by Institute of Electrical and Electronics Engineers (IEEE)

No. 21607508, pp. 696-701
https://doi.org/10.1109/cvprw.2014.106

Abstract

Deep networks are state-of-the-art models used for understanding the content of images, videos, audio and raw input data. Current computing systems cannot run deep network models in real time at low power consumption. In this paper we present nn-X: a scalable, low-power coprocessor that enables real-time execution of deep neural networks. nn-X is implemented on programmable logic devices and comprises an array of configurable processing elements called collections. These collections perform the most common operations in deep networks: convolution, subsampling and non-linear functions. The nn-X system includes four high-speed direct memory access interfaces to DDR3 memory and two ARM Cortex-A9 processors. Each port is capable of a sustained throughput of 950 MB/s in full duplex. nn-X achieves a peak performance of 227 G-ops/s and a measured performance in deep learning applications of up to 200 G-ops/s, while consuming less than 4 watts of power. This translates to a performance-per-power improvement of 10 to 100 times over conventional mobile and desktop processors.
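
The headline figures in the abstract can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only and is not taken from the paper: the constant and function names are hypothetical, the 4 W value is used as an upper bound on power (so the efficiency results are lower bounds), and the aggregate bandwidth assumes the four DMA ports scale linearly, which the abstract does not state.

```python
# Figures quoted in the abstract.
PEAK_GOPS = 227.0        # peak performance, G-ops/s
MEASURED_GOPS = 200.0    # best measured performance, G-ops/s
POWER_BUDGET_W = 4.0     # abstract says "less than 4 watts": an upper bound
DMA_PORTS = 4            # number of DMA interfaces to DDR3
PORT_MB_PER_S = 950.0    # sustained throughput per port, each direction


def efficiency_floor(gops: float, watts: float) -> float:
    """Lower bound on G-ops/s per watt when `watts` is an upper bound on power."""
    return gops / watts


def aggregate_bandwidth_mb_s(ports: int, per_port_mb_s: float) -> float:
    """Total one-direction bandwidth, ASSUMING all ports sustain their rate at once."""
    return ports * per_port_mb_s


if __name__ == "__main__":
    print("peak efficiency floor (G-ops/s/W):",
          efficiency_floor(PEAK_GOPS, POWER_BUDGET_W))
    print("measured efficiency floor (G-ops/s/W):",
          efficiency_floor(MEASURED_GOPS, POWER_BUDGET_W))
    print("aggregate bandwidth per direction (MB/s):",
          aggregate_bandwidth_mb_s(DMA_PORTS, PORT_MB_PER_S))
```

Because the power figure is only an upper bound, the actual efficiency is at least the values printed here. The 10 to 100 times comparison against conventional processors cannot be reproduced from the abstract alone, since no baseline figures are given.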
