DianNao
- 24 February 2014
- proceedings article
- Published by Association for Computing Machinery (ACM)
- Vol. 49 (4) , 269-284
- https://doi.org/10.1145/2541940.2541967
Abstract
Machine-Learning tasks are becoming pervasive in a broad range of domains, and in a broad range of systems (from embedded systems to data centers). At the same time, a small set of machine-learning algorithms (especially Convolutional and Deep Neural Networks, i.e., CNNs and DNNs) are proving to be state-of-the-art across many applications. As architectures evolve towards heterogeneous multi-cores composed of a mix of cores and accelerators, a machine-learning accelerator can achieve the rare combination of efficiency (due to the small number of target algorithms) and broad application scope. Until now, most machine-learning accelerator designs have focused on efficiently implementing the computational part of the algorithms. However, recent state-of-the-art CNNs and DNNs are characterized by their large size. In this study, we design an accelerator for large-scale CNNs and DNNs, with a special emphasis on the impact of memory on accelerator design, performance and energy. We show that it is possible to design an accelerator with a high throughput, capable of performing 452 GOP/s (key NN operations such as synaptic weight multiplications and neurons outputs additions) in a small footprint of 3.02 mm² and 485 mW; compared to a 128-bit 2GHz SIMD processor, the accelerator is 117.87x faster, and it can reduce the total energy by 21.08x. The accelerator characteristics are obtained after layout at 65nm. Such a high throughput in a small footprint can open up the usage of state-of-the-art machine-learning algorithms in a broad set of systems and for a broad set of applications.
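The headline figures in the abstract (452 GOP/s, 3.02 mm², 485 mW) can be combined into efficiency ratios. The sketch below is illustrative arithmetic only: the derived values are not stated in the abstract, the function and key names are hypothetical, and the calculation assumes the quoted throughput and power apply to the same operating point.

```python
# Derived efficiency metrics from the figures quoted in the abstract.
# Assumption: 452 GOP/s and 485 mW describe the same operating point,
# so dividing one by the other is meaningful. The abstract itself does
# not report these ratios.

THROUGHPUT_GOPS = 452.0   # GOP/s, from the abstract
AREA_MM2 = 3.02           # mm^2, from the abstract
POWER_W = 0.485           # 485 mW, from the abstract


def derived_metrics(gops, area_mm2, power_w):
    """Return throughput per watt, throughput per mm^2, and energy per operation."""
    ops_per_second = gops * 1e9
    return {
        "gops_per_watt": gops / power_w,
        "gops_per_mm2": gops / area_mm2,
        # joules per operation, converted to picojoules
        "picojoules_per_op": power_w / ops_per_second * 1e12,
    }


if __name__ == "__main__":
    metrics = derived_metrics(THROUGHPUT_GOPS, AREA_MM2, POWER_W)
    for name, value in metrics.items():
        print(f"{name}: {value:.2f}")
```

Under these assumptions the ratios come out to roughly 932 GOP/s per watt, about 150 GOP/s per mm², and about 1.07 pJ per operation.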
This publication has 27 references indexed in Scilit:
- Learning deep structured semantic models for web search using clickthrough data. Published by Association for Computing Machinery (ACM), 2013
- Neural Acceleration for General-Purpose Approximate Programs. Published by Institute of Electrical and Electronics Engineers (IEEE), 2012
- BenchNN: On the broad potential application scope of hardware neural network accelerators. Published by Institute of Electrical and Electronics Engineers (IEEE), 2012
- Accelerating neuromorphic vision algorithms for recognition. Published by Association for Computing Machinery (ACM), 2012
- A digital neurosynaptic core using embedded crossbar memory with 45pJ per spike in 45nm. Published by Institute of Electrical and Electronics Engineers (IEEE), 2011
- NeuFlow: A runtime reconfigurable dataflow processor for vision. Published by Institute of Electrical and Electronics Engineers (IEEE), 2011
- Dynamically Reconfigurable Silicon Array of Spiking Neurons With Conductance-Based Synapses. IEEE Transactions on Neural Networks, 2007
- An Efficient Hardware Architecture for a Neural Network Activation Function Generator. Published by Springer Nature, 2006
- Software assistance for data caches. Future Generation Computer Systems, 1995
- Finite precision error analysis of neural network hardware implementations. IEEE Transactions on Computers, 1993