School of Electrical Engineering, Korea Advanced Institute of Science and Technology (KAIST)
291 Daehak-ro, Yuseong-gu, Daejeon, Republic of Korea
E-mail: imdjdj@kaist.ac.kr
Abstract: An energy-efficient deep learning processor is proposed for convolutional neural networks
(CNNs) and recurrent neural networks (RNNs) in mobile platforms. The 16 mm² chip is fabricated in
65 nm technology with three key features: 1) a reconfigurable heterogeneous architecture that supports both
CNNs and RNNs; 2) an LUT-based reconfigurable multiplier optimized for dynamic fixed-point with on-line
adaptation; 3) quantization table-based matrix multiplication to reduce off-chip memory access and remove
duplicated multiplications. As a result, this work shows 20× and 4.5× higher energy efficiency than [2]
and [3], respectively. The DNPU also shows 6.5× higher energy efficiency than [5].
(Keywords: deep learning, convolutional neural network, recurrent neural network, heterogeneous, LUT)
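The quantization table-based matrix multiplication named in feature 3) can be illustrated with a short sketch. This is an illustrative reconstruction of the general technique, not the chip's exact datapath: weights are quantized to a small set of levels (e.g. 16 levels for 4-bit codes), and for each input value the products with all levels are computed once and stored in a table, so every weight sharing the same level reuses a single multiplication.

```python
import numpy as np

def qtable_matvec(weight_codes, levels, x):
    """Compute y = W @ x, where W[i, j] = levels[weight_codes[i, j]].

    weight_codes: (m, n) integer array of level indices (e.g. 4-bit codes)
    levels:       (k,) array of quantized weight values
    x:            (n,) input vector
    """
    m, n = weight_codes.shape
    y = np.zeros(m)
    for j in range(n):
        table = levels * x[j]            # k multiplications per input
        y += table[weight_codes[:, j]]   # m table lookups and additions
    return y
```

For an m×n weight matrix, a direct matrix-vector product needs m·n multiplications, whereas the table-based version needs only n·k (k = number of quantization levels), replacing the rest with lookups and additions.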
I. Introduction
Deep learning is being researched and applied ever more widely because of its overwhelming
performance. Convolutional neural networks (CNNs) and recurrent neural networks (RNNs) are the key
networks of deep learning, and each has different strengths: CNNs excel at extracting visual features,
while RNNs are capable of processing sequential data. CNNs are used for vision tasks such as
image classification and face recognition, and RNNs are used for language tasks such as translation
and speech recognition. Moreover, by combining CNNs and RNNs, more complex intelligence
such as action recognition and image captioning can be realized [1]. However, the computational
requirements of CNNs are quite different from those of RNNs. Convolution layers (CLs) in a CNN require
a massive amount of computation with a relatively small number of parameters. In contrast,
fully-connected layers (FCLs) in a CNN and RNN layers (RLs) require a relatively small amount of
computation with a huge number of parameters. Therefore, when FCLs and RLs are accelerated on
hardware dedicated to CLs, they suffer from high memory transaction costs, low PE utilization, and a
mismatch of computational patterns. Conversely, when CLs are accelerated on FCL- and RL-dedicated
hardware, they cannot exploit data reusability or achieve the required throughput. Several works have
considered the acceleration of CLs, such as [2-4], or of FCLs and RLs, such as [5]. However, no previous
work supports CLs, FCLs, and RLs together. Therefore, we present a deep learning processor that
supports both CNNs and RNNs with high energy efficiency for battery-powered mobile platforms.
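The compute-versus-parameter asymmetry described above can be made concrete with rough per-layer operation counts. The layer shapes below are AlexNet-like assumptions for illustration, not figures taken from this paper:

```python
def conv_layer_stats(c_in, c_out, k, out_h, out_w):
    """Parameter and MAC counts for a k x k convolution layer."""
    params = c_out * c_in * k * k   # one weight set, reused spatially
    macs = params * out_h * out_w   # applied at every output position
    return params, macs

def fc_layer_stats(n_in, n_out):
    """Parameter and MAC counts for a fully-connected layer."""
    params = n_in * n_out
    macs = params                   # each weight used exactly once
    return params, macs

# AlexNet-like shapes (illustrative assumptions):
conv_p, conv_m = conv_layer_stats(384, 384, 3, 13, 13)  # ~1.3M params, ~224M MACs
fc_p, fc_m = fc_layer_stats(4096, 4096)                 # ~16.8M params, ~16.8M MACs
```

The convolution layer performs roughly 170 MACs per parameter thanks to spatial weight reuse, while the fully-connected layer performs exactly one, which is why CL-optimized hardware is compute-bound and FCL/RL-optimized hardware is memory-bound.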
IV. Conclusion
A highly reconfigurable deep learning processor with a heterogeneous architecture and
dedicated LUT-based multipliers is proposed for CNNs and RNNs. The processor, implemented in 65 nm
technology, achieves 8.1 TOPS/W energy efficiency. As shown in Fig. 10, it is successfully
demonstrated with image captioning and action recognition.
[1] Vinyals, O. et al., Show and Tell: A Neural Image Caption Generator, CVPR, pp. 3156-3164, 2015
[2] Chen, Y. et al., Eyeriss: An energy-efficient reconfigurable accelerator for deep convolutional neural
networks, ISSCC Dig. Tech. Papers, pp. 262-263, 2016
[3] Moons, B. et al., A 0.3-2.6TOPS/W Precision-Scalable Processor for Real-Time Large-Scale ConvNets,
Symp. VLSI Circuits, 2016
[4] Sim, J. et al., A 1.42TOPS/W Deep Convolutional Neural Network Recognition Processor for Intelligent
IoT Systems, ISSCC Dig. Tech. Papers, pp. 264-265, 2016
[5] Han, S. et al., EIE: Efficient Inference Engine on Compressed Deep Neural Network, ISCA, 2016
[6] Shin, D. et al., DNPU: An 8.1TOPS/W Reconfigurable CNN-RNN Processor for General-Purpose Deep
Neural Networks, ISSCC Dig. Tech. Papers, pp. 240-241, 2017