Tayyar Rzayev, Saber Moradi, David Albonesi, and Rajit Manohar
Deep learning models are computationally expensive and
their performance depends strongly on the underlying hardware
platform. General-purpose compute platforms such as
GPUs have been widely used for implementing deep learning
techniques. However, with the advent of emerging application
domains such as the Internet of Things, the development of
custom integrated circuits that efficiently implement
deep learning models with low power consumption and small form
factor is in high demand. In this paper we analyze both the computation
and communication costs of common deep networks.
We propose a reconfigurable architecture that efficiently utilizes
computational and storage resources for accelerating
deep learning techniques without loss of algorithmic accuracy.