B0665
Title: Neural networks on the edge: Performance under compression
Authors: Alejandro Murua - University of Montreal (Canada) [presenting]
Vahid Partovi Nia - Ecole Polytechnique de Montreal / Huawei Noah's Ark Lab (Canada)
Abstract: With the advent of large high-dimensional data, statistical and machine learning models are becoming more complex and require a large number of parameters. Computations in neural networks involve thousands of weight parameters as well as large matrix calculations. The shift from cloud to edge computation has intensified the need to contain the growth of neural network parameters. This has led to the rise of tensorization, a technique that approximates large matrices with small tensors (that is, small multidimensional arrays). Matrix compression with low-rank approximations or tensor decompositions, such as the polyadic, Tucker, or tensor-train decompositions, has gained popularity. Even though several studies have shown that compression and quantization do not hurt the performance of feedforward or convolutional neural networks, research on recurrent neural networks (RNNs) such as gated recurrent units (GRUs) or long short-term memory (LSTM) networks has not been conclusive: keeping a good level of precision in the computations under tensorization is a challenge. The problems arising from tensorization in RNN training are presented, along with some ideas to overcome them. An extensive simulation with real datasets on a language learning task shows that the types of tensorization that appear to be adequate for RNNs are based on very short tensor-train decompositions.
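
The following Python sketch illustrates the kind of tensor-train compression discussed in the abstract. It is not the authors' implementation: the matrix size (512 x 512), the reshaping into an 8^6 tensor, and the rank cap of 4 are hypothetical choices made only to show how the standard TT-SVD procedure (sequential truncated SVDs) turns one large weight matrix into a few small cores, and how the parameter count shrinks. Production TT layers typically use the paired (matrix) TT format rather than this plain tensor reshape.

```python
import numpy as np

def tt_svd(tensor, max_rank):
    """Tensor-train decomposition via sequential truncated SVDs (TT-SVD)."""
    dims = tensor.shape
    cores, r_prev = [], 1
    mat = tensor.reshape(dims[0], -1)
    for k in range(len(dims) - 1):
        U, s, Vt = np.linalg.svd(mat, full_matrices=False)
        r = min(max_rank, len(s))                      # truncate to the rank cap
        cores.append(U[:, :r].reshape(r_prev, dims[k], r))
        mat = (np.diag(s[:r]) @ Vt[:r]).reshape(r * dims[k + 1], -1)
        r_prev = r
    cores.append(mat.reshape(r_prev, dims[-1], 1))     # last core absorbs the remainder
    return cores

def tt_reconstruct(cores):
    """Contract the TT cores back into the full tensor."""
    full = cores[0]
    for core in cores[1:]:
        full = np.tensordot(full, core, axes=1)        # contract trailing rank with leading rank
    return full.squeeze(axis=(0, -1))

# Hypothetical 512 x 512 recurrent weight matrix, reshaped as an 8^6 tensor.
rng = np.random.default_rng(0)
W = rng.standard_normal((512, 512))
tensor = W.reshape(8, 8, 8, 8, 8, 8)

cores = tt_svd(tensor, max_rank=4)                     # a "very short" TT: all ranks <= 4
W_hat = tt_reconstruct(cores).reshape(512, 512)

n_full, n_tt = W.size, sum(c.size for c in cores)
rel_err = np.linalg.norm(W - W_hat) / np.linalg.norm(W)
print(f"parameters: {n_full} -> {n_tt}  (~{n_full / n_tt:.0f}x smaller)")
# The relative error is large for a random matrix; trained weight matrices
# usually have more structure and compress with far less loss.
print(f"relative error: {rel_err:.2f}")
```

Running this prints a reduction from 262144 parameters to 576 TT-core parameters, which is the trade-off the abstract refers to: the compression is dramatic, but keeping numerical precision under such small ranks is exactly what becomes difficult when the cores sit inside a recurrent computation.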