Deep Learning Systems by Andres Rodriguez



This algorithm determines the layers that can be quantized. Note that one challenge is that interleaving layers with large and small numerical formats may result in higher overall computational cost due to the overhead of the many format conversions between them.
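To make the conversion-overhead point concrete, the following toy cost model (our own illustration, not from the book) charges a quantize/dequantize conversion for each precision change between adjacent layers; alternating formats then costs more than grouping layers of the same format.

    # Toy cost model (an assumption for illustration, not from the book):
    # each precision change between adjacent layers adds a conversion cost.
    def total_cost(layer_costs, precisions, conversion_cost=1.0):
        cost = sum(layer_costs)
        for a, b in zip(precisions, precisions[1:]):
            if a != b:
                cost += conversion_cost  # quantize/dequantize overhead
        return cost

    # Alternating formats pays three conversions; grouping pays only one.
    print(total_cost([1, 1, 1, 1], ["int8", "fp32", "int8", "fp32"]))  # 7.0
    print(total_cost([1, 1, 1, 1], ["int8", "int8", "fp32", "fp32"]))  # 5.0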

Cross-layer range equalization is a data-free quantization technique (it requires no data and no backpropagation). The ranges of the weights across the layers are equalized, and the ranges of the activations are constrained, under the assumption that a piecewise linear activation function (such as ReLU) is used between the layers [NvB+19]. This assumption is satisfied by many CNN models but not by non-CNN models. This technique is used in the Qualcomm Neural Processing SDK.
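The following is a minimal NumPy sketch of the equalization step for two consecutive fully connected layers with a ReLU between them, using the per-channel scaling rule from [NvB+19] (the code and variable names are ours). Because ReLU is positively homogeneous, ReLU(s*x) = s*ReLU(x) for s > 0, so scaling output channel i of the first layer by 1/s_i and input channel i of the second layer by s_i leaves the network's function unchanged while equalizing the per-channel weight ranges.

    import numpy as np

    # Sketch of cross-layer equalization, assuming y = W2 @ relu(W1 @ x + b1)
    # and nonzero per-channel weight ranges.
    def equalize(W1, b1, W2):
        r1 = np.max(np.abs(W1), axis=1)   # range of each output channel of layer 1
        r2 = np.max(np.abs(W2), axis=0)   # range of each input channel of layer 2
        s = np.sqrt(r1 * r2) / r2         # s_i = (1/r2_i) * sqrt(r1_i * r2_i)
        return W1 / s[:, None], b1 / s, W2 * s[None, :]

    # The computed function is unchanged:
    rng = np.random.default_rng(0)
    W1, b1, W2 = rng.normal(size=(8, 4)), rng.normal(size=8), rng.normal(size=(3, 8))
    x = rng.normal(size=4)
    Wa, ba, Wb = equalize(W1, b1, W2)
    y_orig = W2 @ np.maximum(W1 @ x + b1, 0)
    y_eq = Wb @ np.maximum(Wa @ x + ba, 0)
    assert np.allclose(y_orig, y_eq)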

Channel-wise quantization uses a quantization factor for each channel rather than one factor for the entire tensor.
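As a sketch of the difference (our own illustration), compare symmetric int8 quantization with one scale for the whole tensor against one scale per output channel. When channel ranges differ widely, per-tensor quantization crushes the small channels to zero, while per-channel quantization preserves them.

    import numpy as np

    def quantize_per_tensor(W):
        scale = np.max(np.abs(W)) / 127.0           # one scale for the tensor
        return np.round(W / scale).astype(np.int8), scale

    def quantize_per_channel(W):
        scales = np.max(np.abs(W), axis=1) / 127.0  # one scale per output channel (row)
        return np.round(W / scales[:, None]).astype(np.int8), scales

    rng = np.random.default_rng(0)
    # Output channels (rows) with very different ranges.
    W = rng.normal(size=(4, 16)) * np.array([0.01, 0.1, 1.0, 10.0])[:, None]
    qt, st = quantize_per_tensor(W)
    qc, sc = quantize_per_channel(W)
    print(np.abs(W - qt * st).mean(axis=1))             # small channels round to zero
    print(np.abs(W - qc * sc[:, None]).mean(axis=1))    # every channel keeps its precision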

Stochastic rounding (rather than nearest-value rounding) after multiplying by the quantization factor can improve performance [WCB+18]. To illustrate, rather than always rounding the number 1.2 down to 1, it is rounded to 1 with 80% probability and to 2 with 20% probability.
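A minimal sketch (ours) of this rule: round up with probability equal to the fractional part, which makes the rounding unbiased in expectation, since E[round(1.2)] = 0.8 * 1 + 0.2 * 2 = 1.2.

    import numpy as np

    def stochastic_round(x, rng=None):
        rng = np.random.default_rng() if rng is None else rng
        floor = np.floor(x)
        frac = x - floor                                  # 0.2 for x = 1.2
        return floor + (rng.random(np.shape(x)) < frac)   # round up w.p. frac

    x = np.full(100_000, 1.2)
    print(stochastic_round(x).mean())   # ~1.2; nearest-value rounding gives 1.0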

Unsigned int8 ReLU activations use the unsigned int8 representation, rather than signed int8, for the activations of the ReLU functions. Using signed int8 wastes half of the representable values, since all the activations are nonnegative.
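A small sketch (ours) of the effect: the same nonnegative range maps onto 255 uint8 steps instead of the 127 steps left after discarding the negative half of int8, roughly halving the rounding error.

    import numpy as np

    a = np.maximum(np.random.default_rng(0).normal(size=1000), 0.0)  # ReLU outputs
    scale_u8 = a.max() / 255.0               # full unsigned int8 range
    scale_s8 = a.max() / 127.0               # only the nonnegative half of int8
    q_u8 = np.round(a / scale_u8).astype(np.uint8)
    q_s8 = np.round(a / scale_s8).astype(np.int8)
    print(np.abs(a - q_u8 * scale_u8).max())  # about half the error of the int8 case
    print(np.abs(a - q_s8 * scale_s8).max())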

The techniques QAT, selective quantization, channel-wise quantization, and stochastic rounding also benefit fp8 [CBG+20].


