Issue #58 – Quantisation of Neural Machine Translation models
31 Oct 2019
Author: Dr. Patrik Lambert, Machine Translation Scientist @ Iconic
When large amounts of training data are available, the quality of Neural MT engines increases with the size of the model. However, larger models mean decoding with more parameters, which slows the engine down at test time. Improving the trade-off between model compactness and translation quality is an active research topic. One way to achieve more compact models is quantisation, that is, requiring each parameter value to occupy a fixed number of bits, thus limiting the computational cost. In this post we take a look at a paper which achieves Transformer Neural MT models that are four times more compact via quantisation into 8-bit values, with no loss in translation quality according to BLEU score.
Method
Gabriele Prato et al. (2019) propose to quantise all operations that provide a computational speed gain at test time. The method consists of using a function which assigns an integer between 0 and 255 (8 bits) to each parameter value, corresponding to where this value stands between the minimum and the maximum values taken by the parameters of the same tensor.
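As a rough illustration (not the authors' exact implementation), the sketch below shows min–max uniform quantisation of a weight tensor to 8-bit integers, together with the corresponding de-quantisation; the function names and the NumPy-based setup are our own assumptions.

```python
import numpy as np

def quantise_8bit(weights):
    """Map each value in `weights` to an integer in [0, 255] according to
    where it lies between the tensor's minimum and maximum values.
    (Illustrative sketch, not the paper's exact scheme.)"""
    w_min, w_max = weights.min(), weights.max()
    scale = (w_max - w_min) / 255.0 if w_max > w_min else 1.0
    q = np.round((weights - w_min) / scale).astype(np.uint8)
    return q, w_min, scale

def dequantise_8bit(q, w_min, scale):
    """Recover an approximation of the original values from the 8-bit codes."""
    return q.astype(np.float32) * scale + w_min

# Example: quantise a small random weight matrix and check the rounding error.
w = np.random.randn(4, 4).astype(np.float32)
q, w_min, scale = quantise_8bit(w)
w_hat = dequantise_8bit(q, w_min, scale)
print("max absolute error:", np.abs(w - w_hat).max())  # bounded by about scale / 2
```

Storing each parameter as one byte instead of a 32-bit float is what yields the four-fold reduction in model size mentioned above, at the cost of a small, bounded rounding error per value.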