Issue #49 – Representation Bottleneck in Neural MT
08 Aug 2019
Author: Raj Patel, Machine Translation Scientist @ Iconic
In Neural MT, lexical features are fed to the first layer of the encoder as lexical representations (aka word embeddings) and are refined as they propagate through the deep stack of hidden layers. In this post we'll try to understand how the lexical representation changes as it goes deeper into the network, and investigate whether this affects translation quality.
Representation Bottleneck
Recently, several studies have investigated which language features are encoded within the individual layers of neural translation models. Belinkov et al. (2018) reported that in recurrent architectures, different layers prioritise different types of information: lower layers appear to capture morphological and syntactic information, whereas semantic features are concentrated towards the top of the layer stack. Ideally, the information encoded across all layers would be made available to the decoder, but in practice only the output of the last layer is used.
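To make the contrast concrete, here is a minimal NumPy sketch of the situation just described: a toy stand-in for an encoder stack where, by default, the decoder only ever sees the last layer's output, versus a hypothetical alternative that mixes all layer outputs so lower-layer features can also reach the decoder. The `encoder_layer` function, the uniform mixing weights, and all dimensions are illustrative assumptions, not any particular system's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
seq_len, d_model, n_layers = 5, 8, 4

def encoder_layer(x, w):
    # Toy stand-in for a real encoder layer: a linear map with a
    # residual connection and a tanh nonlinearity.
    return np.tanh(x @ w) + x

# Simulated word embeddings entering the first encoder layer.
embeddings = rng.normal(size=(seq_len, d_model))

layer_outputs = []
h = embeddings
for _ in range(n_layers):
    w = rng.normal(scale=0.1, size=(d_model, d_model))
    h = encoder_layer(h, w)
    layer_outputs.append(h)

# Standard practice: the decoder sees only the top layer.
decoder_input_standard = layer_outputs[-1]

# Hypothetical alternative: a weighted sum over all layers, so
# lower-layer (morphological/syntactic) features also reach the
# decoder. Uniform weights here purely for illustration; in a real
# model they would be learned.
weights = np.full(n_layers, 1.0 / n_layers)
decoder_input_mixed = sum(w * out for w, out in zip(weights, layer_outputs))

print(decoder_input_standard.shape, decoder_input_mixed.shape)
```

With uniform weights the mixed representation is simply the mean of the per-layer outputs; a learned weighting would let the model decide how much of each layer's information to forward.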
Along the same lines, Emelin et al. (2019) studied the transformer architecture. In the transformer model, information proceeds in a strictly sequential manner: each layer attends only to the output of the layer immediately below it, so lexical features from the embedding layer must survive the entire stack in order to reach the decoder.
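This strictly sequential flow can be sketched in a few lines, alongside one illustrative remedy: re-injecting the embeddings at every layer so that lexical features need not survive the full stack unaided. This shortcut is only a schematic illustration of the general idea, not the exact mechanism of the paper cited above; the `layer` function and all sizes are assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
seq_len, d_model, n_layers = 5, 8, 4

def layer(x, w):
    # Toy stand-in for a transformer layer (residual + nonlinearity).
    return np.tanh(x @ w) + x

emb = rng.normal(size=(seq_len, d_model))
ws = [rng.normal(scale=0.1, size=(d_model, d_model)) for _ in range(n_layers)]

# Strictly sequential flow: layer i sees only layer i-1's output,
# so the original lexical signal is progressively overwritten.
h = emb
for w in ws:
    h = layer(h, w)

# Illustrative shortcut: add the embeddings back in at every layer,
# giving each layer direct access to the lexical representation.
g = emb
for w in ws:
    g = layer(g + emb, w)

print(h.shape, g.shape)
```

The two final representations have the same shape but differ in content, since the shortcut variant keeps feeding the lexical signal forward at every depth.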