Machine Translation Weekly 51: Machine Translation without Embeddings
Over the few years in which neural models have been the state of the art in machine
translation, the architectures have become quite standardized. There is a vocabulary of
several thousand discrete input/output units. As the first step, the inputs are
represented by static embeddings, which get encoded into a contextualized vector
representation. This representation serves as a sort of working memory for the decoder,
which typically has an architecture similar to the encoder's and generates the output
left-to-right. In most cases, the input and output vocabularies are the same,
so the same embedding matrix can be used in the encoder, in the decoder,
and also as the output projection that gives a probability distribution over the
output words.
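
For illustration, here is a minimal PyTorch sketch of such a shared (tied) embedding matrix; the class and method names are made up for this example and not taken from any particular toolkit.

```python
import torch
import torch.nn as nn

class TiedEmbeddingsAndProjection(nn.Module):
    """One embedding matrix reused for encoder input, decoder input,
    and the output projection (weight tying)."""

    def __init__(self, vocab_size: int, d_model: int):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, d_model)

    def embed(self, token_ids: torch.Tensor) -> torch.Tensor:
        # Look up static embeddings for source or target tokens.
        return self.embedding(token_ids)

    def project(self, decoder_states: torch.Tensor) -> torch.Tensor:
        # Reuse the same matrix as the output projection: logits are
        # dot products between decoder states and embedding vectors.
        logits = decoder_states @ self.embedding.weight.T
        return torch.log_softmax(logits, dim=-1)
```
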
Indeed, there are different underlying architectures (recurrent networks,
convolutional networks, Transformers), and people try to come up with conceptual
alternatives such as non-autoregressive models or insertion-based models.
However, there is not much discussion about when