Machine Translation Weekly 43: Dynamic Programming Encoding
One of the narratives people (including me) love to associate with neural
machine translation is that we got rid of all linguistic assumptions about the
text and let the neural network learn its own way, independent of what people
think about language. It sounds cool, it almost gives a science-fiction
feeling. What I think we really do is move our assumptions about language from
the hard constraints of discrete representations into the soft constraints of
inductive biases that we impose on our neural architectures.
This is also the case of the paper I am going to discuss today. It throws away
the assumption that the input should be tokenized into words and that unigram
statistics are a good heuristic for segmentation. This is replaced by a fully
learned segmentation that should be tailored specifically to the task of
machine translation. The title of the paper is Dynamic Programming Encoding
for Subword