Machine Translation Weekly 96: On Evaluation of Non-Autoregressive MT Systems
I often review papers on non-autoregressive machine translation and tend to
repeat the same things in my reviews. The papers often compare non-comparable
things to show the non-autoregressive models in a better light. Apart from the
usual flaws in MT evaluation, non-autoregressive papers often (with honorable
exceptions) get lost in the knowledge distillation setup.
In general, papers tend to motivate non-autoregressive MT by potential speedup.
Although speedup is an important motivation, it is not the main one for me.
By attempting to generate sentences non-autoregressively, we actually ask the
following question: Can we capture the complete target sentence structure
without explicitly modeling it sequentially? Autoregressive decoding seems
quite counter-intuitive to me: as if I needed to think about what the next
word/syllable/character is after producing the previous one. Generating