Machine Translation and Multilinguality 04/2022
Another month is over, so here is my overview of what I found most interesting
in machine translation and multilinguality.
Rotation ciphers as regularizers
A paper accepted to ACL 2022 from Simon
Fraser University experiments with rotation ciphers on the source side of
MT as a data augmentation technique. They tested it in low-data scenarios and
it seems to work quite well, which I find rather surprising. It is
just a systematic replacement of characters with other characters: it does not
lead to similar subwords on the source and the target side, and it does not make
the tokens any easier to align, yet it still helps.
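
Out of curiosity, here is a minimal sketch of what such an augmentation could look like, assuming a simple ROT-k shift over Latin letters applied only to the source side; the paper's actual pipeline (which ciphers are used, how the enciphered copies are mixed into training) may differ.

```python
import string

def rot_k(text: str, k: int) -> str:
    """Apply a rotation (Caesar) cipher with shift k to Latin letters,
    leaving all other characters (spaces, punctuation, digits) unchanged."""
    lower = string.ascii_lowercase
    upper = string.ascii_uppercase
    table = str.maketrans(
        lower + upper,
        lower[k:] + lower[:k] + upper[k:] + upper[:k],
    )
    return text.translate(table)

def augment_corpus(pairs, shifts=(1, 2)):
    """Return the original source-target pairs plus one extra copy per shift,
    with the source enciphered and the target left intact."""
    augmented = list(pairs)
    for k in shifts:
        augmented.extend((rot_k(src, k), tgt) for src, tgt in pairs)
    return augmented

# Example: the enciphered source still maps to the same target sentence.
pairs = [("ich habe hunger", "i am hungry")]
print(augment_corpus(pairs, shifts=(1,)))
# [('ich habe hunger', 'i am hungry'), ('jdi ibcf ivohfs', 'i am hungry')]
```

The enciphered copies share no surface forms with the original source, which is exactly why it is puzzling that they act as a useful regularizer rather than just noise.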
Characters vs. subwords: it depends on the task
A pre-print from Tel Aviv University and Bar Ilan
University compares character-level