Issue #110 – Better Out of Vocabulary Translation with Bilingual Terminology Mining
03 Dec 20
Author: Akshai Ramesh, Machine Translation Scientist @ Iconic
Introduction
A significant weakness of conventional neural machine translation (NMT) systems is their inability to translate out-of-vocabulary (OOV) words correctly. Because of memory limitations, end-to-end NMT systems tend to have relatively small vocabularies, with a single “unknown token” (usually abbreviated in MT slang as “unk”) standing in for every possible OOV word. Byte-pair encoding (BPE) can be used to represent OOVs as sequences of known subword units, but these words are still often translated incorrectly. In today’s blog post, we take a look at the mining procedure proposed in “Better OOV Translation with Bilingual Terminology Mining” (Huck et al., 2019).
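To make the two input representations concrete, here is a minimal sketch. It is not code from the paper. A word-level system collapses every token outside its fixed vocabulary into one `<unk>`, while BPE splits the same word into subword units by applying learned merge operations in priority order. The vocabulary, the merge list and the helper names `word_level_encode` and `apply_bpe` are hypothetical and exist only for illustration. Real systems learn tens of thousands of merges from training data.

```python
from typing import Dict, List, Set, Tuple

UNK = "<unk>"


def word_level_encode(tokens: List[str], vocab: Set[str]) -> List[str]:
    """Word-level input: every token outside the fixed vocabulary becomes one UNK."""
    return [tok if tok in vocab else UNK for tok in tokens]


def apply_bpe(word: str, merges: List[Tuple[str, str]]) -> List[str]:
    """Segment one word by repeatedly applying the highest-priority learned merge."""
    if not word:
        return []
    # Earlier merges in the list have higher priority (lower rank).
    ranks: Dict[Tuple[str, str], int] = {pair: i for i, pair in enumerate(merges)}
    # Start from single characters; mark the final one as end-of-word.
    symbols = list(word[:-1]) + [word[-1] + "</w>"]
    while len(symbols) > 1:
        pairs = [(symbols[i], symbols[i + 1]) for i in range(len(symbols) - 1)]
        best = min(pairs, key=lambda p: ranks.get(p, float("inf")))
        if best not in ranks:
            break  # no adjacent pair matches a learned merge
        merged: List[str] = []
        i = 0
        while i < len(symbols):
            if i < len(symbols) - 1 and (symbols[i], symbols[i + 1]) == best:
                merged.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                merged.append(symbols[i])
                i += 1
        symbols = merged
    return symbols


if __name__ == "__main__":
    # Hypothetical toy vocabulary and merge list.
    vocab = {"the", "patient", "received", "a"}
    merges = [("i", "b"), ("u", "p"), ("ib", "up"), ("r", "o"), ("e", "n</w>")]
    sentence = ["the", "patient", "received", "ibuprofen"]
    print(word_level_encode(sentence, vocab))  # ['the', 'patient', 'received', '<unk>']
    print(apply_bpe("ibuprofen", merges))      # ['ibup', 'ro', 'f', 'en</w>']
```

With BPE, the OOV word is no longer erased. However, the model still has to translate a sequence like `ibup ro f en` correctly, and that sequence may rarely or never have appeared in its training data. This is the weakness the paper sets out to address.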
The paper proposes a simple approach for improving