Issue #110 – Better Out of Vocabulary Translation with Bilingual Terminology Mining

03 Dec 2020


Author: Akshai Ramesh, Machine Translation Scientist @ Iconic

Introduction

A significant weakness of conventional neural machine translation (NMT) systems is their inability to correctly translate out-of-vocabulary (OOV) words. Because of memory limitations, end-to-end NMT systems tend to have relatively small vocabularies, with a single “unknown” token (usually abbreviated in MT slang as “unk”) representing every possible OOV word. Byte-pair encoding can be used to represent OOVs as sequences of subword units, but they are still often translated incorrectly. In today’s blog post, we take a look at the mining procedure proposed in “Better OOV Translation with Bilingual Terminology Mining” (Huck et al., 2019).
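To make the OOV problem concrete, here is a minimal Python sketch of the two behaviours described above: a fixed-size vocabulary that collapses every unseen word into one “unk” token, and a subword segmentation that breaks an unseen word into known pieces. All names (`build_vocab`, `apply_vocab`, `segment`), the toy corpus, and the subword inventory are hypothetical and chosen for illustration. The greedy longest-match segmenter is a simplified stand-in for byte-pair encoding, which applies learned merge operations instead, and none of this is the mining procedure from Huck et al.

```python
from collections import Counter

UNK = "<unk>"  # placeholder token shared by every out-of-vocabulary word


def build_vocab(sentences, max_size):
    """Keep only the max_size most frequent whitespace tokens."""
    counts = Counter(tok for s in sentences for tok in s.split())
    return {tok for tok, _ in counts.most_common(max_size)}


def apply_vocab(sentence, vocab):
    """Replace every token that is not in the vocabulary with UNK."""
    return [tok if tok in vocab else UNK for tok in sentence.split()]


def segment(word, subwords):
    """Greedy longest-match split of a word into known subword units.

    Assumption: this stands in for BPE only to show the idea of
    representing an unseen word as a sequence of smaller known units.
    """
    pieces, i = [], 0
    while i < len(word):
        # Try the longest candidate starting at position i first.
        for j in range(len(word), i, -1):
            if word[i:j] in subwords:
                pieces.append(word[i:j])
                i = j
                break
        else:
            # No known unit covers this character at all.
            pieces.append(UNK)
            i += 1
    return pieces


corpus = ["the cat sat on the mat", "the dog sat on the rug"]
vocab = build_vocab(corpus, max_size=3)
print(apply_vocab("the cat sat on the terminology", vocab))
print(segment("terminology", {"term", "ino", "logy", "in", "o"}))
```

With the word-level vocabulary, both “cat” and “terminology” become the same unknown token, so the model has no information left to translate them. The subword split keeps the word representable, but the system still has to compose a correct translation from the pieces, which is where it often goes wrong.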

The paper proposes a simple approach for improving

