Issue #121 – Finding the Optimal Vocabulary Size for Neural Machine Translation
11 Mar 2021
Author: Akshai Ramesh, Machine Translation Scientist @ Iconic
Introduction
Sennrich et al. (2016) introduced a variant of byte pair encoding (BPE) (Gage, 1994) for word segmentation, which can encode open vocabularies with a compact symbol vocabulary of variable-length subword units. With BPE, Neural Machine Translation (NMT) systems are capable of open-vocabulary translation, representing rare and unseen words as sequences of subword units.
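The core of BPE learning is simple: starting from a character-level vocabulary, repeatedly find the most frequent pair of adjacent symbols and merge it into a single new symbol. A minimal sketch of that learning loop (using the toy corpus from Sennrich et al.'s paper; the helper names are illustrative, not from any library):

```python
import re
from collections import Counter

def get_pair_counts(vocab):
    """Count frequencies of adjacent symbol pairs across the vocabulary."""
    pairs = Counter()
    for word, freq in vocab.items():
        symbols = word.split()
        for i in range(len(symbols) - 1):
            pairs[(symbols[i], symbols[i + 1])] += freq
    return pairs

def merge_pair(pair, vocab):
    """Replace every occurrence of the symbol pair with its concatenation."""
    bigram = re.escape(' '.join(pair))
    pattern = re.compile(r'(?<!\S)' + bigram + r'(?!\S)')
    return {pattern.sub(''.join(pair), word): freq
            for word, freq in vocab.items()}

# Toy corpus: each word is a space-separated sequence of characters
# ending in an end-of-word marker, weighted by its corpus frequency.
vocab = {'l o w </w>': 5, 'l o w e r </w>': 2,
         'n e w e s t </w>': 6, 'w i d e s t </w>': 3}

num_merges = 10  # the single BPE hyperparameter: number of merge operations
for _ in range(num_merges):
    pairs = get_pair_counts(vocab)
    if not pairs:
        break
    best = max(pairs, key=pairs.get)  # most frequent adjacent pair
    vocab = merge_pair(best, vocab)
```

After ten merges, frequent whole words such as `newest</w>` and `low</w>` have become single symbols, while rarer words remain split into subword units; raising `num_merges` grows the symbol vocabulary toward whole words, lowering it pushes segmentation back toward characters.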
Today, subword tokenisation schemes inspired by BPE have become the norm across many Natural Language Processing tasks. The BPE algorithm has a single hyperparameter – “number of merge operations” – that governs the vocabulary size. According to