Issue #121 – Finding the Optimal Vocabulary Size for Neural Machine Translation

11 Mar 21

Author: Akshai Ramesh, Machine Translation Scientist @ Iconic

Introduction

Sennrich et al. (2016) introduced a variant of byte pair encoding (BPE) (Gage, 1994) for word segmentation, which can encode open vocabularies with a compact symbol vocabulary of variable-length subword units. With BPE, Neural Machine Translation (NMT) systems are capable of open-vocabulary translation by representing rare and unseen words as a […]
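
To make the segmentation idea above concrete, here is a minimal sketch of how BPE merge operations can be learned from word frequencies, in the spirit of the procedure described by Sennrich et al. (2016). The function names (`get_pair_counts`, `merge_pair`, `learn_bpe`), the `</w>` end-of-word marker as used here, and the toy word counts are illustrative assumptions, not the author's code or the API of any particular library.

```python
import collections
import re


def get_pair_counts(vocab):
    """Count adjacent symbol pairs, weighted by word frequency."""
    pairs = collections.Counter()
    for word, freq in vocab.items():
        symbols = word.split()
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs


def merge_pair(pair, vocab):
    """Replace every occurrence of the symbol pair with one merged symbol."""
    # The lookarounds ensure we only match whole, space-delimited symbols.
    pattern = re.compile(r"(?<!\S)" + re.escape(" ".join(pair)) + r"(?!\S)")
    merged = "".join(pair)
    return {pattern.sub(lambda m: merged, word): freq
            for word, freq in vocab.items()}


def learn_bpe(word_freqs, num_merges):
    """Learn up to `num_merges` merge operations from a word-frequency dict."""
    # Start from characters, with a hypothetical end-of-word marker "</w>".
    vocab = {" ".join(word) + " </w>": freq
             for word, freq in word_freqs.items()}
    merges = []
    for _ in range(num_merges):
        pairs = get_pair_counts(vocab)
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # most frequent adjacent pair
        vocab = merge_pair(best, vocab)
        merges.append(best)
    return merges


# Toy corpus counts (made up for illustration only).
toy_counts = {"low": 5, "lower": 2, "newest": 6, "widest": 3}
merges = learn_bpe(toy_counts, num_merges=3)
print(merges)
```

In this sketch the number of merge operations is the knob that controls the subword vocabulary: the final symbol set is roughly the initial character inventory plus one new symbol per merge, so choosing `num_merges` amounts to choosing the vocabulary size.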