Issue #68 – Incorporating BERT in Neural MT
07 Feb 2020
Author: Raj Patel, Machine Translation Scientist @ Iconic
BERT (Bidirectional Encoder Representations from Transformers) has shown impressive results in various Natural Language Processing (NLP) tasks. However, how to effectively apply BERT in Neural MT has not been fully explored. In general, BERT is fine-tuned for downstream NLP tasks. For Neural MT, a pre-trained BERT model is typically used to initialise the encoder in an encoder-decoder architecture. In this post we discuss an improved technique for incorporating BERT in Neural MT, known as the BERT-fused model, proposed by Zhu et al. (2020).
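To make the baseline mentioned above concrete, here is a minimal sketch, assuming PyTorch and the Hugging Face transformers library, of using a pre-trained BERT model as the encoder of an encoder-decoder NMT system. The checkpoint name "bert-base-uncased", the toy decoder, and all dimensions are illustrative assumptions, not code from the post or the paper.

```python
import torch.nn as nn
from transformers import BertModel

class BertInitNMT(nn.Module):
    """Toy NMT model whose encoder is a pre-trained BERT (illustrative only)."""
    def __init__(self, tgt_vocab_size=32000, d_model=768):
        super().__init__()
        # Pre-trained BERT acts as the source-side encoder.
        self.encoder = BertModel.from_pretrained("bert-base-uncased")
        decoder_layer = nn.TransformerDecoderLayer(d_model, nhead=8, batch_first=True)
        self.decoder = nn.TransformerDecoder(decoder_layer, num_layers=6)
        self.tgt_embed = nn.Embedding(tgt_vocab_size, d_model)
        self.out_proj = nn.Linear(d_model, tgt_vocab_size)

    def forward(self, src_ids, src_mask, tgt_ids):
        # BERT's final hidden states serve as the encoder memory for the decoder.
        memory = self.encoder(input_ids=src_ids, attention_mask=src_mask).last_hidden_state
        dec = self.decoder(self.tgt_embed(tgt_ids), memory)
        return self.out_proj(dec)
```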
BERT-fused model
Zhu et al. (2020) propose a modified encoder-decoder architecture in which BERT is first used to extract a representation of the input sequence; this representation is then fused into each layer of the encoder and decoder of the NMT model via cross-attention, as depicted in Figure 1. In both the BERT-enc and BERT-dec attention, the key (K) and value (V) are created from the BERT representation.
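The sketch below, assuming PyTorch, illustrates one BERT-fused encoder layer in the spirit of this description: the layer attends to its own hidden states (self-attention) and to the fixed BERT representation (BERT-enc attention, with K and V taken from the BERT output), and the two attention outputs are combined before the feed-forward sub-layer. Class and parameter names, dimensions, and the averaging of the two branches follow the paper's high-level description but are illustrative assumptions, not the authors' released code.

```python
import torch.nn as nn

class BertFusedEncoderLayer(nn.Module):
    """One NMT encoder layer fused with a BERT representation (illustrative sketch)."""
    def __init__(self, d_model=512, d_bert=768, nhead=8, d_ff=2048):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, nhead, batch_first=True)
        # BERT-enc attention: queries come from the NMT encoder,
        # keys and values come from the BERT representation.
        self.bert_attn = nn.MultiheadAttention(d_model, nhead, kdim=d_bert, vdim=d_bert,
                                               batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(),
                                 nn.Linear(d_ff, d_model))
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, h, h_bert):
        # h:      (batch, src_len, d_model) hidden states from the previous encoder layer
        # h_bert: (batch, bert_len, d_bert) BERT representation of the source sentence
        s, _ = self.self_attn(h, h, h)             # standard self-attention
        b, _ = self.bert_attn(h, h_bert, h_bert)   # K and V built from the BERT output
        h = self.norm1(h + 0.5 * (s + b))          # fuse the two attention outputs
        return self.norm2(h + self.ffn(h))
```

A BERT-fused decoder layer works analogously, adding a BERT-dec attention sub-layer alongside the usual encoder-decoder attention.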
To finish reading, please visit source site