Neural Machine Translation

On Knowledge Distillation for Direct Speech Translation

Direct speech translation (ST) has shown to be a complex task requiring knowledge transfer from its sub-tasks: automatic speech recognition (ASR) and machine translation (MT). For MT, one of the most promising techniques to transfer knowledge is knowledge distillation… In this paper, we compare the different solutions to distill knowledge in a sequence-to-sequence task like ST. Moreover, we analyze eventual drawbacks of this approach and how to alleviate them maintaining the benefits in terms of translation quality. (read more) PDF

Read more

Data-Informed Global Sparseness in Attention Mechanisms for Deep Neural Networks

The attention mechanism is a key component of the neural revolution in Natural Language Processing (NLP). As the size of attention-based models has been scaling with the available computational resources, a number of pruning techniques have been developed to detect and to exploit sparseness in such models in order to make them more efficient… The majority of such efforts have focused on looking for attention patterns and then hard-coding them to achieve sparseness, or pruning the weights of the attention […]

Read more

Machine Translation Weekly 61: Decoding and diversity

This week I will comment on a short paper from Carnegie Mellon University and Amazon that shows a simple analysis of the diversity of machine translation outputs. The title of the paper is Decoding and Diversity in Machine Translation and it will be presented at the Resistance AI Workshop at NeuRIPS 2020 (what a name for a workshop). The main thing that the paper shows that is the translation quality measured in terms of BLEU score strongly negatively correlates with […]

Read more

MK-SQuIT: Synthesizing Questions using Iterative Template-filling

The aim of this work is to create a framework for synthetically generating question/query pairs with as little human input as possible. These datasets can be used to train machine translation systems to convert natural language questions into queries, a useful tool that could allow for more natural access to database information… Existing methods of dataset generation require human input that scales linearly with the size of the dataset, resulting in small datasets. Aside from a short initial configuration task, […]

Read more

AlphaMWE: Construction of Multilingual Parallel Corpora with MWE Annotations

In this work, we present the construction of multilingual parallel corpora with annotation of multiword expressions (MWEs). MWEs include verbal MWEs (vMWEs) defined in the PARSEME shared task that have a verb as the head of the studied terms… The annotated vMWEs are also bilingually and multilingually aligned manually. The languages covered include English, Chinese, Polish, and German. Our original English corpus is taken from the PARSEME shared task in 2018. We performed machine translation of this source corpus followed […]

Read more

Incorporating a Local Translation Mechanism into Non-autoregressive Translation

In this work, we introduce a novel local autoregressive translation (LAT) mechanism into non-autoregressive translation (NAT) models so as to capture local dependencies among tar-get outputs. Specifically, for each target decoding position, instead of only one token, we predict a short sequence of tokens in an autoregressive way… We further design an efficient merging algorithm to align and merge the out-put pieces into one final output sequence. We integrate LAT into the conditional masked language model (CMLM; Ghazvininejad et al.,2019) […]

Read more

Language Models not just for Pre-training: Fast Online Neural Noisy Channel Modeling

Pre-training models on vast quantities of unlabeled data has emerged as an effective approach to improving accuracy on many NLP tasks. On the other hand, traditional machine translation has a long history of leveraging unlabeled data through noisy channel modeling… The same idea has recently been shown to achieve strong improvements for neural machine translation. Unfortunately, na”{i}ve noisy channel modeling with modern sequence to sequence models is up to an order of magnitude slower than alternatives. We address this issue […]

Read more

Machine Translation of Novels in the Age of Transformer

In this chapter we build a machine translation (MT) system tailored to the literary domain, specifically to novels, based on the state-of-the-art architecture in neural MT (NMT), the Transformer (Vaswani et al., 2017), for the translation direction English-to-Catalan. Subsequently, we assess to what extent such a system can be useful by evaluating its translations, by comparing this MT system against three other systems (two domain-specific systems under the recurrent and phrase-based paradigms and a popular generic on-line system) on three […]

Read more

Learning to Use Future Information in Simultaneous Translation

Simultaneous neural machine translation (briefly, NMT) has attracted much attention recently. In contrast to standard NMT, where the NMT system can access the full input sentence, simultaneous NMT is a prefix-to-prefix problem, where the system can only utilize the prefix of the input sentence and thus more uncertainty and difficulty are introduced to decoding… Wait-k inference is a simple yet effective strategy for simultaneous NMT, where the decoder generates the output sequence $k$ words behind the input words. For wait-k […]

Read more

Machine Translation Weekly 60: Notes about WMT 2020 Shared Tasks

This week, I will follow up the last week’s post and comment on the news from this year’s WMT that was collocated with EMNLP. As every year, there were many shared tasks on various types of translation and evaluation of machine translation. News translation task The news translation task is the oldest task at WMT and sort of a flagship task providing benchmarks for MT research in the long term. Test sets are created by manually translating recent news stories […]

Read more
1 9 10 11 12 13 14