Neural Machine Translation

Subformer: Exploring Weight Sharing for Parameter Efficiency in Generative Transformers

The advent of the Transformer can arguably be described as a driving force behind many of the recent advances in natural language processing. However, despite its sizeable performance improvements, the model has recently been shown to be severely over-parameterized, making it parameter inefficient and computationally expensive to train… Inspired by the success of parameter-sharing in pretrained deep contextualized word representation encoders, we explore parameter-sharing methods in Transformers, with a specific focus on encoder-decoder models for sequence-to-sequence tasks such as neural machine translation. We […]


Directed Beam Search: Plug-and-Play Lexically Constrained Language Generation

Large pre-trained language models are capable of generating realistic text. However, controlling these models so that the generated text satisfies lexical constraints, i.e., contains specific words, is a challenging problem… Given that state-of-the-art language models are too large to be trained from scratch in a manageable time, it is desirable to control these models without re-training them. Methods capable of doing this are called plug-and-play. Recent plug-and-play methods have been successful in constraining small bidirectional language models as well as […]


Towards Fully Automated Manga Translation

We tackle the problem of machine translation of manga, Japanese comics. Manga translation involves two important problems in machine translation: context-aware and multimodal translation… Since text and images are mixed up in an unstructured fashion in manga, obtaining context from the image is essential for manga translation. However, how to extract context from images and integrate it into MT models is still an open problem. In addition, corpora and benchmarks for training and evaluating such models are currently unavailable. In […]


Learning Light-Weight Translation Models from Deep Transformer

Recently, deep models have shown tremendous improvements in neural machine translation (NMT). However, systems of this kind are computationally expensive and memory intensive… In this paper, we take a natural step towards learning strong but light-weight NMT systems. We propose a novel group-permutation based knowledge distillation approach to compress the deep Transformer model into a shallow model. The experimental results on several benchmarks validate the effectiveness of our method. Our compressed model is 8X shallower than the deep model, with […]


Machine Translation Weekly 63: Maximum Aposteriori vs. Minimum Bayes Risk decoding

This week I will have a look at the best paper from this year’s COLING that brings an interesting view on inference in NMT models. The title of the paper is “Is MAP Decoding All You Need? The Inadequacy of the Mode in Neural Machine Translation” and its authors are from the University of Amsterdam. NMT models learn the conditional probability of the next word in a target sentence given the source sentence and the previous words in the target […]


Document-aligned Japanese-English Conversation Parallel Corpus

Sentence-level (SL) machine translation (MT) has reached acceptable quality for many high-resourced languages, but document-level (DL) MT has not, as it is difficult to 1) train with a small amount of DL data and 2) evaluate, since the main methods and data sets focus on SL evaluation. To address the first issue, we present a document-aligned Japanese-English conversation corpus, including balanced, high-quality business conversation data for tuning and testing… As for the second issue, we manually identify the main areas where SL MT […]


Machine Translation Weekly 62: The EDITOR

Papers about new models for sequence-to-sequence modeling have always been my favorite genre. This week I will talk about a model called EDITOR that was introduced in a pre-print of a paper that will appear in the TACL journal with authors from the University of Maryland. The model is based on the Levenshtein Transformer, a partially non-autoregressive model for sequence-to-sequence learning. Autoregressive models generate the output left-to-right (or right-to-left), conditioning each step on the previously generated token. On the other […]


Automatic Standardization of Colloquial Persian

The Iranian Persian language has two varieties: standard and colloquial. Most natural language processing tools for Persian assume that the text is in standard form; this assumption is wrong in many real applications, especially for web content… This paper describes a simple and effective standardization approach based on sequence-to-sequence translation. We design an algorithm for generating artificial parallel colloquial-to-standard data for learning a sequence-to-sequence model. Moreover, we annotate a publicly available evaluation dataset consisting of 1912 sentences from a diverse set […]


Globetrotter: Unsupervised Multilingual Translation from Visual Alignment

Multi-language machine translation without parallel corpora is challenging because there is no explicit supervision between languages. Existing unsupervised methods typically rely on topological properties of the language representations… We introduce a framework that instead uses the visual modality to align multiple languages, using images as the bridge between them. We estimate the cross-modal alignment between language and images, and use this estimate to guide the learning of cross-lingual representations. Our language representations are trained jointly in one model with a […]
