Issue #20 – Dynamic Vocabulary in Neural MT

06 Dec 2018. As has been covered a number of times in this series, Neural MT requires good data for training, and acquiring such data for new languages can be costly and not always feasible. One approach in the Neural MT literature to improving translation quality for low-resource languages is transfer learning. A common practice is to reuse the model parameters (encoder, decoder, and word embeddings) of a high-resource language pair and fine-tune them […]
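
The mechanics of that parameter reuse are straightforward. Below is a minimal sketch, assuming PyTorch; the toy model, vocabulary sizes, and dummy batch are illustrative placeholders, not the setup from the work discussed in the issue.

```python
import torch
import torch.nn as nn

class TinySeq2Seq(nn.Module):
    def __init__(self, src_vocab, tgt_vocab, dim=64):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, dim)
        self.tgt_emb = nn.Embedding(tgt_vocab, dim)
        self.encoder = nn.GRU(dim, dim, batch_first=True)
        self.decoder = nn.GRU(dim, dim, batch_first=True)
        self.out = nn.Linear(dim, tgt_vocab)

    def forward(self, src, tgt):
        _, h = self.encoder(self.src_emb(src))       # encode the source sentence
        dec, _ = self.decoder(self.tgt_emb(tgt), h)  # teacher-forced decoding
        return self.out(dec)

# Parent model for a high-resource pair; in practice its weights come from a
# trained checkpoint rather than random initialisation as here.
parent = TinySeq2Seq(src_vocab=10000, tgt_vocab=10000)

# Child model for the low-resource pair: embeddings and output layer are new
# (different vocabularies), while encoder and decoder weights are transferred.
child = TinySeq2Seq(src_vocab=3000, tgt_vocab=3000)
child.encoder.load_state_dict(parent.encoder.state_dict())
child.decoder.load_state_dict(parent.decoder.state_dict())

# Fine-tune all parameters on the low-resource data (dummy batch shown).
optimizer = torch.optim.Adam(child.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()
src = torch.randint(0, 3000, (8, 12))   # 8 sentences of 12 token ids each
tgt = torch.randint(0, 3000, (8, 12))
optimizer.zero_grad()
logits = child(src, tgt[:, :-1])        # predict each next target token
loss = loss_fn(logits.reshape(-1, 3000), tgt[:, 1:].reshape(-1))
loss.backward()
optimizer.step()
```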


Issue #19 – Adaptive Neural MT

29 Nov 2018. Author: Dr. Patrik Lambert, Machine Translation Scientist @ Iconic. Neural Machine Translation is known to be particularly poor at translating out-of-domain data. That is, an engine trained on generic data will be much worse at translating medical documents than an engine trained on medical data. It is much more sensitive to such differences than, say, Statistical MT. This problem is partially solved by domain adaptation techniques, which we covered in Issue #9 […]
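
One family of adaptive approaches updates the engine on the fly as translators correct its output. A minimal sketch of that idea follows, assuming a teacher-forced PyTorch seq2seq model like the one in the Issue #20 sketch above; the function name and step budget are illustrative, not the specific method reviewed in the issue.

```python
# Take a few gradient steps on one post-edited (source, target) pair so that
# subsequent segments in the same document benefit from the correction.
def adapt_on_postedit(model, optimizer, loss_fn, src_ids, corrected_tgt_ids,
                      steps=2):
    model.train()
    for _ in range(steps):
        optimizer.zero_grad()
        logits = model(src_ids, corrected_tgt_ids[:, :-1])
        loss = loss_fn(logits.reshape(-1, logits.size(-1)),
                       corrected_tgt_ids[:, 1:].reshape(-1))
        loss.backward()
        optimizer.step()
    model.eval()   # back to inference mode for the next segment
```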


Issue #17 – Speeding up Neural MT

15 Nov 2018. Author: Raj Nath Patel, Machine Translation Scientist @ Iconic. For all the benefits Neural MT has brought in terms of translation quality, producing output quickly and efficiently is still a challenge for developers. All things being equal, Neural MT is slower than its statistical counterpart. This is particularly the case when running translation on standard processors (CPUs) as opposed to faster, more powerful (but also more expensive) graphics processors (GPUs), which is […]
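
One widely used lever for faster CPU inference is post-training quantization. The sketch below applies PyTorch's dynamic quantization to a stand-in model; treat it as one illustrative example, not the particular techniques surveyed in the issue.

```python
import torch
import torch.nn as nn

# Stand-in for the large projection layers that dominate NMT inference cost.
model = nn.Sequential(nn.Linear(512, 2048), nn.ReLU(), nn.Linear(2048, 32000))
model.eval()

# Store Linear weights as int8; activations are quantised on the fly.
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8)

x = torch.randn(1, 512)
with torch.no_grad():
    y = quantized(x)   # same interface as before, typically faster on CPU
```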


Issue #16 – Revisiting synthetic training data for Neural MT

08 Nov 2018. Author: Dr. Patrik Lambert, Machine Translation Scientist @ Iconic. In a previous guest post in this series, Prof. Andy Way explained how to create training data for Neural MT through back-translation. This technique involves translating monolingual data in the target language into the source language to obtain a parallel corpus of “synthetic” source and “authentic” target data – so-called back-translation. Andy reported interesting findings whereby, with a few million […]
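
As a data pipeline, back-translation is simple. Here is a minimal sketch in Python; translate_tgt_to_src() is a placeholder for a trained target-to-source engine, and the tiny corpora are made up for illustration.

```python
def translate_tgt_to_src(sentence: str) -> str:
    """Placeholder for a trained target-to-source NMT engine."""
    return "<synthetic source for: " + sentence + ">"

# Monolingual data in the target language, which is usually plentiful.
monolingual_target = [
    "Authentic target sentence one.",
    "Authentic target sentence two.",
]

# Back-translate: each pair is (synthetic source, authentic target).
synthetic_parallel = [(translate_tgt_to_src(t), t) for t in monolingual_target]

# Mix synthetic and authentic parallel data (often ratio-balanced or tagged)
# and train the source-to-target engine on the union.
authentic_parallel = [("real source sentence", "real target sentence")]
training_corpus = authentic_parallel + synthetic_parallel
```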


Issue #15 – Document-Level Neural MT

01 Nov 2018. Author: Dr. Rohit Gupta, Sr. Machine Translation Scientist @ Iconic. In this week’s post, we take a look at document-level neural machine translation. Most, if not all, existing approaches to machine translation operate on the sentence level. That is to say, when translating a document, it is actually split up into individual sentences or segments, which are processed independently of each other. With document-level Neural MT, as the name suggests, we are going beyond […]
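
The simplest document-level strategies keep the sentence-level engine but widen its input. A minimal sketch of one such strategy, prepending the previous source sentence with a separator token; the token name and one-sentence window are illustrative choices, not the specific method covered in the issue.

```python
def add_context(document_sentences, window=1, sep=" <SEP> "):
    """Prepend up to `window` previous sentences to each segment."""
    examples = []
    for i, sent in enumerate(document_sentences):
        context = document_sentences[max(0, i - window):i]
        examples.append(sep.join(context + [sent]))
    return examples

doc = ["He picked up the bat.", "It was made of willow."]
print(add_context(doc))
# ['He picked up the bat.',
#  'He picked up the bat. <SEP> It was made of willow.']
```

With the context attached, the engine can resolve cross-sentence ambiguities (here, that “It” refers to a cricket bat, not an animal) that a purely sentence-level system cannot see.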


Issue #14 – Neural MT: A Case Study

25 Oct 2018. Author: Dr. John Tinsley, CEO @ Iconic. As a machine translation provider, one of the questions we’ve been asking ourselves most frequently over the last 18 months has been, “When should we switch an existing production deployment to Neural MT?”. While all new projects are built using Neural MT, there is a certain element of “if it ain’t broke, don’t fix it” that can creep in when it comes to […]


Issue #13 – Evaluation of Neural MT Architectures

11 Oct 2018. Author: Raj Nath Patel, Machine Translation Scientist @ Iconic. What are the different approaches to Neural MT? Since its relatively recent advent, the underlying technology has been based on one of three main architectures: Recurrent Neural Networks (RNN), Convolutional Neural Networks (CNN), and Self-Attention Networks (the Transformer). For various language pairs, non-recurrent architectures (CNN and Transformer) have outperformed RNNs, but there has been no solid explanation as to why. In this post, we’ll evaluate […]
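
The Transformer's defining operation is scaled dot-product self-attention: every token attends to every other token in a single step, with no recurrence. A minimal single-head sketch in PyTorch, with random weights purely for illustration:

```python
import math
import torch

def self_attention(x, Wq, Wk, Wv):
    q, k, v = x @ Wq, x @ Wk, x @ Wv                  # project every token
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))
    weights = torch.softmax(scores, dim=-1)           # who attends to whom
    return weights @ v                                # context-mixed tokens

x = torch.randn(5, 16)                   # 5 tokens, 16-dimensional embeddings
Wq, Wk, Wv = (torch.randn(16, 16) for _ in range(3))
out = self_attention(x, Wq, Wk, Wv)      # shape (5, 16), computed in parallel
```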


Issue #12 – Character-based Neural MT

04 Oct 2018. Author: Dr. Patrik Lambert, Machine Translation Scientist @ Iconic. Most flavours of Machine Translation naturally use the word as the basis for learning models. Early work on Neural MT that followed this approach had to limit the vocabulary scope for practical reasons. This created problems when dealing with out-of-vocabulary words. One approach that was explored to solve this problem was character-based Neural MT. With the emergence of subword approaches, which almost solve the […]
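
Subword approaches sidestep the vocabulary limit by splitting rare words into smaller known units. The toy sketch below illustrates the core byte-pair-encoding idea, repeatedly merging the most frequent adjacent symbol pair; the three-word corpus and merge count are made up for illustration.

```python
from collections import Counter

def most_frequent_pair(words):
    """Count adjacent symbol pairs, weighted by word frequency."""
    pairs = Counter()
    for symbols, freq in words.items():
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs.most_common(1)[0][0]

def merge_pair(words, pair):
    """Replace every occurrence of `pair` with a single merged symbol."""
    merged = {}
    for symbols, freq in words.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1]); i += 2
            else:
                out.append(symbols[i]); i += 1
        merged[tuple(out)] = freq
    return merged

# Corpus as character sequences with frequencies; learn three merges.
words = {tuple("lower"): 5, tuple("lowest"): 2, tuple("newer"): 6}
for _ in range(3):
    words = merge_pair(words, most_frequent_pair(words))
print(words)   # e.g. ('lo', 'wer') -- frequent fragments become single units
```

An unseen word such as “lowers” would then decompose into known units (“lo”, “wer”, “s”) rather than falling out of the vocabulary entirely.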


Issue #11 – Unsupervised Neural MT

27 Sep 2018. Author: Dr. Rohit Gupta, Sr. Machine Translation Scientist @ Iconic. In this week’s article, we will explore unsupervised machine translation. In other words, training a machine translation engine without using any parallel data! As you might imagine, the potential implications of not needing any parallel data to train a Neural MT engine could be huge. In general, most of the approaches in this direction still use some bilingual signal, for example using parallel data […]
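
A recurring ingredient in unsupervised NMT is iterative back-translation between the two directions. The sketch below shows only the loop structure; train() and translate() are placeholders for real training and decoding, and the initialisation comment reflects common practice (shared subwords plus denoising pre-training) rather than the specific systems reviewed in the issue.

```python
def translate(model, sentences):
    """Placeholder: decode the sentences with the current model."""
    return [f"<{model} translation of: {s}>" for s in sentences]

def train(name, parallel_pairs):
    """Placeholder: (re)train a model on (source, target) pairs."""
    return name   # stands in for updated model parameters

mono_src = ["source-language sentence"]
mono_tgt = ["target-language sentence"]
src2tgt, tgt2src = "src2tgt-init", "tgt2src-init"   # e.g. from shared subwords
                                                    # plus denoising pre-training
for it in range(3):
    synth_src = translate(tgt2src, mono_tgt)        # back-translate target mono
    src2tgt = train(f"src2tgt-it{it}", list(zip(synth_src, mono_tgt)))
    synth_tgt = translate(src2tgt, mono_src)        # back-translate source mono
    tgt2src = train(f"tgt2src-it{it}", list(zip(synth_tgt, mono_src)))
```

Each direction is trained only on synthetic data produced by the other, so the two engines bootstrap each other without ever seeing authentic parallel text.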


Issue #10 – Evaluating Neural MT post-editing

20 Sep 2018. Author: Dr. Joss Moorkens, Assistant Professor, Dublin City University. This week, we have a guest post from Prof. Joss Moorkens of Dublin City University. Joss is renowned for his work in the area of translation technology and, particularly, the evaluation of MT output for certain use cases. Building on the “human parity” topic from Issue #8 of this series, Joss describes his recent work on evaluation of Neural MT post-editing for […]
