Self-Supervised Annotation of Seismic Images using Latent Space Factorization

Annotating seismic data is expensive, laborious, and subjective, because seismic interpreters need many years to attain proficiency in interpretation. In this paper, we develop a framework that automates the pixel-level annotation of seismic images to delineate geological structural elements, given only image-level labels assigned to each image… Our framework factorizes the latent space of a deep encoder-decoder network by projecting the latent space to learned sub-spaces. Using constraints in the pixel space, the seismic image is further […]
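
The excerpt does not say how the learned sub-spaces are parameterized. As one hypothetical reading only, the sketch below treats each sub-space as spanned by a few orthonormal basis vectors and splits a latent vector z into its orthogonal projections onto each sub-space. `project` and `factorize_latent` are illustrative names, and in a real network the basis vectors would be learned rather than fixed.

```python
def project(z, basis):
    """Orthogonal projection of z onto the span of `basis` (rows assumed orthonormal)."""
    coeffs = [sum(b_d * z_d for b_d, z_d in zip(b, z)) for b in basis]
    return [sum(c * b[d] for c, b in zip(coeffs, basis)) for d in range(len(z))]

def factorize_latent(z, subspace_bases):
    """Split latent z into one projected component per (hypothetical) sub-space."""
    return [project(z, basis) for basis in subspace_bases]

z = [1.0, 2.0, 3.0]
bases = [
    [[1.0, 0.0, 0.0]],                   # sub-space 1: first axis
    [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0]],  # sub-space 2: remaining two axes
]
parts = factorize_latent(z, bases)
print(parts)  # [[1.0, 0.0, 0.0], [0.0, 2.0, 3.0]]
```

When the sub-spaces are mutually orthogonal and together span the latent space, the components sum back to z, so each one isolates a separate part of the representation.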

Do Response Selection Models Really Know What’s Next? Utterance Manipulation Strategies for Multi-turn Response Selection

In this paper, we study the task of selecting the optimal response given the user and system utterance history in retrieval-based multi-turn dialog systems. Recently, pre-trained language models (e.g., BERT, RoBERTa, and ELECTRA) have shown significant improvements in various natural language processing tasks… This and similar response selection tasks can also be solved with such language models by formulating them as dialog-response binary classification tasks. Although existing works using this approach successfully obtained state-of-the-art results, we observe that language models trained in […]
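
The sketch below shows a minimal version of the dialog-response binary classification formulation. It assumes that the true response is labeled 1 and randomly sampled other responses are labeled 0. The ` [EOT] ` end-of-turn separator and the `make_binary_examples` helper are hypothetical. Each resulting (context, response) pair would then be fed to a pre-trained encoder such as BERT as a sentence pair.

```python
import random

def make_binary_examples(history, true_response, candidate_pool,
                         n_negatives=1, sep=" [EOT] ", rng=random):
    """Turn one dialog turn into (context, response, label) pairs:
    label 1 for the true response, 0 for randomly sampled negatives."""
    context = sep.join(history)
    examples = [(context, true_response, 1)]
    negatives = [r for r in candidate_pool if r != true_response]
    for r in rng.sample(negatives, min(n_negatives, len(negatives))):
        examples.append((context, r, 0))
    return examples

history = ["I'd like to eat out tonight.", "Sure, which cuisine?"]
pool = ["Italian, please.", "It will rain tomorrow.", "Play some jazz."]
examples = make_binary_examples(history, "Italian, please.", pool,
                                n_negatives=2, rng=random.Random(0))
for ex in examples:
    print(ex)
```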

Quantifying the Preferential Direction of the Model Gradient in Adversarial Training With Projected Gradient Descent

Adversarial training, especially with projected gradient descent (PGD), has been the most successful approach for improving robustness against adversarial attacks. After adversarial training, the gradients of models with respect to their inputs are meaningful and interpretable by humans… However, the concept of interpretability is not mathematically well established, making it difficult to evaluate quantitatively. We define interpretability as the alignment of the model gradient with the vector pointing toward the closest point of the support of the other class. We propose […]
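
The sketch below makes the definition concrete under two assumptions that may differ from the paper's actual metric. First, it approximates "the closest point of the support of the other class" by the nearest sample from a finite set of other-class points. Second, it measures alignment as cosine similarity. Computing the model gradient itself is out of scope, so the gradient is passed in as a vector. The function names are illustrative.

```python
import math

def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is zero."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0

def gradient_alignment(x, grad, other_class_points):
    """Cosine between the input gradient at x and the direction from x to the
    nearest sampled point of the other class (a finite-sample stand-in for the
    closest point of that class's support)."""
    closest = min(other_class_points,
                  key=lambda p: sum((pi - xi) ** 2 for pi, xi in zip(p, x)))
    direction = [ci - xi for ci, xi in zip(closest, x)]
    return cosine(grad, direction)

x = [0.0, 0.0]
others = [[2.0, 0.0], [0.0, 5.0]]
print(gradient_alignment(x, [1.0, 0.0], others))  # 1.0 (perfectly aligned)
print(gradient_alignment(x, [0.0, 1.0], others))  # 0.0 (orthogonal)
```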

Modern Methods for Text Generation

Synthetic text generation is challenging and has had limited success. Recently, a new architecture called the Transformer has allowed machine learning models to better understand sequential data, as in translation or summarization… BERT and GPT-2, both built on Transformers, have shown strong performance in tasks such as text classification, translation, and natural language inference (NLI). In this article, we analyse both algorithms and compare their output quality in text generation tasks.
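
One architectural difference matters when comparing the two models for generation. GPT-2 uses causal (left-to-right) self-attention, in which each position attends only to itself and earlier positions. BERT attends bidirectionally. The sketch below applies a row-wise softmax to a square matrix of attention scores, with an optional causal mask. It omits the query/key projections and scaling of a real Transformer layer, and `attention_weights` is an illustrative name.

```python
import math

def attention_weights(scores, causal):
    """Row-wise softmax over a square score matrix; with causal=True,
    position i may only attend to positions j <= i."""
    weights = []
    for i, row in enumerate(scores):
        masked = [s if (not causal or j <= i) else float("-inf")
                  for j, s in enumerate(row)]
        m = max(masked)  # finite, since position i is never masked
        exps = [math.exp(s - m) for s in masked]  # exp(-inf) == 0.0
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights

uniform = [[0.0] * 3 for _ in range(3)]
print(attention_weights(uniform, causal=True))   # lower-triangular (GPT-2 style)
print(attention_weights(uniform, causal=False))  # every row uniform (BERT style)
```

Because a causal model never looks at future tokens, it can generate text one token at a time. A bidirectional model like BERT has no such built-in left-to-right order.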

Massively Parallel and Asynchronous Tsetlin Machine Architecture Supporting Almost Constant-Time Scaling

Using logical clauses to represent patterns, Tsetlin machines (TMs) have recently obtained competitive performance in terms of accuracy, memory footprint, energy, and learning speed on several benchmarks. A team of Tsetlin automata (TAs) composes each clause, thus driving the entire learning process… These are rewarded/penalized according to three local rules that optimize global behaviour. Each clause votes for or against a particular class, with classification resolved using a majority vote. In the parallel and asynchronous architecture that we propose here, […]
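
The sketch below covers TM inference as the abstract describes it:

- Each clause is a conjunction over literals, which are the input bits and their negations.
- A two-action Tsetlin automaton with 2N states includes its literal when its state exceeds N.
- Clauses carry a positive or negative polarity.
- The class with the largest net vote wins.

Training with the three feedback rules is not shown, and neither is the paper's parallel, asynchronous architecture. The function names are illustrative.

```python
def included_literals(ta_states, n_states):
    """A two-action TA with 2N states includes its literal when its state exceeds N."""
    return {k for k, s in enumerate(ta_states) if s > n_states}

def literals(x):
    """Input bits followed by their negations."""
    return list(x) + [1 - v for v in x]

def clause_output(included, lits):
    """Conjunction of the included literals (an empty clause outputs 1 here)."""
    return int(all(lits[k] for k in included))

def class_score(clauses, x):
    """Sum of votes: +1 clauses vote for the class, -1 clauses vote against it."""
    lits = literals(x)
    return sum(polarity * clause_output(inc, lits) for polarity, inc in clauses)

def predict(class_clauses, x):
    """Pick the class whose clauses cast the largest net vote."""
    return max(class_clauses, key=lambda c: class_score(class_clauses[c], x))

# Two inputs, so literals are [x0, x1, not x0, not x1] at indices 0..3.
class_clauses = {
    "x0_and_not_x1": [(+1, {0, 3}), (-1, {1})],
    "x1": [(+1, {1}), (-1, {0})],
}
print(predict(class_clauses, [1, 0]))  # x0_and_not_x1
print(predict(class_clauses, [0, 1]))  # x1
print(included_literals([1, 5, 3, 6], n_states=4))  # {1, 3}
```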

Meta-Learning with Sparse Experience Replay for Lifelong Language Learning

Lifelong learning requires models that can continuously learn from sequential streams of data without suffering catastrophic forgetting due to shifts in data distributions. Deep learning models have thrived in the non-sequential learning paradigm; however, when used to learn a sequence of tasks, they fail to retain past knowledge and learn incrementally… We propose a novel approach to lifelong learning of language tasks based on meta-learning with sparse experience replay that directly optimizes to prevent forgetting. We show that under the […]
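
The abstract does not detail the replay schedule. The sketch below assumes one common reading of sparse experience replay: store each incoming example with a small probability, and draw a replay batch only once every `replay_every` steps. `SparseReplayBuffer` and its parameters are hypothetical, and the meta-learning update that would consume the replayed batch is not shown.

```python
import random

class SparseReplayBuffer:
    """Hypothetical sketch: write each example to memory with probability
    write_prob and emit a replay batch only every replay_every steps."""

    def __init__(self, write_prob=0.1, replay_every=100, replay_size=8, seed=0):
        self.memory = []
        self.write_prob = write_prob
        self.replay_every = replay_every
        self.replay_size = replay_size
        self.steps = 0
        self.rng = random.Random(seed)

    def observe(self, example):
        """Record one training example; return a replay batch (possibly empty)."""
        self.steps += 1
        if self.rng.random() < self.write_prob:
            self.memory.append(example)
        if self.steps % self.replay_every == 0 and self.memory:
            k = min(self.replay_size, len(self.memory))
            return self.rng.sample(self.memory, k)
        return []

buf = SparseReplayBuffer(write_prob=1.0, replay_every=5, replay_size=3)
batches = [buf.observe(i) for i in range(10)]
print([i for i, b in enumerate(batches) if b])  # [4, 9]
```

Keeping both the write rate and the replay frequency low bounds the memory and compute spent on old tasks.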

Heterogeneous Domain Generalization via Domain Mixup

One of the main drawbacks of deep Convolutional Neural Networks (DCNNs) is their limited generalization capability. In this work, we focus on the problem of heterogeneous domain generalization, which aims to improve generalization across different tasks: that is, how to train a DCNN on data from multiple domains so that the learned feature extractor generalizes to recognizing novel categories in a novel target domain… To solve this problem, we propose a novel heterogeneous […]
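
The abstract is cut off before the method. As an illustration only, the sketch below applies standard mixup to one sample from each of two source domains. Standard mixup linearly interpolates inputs and one-hot labels with a coefficient λ drawn from Beta(α, α). The excerpt does not say whether the paper mixes inputs or features, or which labels it mixes, and `domain_mixup` is a hypothetical helper.

```python
import random

def domain_mixup(x_a, y_a, x_b, y_b, alpha=0.2, rng=random):
    """Standard mixup of sample a (from one domain) with sample b (from another):
    inputs and label vectors are interpolated with lam ~ Beta(alpha, alpha)."""
    lam = rng.betavariate(alpha, alpha)
    x = [lam * a + (1 - lam) * b for a, b in zip(x_a, x_b)]
    y = [lam * a + (1 - lam) * b for a, b in zip(y_a, y_b)]
    return x, y, lam

# One toy sample per domain, with one-hot labels.
x, y, lam = domain_mixup([1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0],
                         rng=random.Random(0))
print(lam, x, y)
```

A small α concentrates λ near 0 or 1, so most mixed samples stay close to one of the two domains.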

Adversarial score matching and improved sampling for image generation

Denoising score matching with Annealed Langevin Sampling (DSM-ALS) is a recent approach to generative modeling. Despite the convincing visual quality of samples, this method appears to perform worse than Generative Adversarial Networks (GANs) under the Fréchet Inception Distance, a popular metric for generative models… We show that this apparent gap vanishes when denoising the final Langevin samples using the score network. In addition, we propose two improvements to DSM-ALS: 1) Consistent Annealed Sampling as a more stable alternative to Annealed […]
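
The sketch below is a toy version of annealed Langevin sampling followed by the final denoising step. It assumes a 1-D Gaussian data distribution N(μ, τ²), so the noise-conditional score is known in closed form; a trained score network would replace `gaussian_score`. The per-level step size α_i = ε σ_i² / σ_L² follows the annealed Langevin schedule of Song and Ermon. The last step, x ← x + σ_L² s(x, σ_L), is Tweedie's denoising estimate. Function names and hyperparameters are illustrative.

```python
import math
import random

def gaussian_score(mu, tau):
    """Exact score of N(mu, tau^2) perturbed by N(0, sigma^2) noise."""
    def score(x, sigma):
        return -(x - mu) / (tau ** 2 + sigma ** 2)
    return score

def annealed_langevin(score, sigmas, n_steps=100, eps=2e-5, x0=0.0,
                      denoise=True, rng=random):
    """Annealed Langevin sampling over decreasing noise levels `sigmas`,
    optionally followed by one denoising step at the smallest level."""
    x = x0
    for sigma in sigmas:
        alpha = eps * sigma ** 2 / sigmas[-1] ** 2  # step size for this level
        for _ in range(n_steps):
            x = x + 0.5 * alpha * score(x, sigma) + math.sqrt(alpha) * rng.gauss(0.0, 1.0)
    if denoise:
        # Denoise with the score: x + sigma_L^2 * s(x, sigma_L).
        x = x + sigmas[-1] ** 2 * score(x, sigmas[-1])
    return x

score = gaussian_score(mu=3.0, tau=0.5)
sigmas = [10.0 * 0.001 ** (i / 9) for i in range(10)]  # geometric, 10 -> 0.01
rng = random.Random(0)
samples = [annealed_langevin(score, sigmas, rng=rng) for _ in range(200)]
print(sum(samples) / len(samples))  # should be close to mu = 3.0
```

The denoising step removes the residual noise at level σ_L that Langevin sampling leaves in every sample. The abstract argues that this residual noise accounts for the apparent gap under FID.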

DeepSpeed: Extreme-scale model training for everyone

In February, we announced DeepSpeed, an open-source deep learning training optimization library, and ZeRO (Zero Redundancy Optimizer), a novel memory optimization technology in the library, which vastly advances large model training by improving scale, speed, cost, and usability. DeepSpeed has enabled researchers to create Turing Natural Language Generation (Turing-NLG), which, with 17 billion parameters, was the largest language model at the time of its release and achieved state-of-the-art accuracy. In May, we released ZeRO-2, supporting training of 200-billion-parameter models up to 10x […]
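
To show why partitioning saves memory, the sketch below works through the per-GPU model-state memory analysis from the ZeRO paper. It assumes mixed-precision Adam training with data parallelism over N GPUs, which costs 2 bytes each for fp16 parameters and gradients plus K = 12 bytes of fp32 optimizer state per parameter. The stages partition progressively more state across the GPUs:

- Stage 1 partitions optimizer states.
- Stage 2 (ZeRO-2) also partitions gradients.
- Stage 3 also partitions parameters.

Activations and temporary buffers are ignored, and the function name is illustrative.

```python
def zero_model_state_bytes(num_params, num_gpus, stage, k=12):
    """Approximate per-GPU bytes for model states (no activations or buffers)."""
    params = 2 * num_params   # fp16 parameters
    grads = 2 * num_params    # fp16 gradients
    optim = k * num_params    # fp32 master weights, momentum, variance (K = 12 for Adam)
    if stage == 0:            # plain data parallelism: everything replicated
        return params + grads + optim
    if stage == 1:            # partition optimizer states
        return params + grads + optim / num_gpus
    if stage == 2:            # also partition gradients
        return params + (grads + optim) / num_gpus
    if stage == 3:            # also partition parameters
        return (params + grads + optim) / num_gpus
    raise ValueError("stage must be 0, 1, 2, or 3")

for stage in range(4):
    gb = zero_model_state_bytes(7.5e9, 64, stage) / 1e9
    print(f"stage {stage}: {gb:.2f} GB per GPU")
```

Consider a 7.5-billion-parameter model on 64 GPUs. Model-state memory drops from about 120 GB per GPU for stage 0 to roughly 31.4 GB, 16.6 GB, and 1.9 GB for stages 1 through 3.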

Issue #98 – Unified and Multi-encoders for Context-aware Neural MT

10 Sep 2020. Author: Dr. Patrik Lambert, Senior Machine Translation Scientist @ Iconic

Context-aware Neural MT uses context information to perform document-level translation or domain adaptation. The context of surrounding sentences allows the model to capture discourse phenomena. The context of similar sentences can also be useful to dynamically adapt the translation to a domain. In this post, we take a look at two papers which compare uni-encoder and multi-encoder […]
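
The sketch below illustrates the distinction between the two setups. A uni-encoder setup is commonly realized by concatenating the context sentence(s) and the current source sentence into one encoder input, with a separator between them. A multi-encoder setup instead feeds the context to a separate encoder. The sketch only builds the inputs, using naive whitespace tokenization. The `<SEP>` token and the helper names are hypothetical, and the two papers' exact architectures may differ.

```python
def uni_encoder_input(context, source, sep="<SEP>"):
    """Single encoder: context sentences and the source share one token sequence."""
    tokens = []
    for sentence in context:
        tokens += sentence.split() + [sep]
    return tokens + source.split()

def multi_encoder_inputs(context, source):
    """Multi-encoder: the context goes to its own encoder, the source to the main one."""
    context_tokens = [tok for sentence in context for tok in sentence.split()]
    return {"context_encoder": context_tokens, "source_encoder": source.split()}

context = ["I bought a bike."]
source = "It is red."
print(uni_encoder_input(context, source))
print(multi_encoder_inputs(context, source))
```

The context sentence here ("I bought a bike.") is what lets a model resolve the pronoun "It", a typical discourse phenomenon.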
