Modern Methods for Text Generation

Synthetic text generation is challenging and has seen limited success. Recently, a new architecture called the Transformer has allowed machine learning models to better understand sequential data, improving tasks such as translation and summarization… BERT and GPT-2, which use Transformers at their core, have shown great performance on tasks such as text classification, translation, and NLI. In this article, we analyse both algorithms and compare their output quality in text generation tasks.
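The Transformer's core mechanism, which lets these models relate every position in a sequence to every other, is scaled dot-product attention. A minimal NumPy sketch of that formula (a generic illustration, not the article's code):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                    # query/key similarity
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)     # softmax over keys
    return weights @ V                                 # weighted sum of values

# toy self-attention: 3 tokens with embedding dimension 4
rng = np.random.default_rng(0)
x = rng.normal(size=(3, 4))
out = scaled_dot_product_attention(x, x, x)
```

Each output row is a convex combination of the value rows, which is what lets the model aggregate context across the whole sequence in one step.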

Read more

Massively Parallel and Asynchronous Tsetlin Machine Architecture Supporting Almost Constant-Time Scaling

Using logical clauses to represent patterns, Tsetlin machines (TMs) have recently obtained competitive performance in terms of accuracy, memory footprint, energy, and learning speed on several benchmarks. A team of Tsetlin automata (TAs) composes each clause, thus driving the entire learning process… These are rewarded/penalized according to three local rules that optimize global behaviour. Each clause votes for or against a particular class, with classification resolved using a majority vote. In the parallel and asynchronous architecture that we propose here, […]
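The voting step the excerpt describes can be sketched as a toy function (clause evaluation and the Tsetlin-automata learning rules are omitted; names are illustrative, not the paper's API):

```python
def tm_vote(clause_outputs, polarities):
    """Tsetlin-machine-style majority vote: each firing clause adds its
    polarity (+1 = votes for the class, -1 = votes against) to that
    class's score; the class with the highest score wins."""
    scores = [sum(o * p for o, p in zip(outs, pols))
              for outs, pols in zip(clause_outputs, polarities)]
    return scores.index(max(scores)), scores

# two classes, three clauses each; 1 = the clause matched the input
pred, scores = tm_vote([[1, 1, 0], [1, 0, 0]],
                       [[+1, +1, -1], [+1, -1, -1]])
print(pred, scores)  # 0 [2, 1]
```

Because each clause votes independently, the clauses can be evaluated in parallel, which is what the proposed asynchronous architecture exploits.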

Read more

Meta-Learning with Sparse Experience Replay for Lifelong Language Learning

Lifelong learning requires models that can continuously learn from sequential streams of data without suffering catastrophic forgetting due to shifts in data distributions. Deep learning models have thrived in the non-sequential learning paradigm; however, when used to learn a sequence of tasks, they fail to retain past knowledge and learn incrementally… We propose a novel approach to lifelong learning of language tasks based on meta-learning with sparse experience replay that directly optimizes to prevent forgetting. We show that under the […]
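The "sparse experience replay" idea can be sketched as a reservoir-sampled buffer that is replayed only occasionally during sequential training (a minimal sketch of the general mechanism; class and parameter names are hypothetical, not the paper's):

```python
import random

class SparseReplayBuffer:
    def __init__(self, capacity, replay_every, seed=0):
        self.capacity = capacity
        self.replay_every = replay_every   # replay once per this many steps
        self.buffer = []
        self.seen = 0
        self.rng = random.Random(seed)

    def store(self, example):
        # reservoir sampling keeps a uniform sample of the whole stream
        self.seen += 1
        if len(self.buffer) < self.capacity:
            self.buffer.append(example)
        else:
            i = self.rng.randrange(self.seen)
            if i < self.capacity:
                self.buffer[i] = example

    def should_replay(self):
        # "sparse": trigger replay only at a low, fixed rate
        return self.seen % self.replay_every == 0 and bool(self.buffer)

    def sample(self, k):
        return self.rng.sample(self.buffer, min(k, len(self.buffer)))

buf = SparseReplayBuffer(capacity=10, replay_every=5)
for step in range(100):
    buf.store(step)
```

The low replay rate is what keeps the memory and compute overhead small relative to full rehearsal.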

Read more

Heterogeneous Domain Generalization via Domain Mixup

One of the main drawbacks of deep Convolutional Neural Networks (DCNNs) is their limited generalization capability. In this work, we focus on the problem of heterogeneous domain generalization, which aims to improve generalization across different tasks: how to learn a DCNN model from multiple domains of data such that the trained feature extractor generalizes to support recognition of novel categories in a novel target domain… To solve this problem, we propose a novel heterogeneous […]
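The underlying mixup operation is a convex combination of two examples and their labels, with the mixing weight drawn from a Beta distribution; domain mixup applies this to pairs drawn from different source domains. A sketch of the generic formula (not the paper's exact scheme):

```python
import numpy as np

def domain_mixup(x1, y1, x2, y2, alpha=0.2, rng=None):
    """Mixup: x~ = lam*x1 + (1-lam)*x2, y~ = lam*y1 + (1-lam)*y2,
    with lam ~ Beta(alpha, alpha). For domain mixup, (x1, y1) and
    (x2, y2) come from different source domains."""
    rng = rng or np.random.default_rng()
    lam = rng.beta(alpha, alpha)
    return lam * x1 + (1 - lam) * x2, lam * y1 + (1 - lam) * y2, lam

# mix an example from domain A with one from domain B (one-hot labels)
xa, ya = np.zeros(4), np.array([1.0, 0.0])
xb, yb = np.ones(4),  np.array([0.0, 1.0])
xm, ym, lam = domain_mixup(xa, ya, xb, yb, rng=np.random.default_rng(0))
```

A small `alpha` concentrates `lam` near 0 or 1, so most mixed examples stay close to a real one while still interpolating between domains.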

Read more

Adversarial score matching and improved sampling for image generation

Denoising score matching with Annealed Langevin Sampling (DSM-ALS) is a recent approach to generative modeling. Despite the convincing visual quality of samples, this method appears to perform worse than Generative Adversarial Networks (GANs) under the Fréchet Inception Distance, a popular metric for generative models… We show that this apparent gap vanishes when denoising the final Langevin samples using the score network. In addition, we propose two improvements to DSM-ALS: 1) Consistent Annealed Sampling as a more stable alternative to Annealed […]
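The sampling loop and the final denoising trick the excerpt mentions can be sketched as follows, using a toy analytic score function and the step-size schedule from Song & Ermon (2019); this is a generic illustration under those assumptions, not the paper's implementation:

```python
import numpy as np

def annealed_langevin(score_fn, x, sigmas, n_steps=50, eps=1e-2, rng=None):
    """Run Langevin dynamics at a decreasing sequence of noise levels.
    score_fn(x, sigma) approximates the score grad_x log p_sigma(x)."""
    rng = rng or np.random.default_rng(0)
    for sigma in sigmas:                          # anneal large -> small noise
        alpha = eps * (sigma / sigmas[-1]) ** 2   # per-level step size
        for _ in range(n_steps):
            z = rng.normal(size=x.shape)
            x = x + 0.5 * alpha * score_fn(x, sigma) + np.sqrt(alpha) * z
    # the fix the excerpt describes: denoise the final Langevin samples
    # with one extra score-network step (a Tweedie-style correction)
    return x + sigmas[-1] ** 2 * score_fn(x, sigmas[-1])

# toy target N(0, 1): the sigma-perturbed score is -x / (1 + sigma^2)
score = lambda x, s: -x / (1.0 + s ** 2)
sigmas = np.geomspace(10.0, 0.1, 10)
samples = annealed_langevin(score, np.zeros(2000), sigmas)
```

The final correction removes the residual noise of scale `sigmas[-1]` left by the last Langevin level, which is why it closes the FID gap without changing the sampler itself.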

Read more

DeepSpeed: Extreme-scale model training for everyone

In February, we announced DeepSpeed, an open-source deep learning training optimization library, and ZeRO (Zero Redundancy Optimizer), a novel memory optimization technology in the library, which vastly advances large model training by improving scale, speed, cost, and usability. DeepSpeed has enabled researchers to create Turing Natural Language Generation (Turing-NLG), the largest language model with 17 billion parameters and state-of-the-art accuracy at the time of its release. In May, we released ZeRO-2—supporting model training of 200 billion parameters up to 10x […]

Read more

Issue #98 – Unified and Multi-encoders for Context-aware Neural MT

10 Sep 2020 – Author: Dr. Patrik Lambert, Senior Machine Translation Scientist @ Iconic. Context-aware Neural MT uses context information to perform document-level translation or domain adaptation. The context of surrounding sentences allows the model to capture discourse phenomena, while the context of similar sentences can help dynamically adapt the translation to a domain. In this post, we take a look at two papers which compare uni-encoder and multi-encoder […]

Read more

Expressive Pixels: A new visual communication platform to support creativity, accessibility, and innovation

The need to express oneself is innate for every person in the world, and its roots run through art, technology, communication, and the acts of learning and building things from the ground up. It’s no coincidence, then, that a new platform being released by Microsoft Research, called Expressive Pixels, stems from this belief. Expressive Pixels introduces an authoring app combined with open-source firmware, peripherals, documentation, and APIs that allow users and makers to create animations and then display them on […]

Read more

Platform for Situated Intelligence: An open-source framework for multimodal, integrative AI

Over the years at Microsoft Research, we’ve studied how to build AI systems that perceive, understand, and act in a human-filled world in real time. Our motivation has been to create computing systems that can support interactive experiences akin to what we expect when we talk to or collaborate with people. This research line has involved the development of several physically situated interactive applications, including embodied conversational agents that serve as personal assistants, robots that give directions in our building, […]

Read more

Domain-specific language model pretraining for biomedical natural language processing

COVID-19 highlights a perennial problem facing scientists around the globe: how do we stay up to date with the cutting edge of scientific knowledge? In just a few months since the pandemic emerged, tens of thousands of research papers have been published concerning COVID-19 and the SARS-CoV-2 virus. This explosive growth sparked the creation of the COVID-19 Open Research Dataset (CORD-19) to facilitate research and discovery. However, a pandemic is just one salient example of a prevailing challenge to this […]

Read more