Research Collection – Reinforcement Learning at Microsoft

Reinforcement learning is about agents taking information from the world and learning a policy for interacting with it, so that they perform better. So, you can imagine a future where, every time you type on the keyboard, the keyboard learns to understand you better. Or every time you interact with some website, it understands better what your preferences are, so the world just starts working better and better at interacting with people. John Langford, Partner Research Manager, MSR NYC Fundamentally, […]

Read more

Rel3D: A Minimally Contrastive Benchmark for Grounding Spatial Relations in 3D

Understanding spatial relations (e.g., “laptop on table”) in visual input is important for both humans and robots. Existing datasets are insufficient as they lack large-scale, high-quality 3D ground truth information, which is critical for learning spatial relations… In this paper, we fill this gap by constructing Rel3D: the first large-scale, human-annotated dataset for grounding spatial relations in 3D. Rel3D enables quantifying the effectiveness of 3D information in predicting spatial relations on large-scale human data. Moreover, we propose minimally contrastive data […]

Read more

Reconstructing cellular automata rules from observations at nonconsecutive times

Recent experiments by Springer and Kenyon have shown that a deep neural network can be trained to predict the action of $t$ steps of Conway’s Game of Life automaton given millions of examples of this action on random initial states. However, training was never completely successful for $t>1$, and even when successful, a reconstruction of the elementary rule ($t=1$) from $t>1$ data is not within the scope of what the neural network can deliver… We describe an alternative network-like method, […]

Read more

Self-Explaining Structures Improve NLP Models

Existing approaches to explaining deep learning models in NLP usually suffer from two major drawbacks: (1) the main model and the explaining model are decoupled: an additional probing or surrogate model is used to interpret an existing model, and thus existing explaining tools are not self-explainable; (2) the probing model is only able to explain a model’s predictions by operating on low-level features by computing saliency scores for individual words but are clumsy at high-level text units such as phrases, […]

Read more

Neural Prototype Trees for Interpretable Fine-grained Image Recognition

Interpretable machine learning addresses the black-box nature of deep neural networks. Visual prototypes have been suggested for intrinsically interpretable image recognition, instead of generating post-hoc explanations that approximate a trained model… However, a large number of prototypes can be overwhelming. To reduce explanation size and improve interpretability, we propose the Neural Prototype Tree (ProtoTree), a deep learning method that includes prototypes in an interpretable decision tree to faithfully visualize the entire model. In addition to global interpretability, a path in […]

Read more

Attributes Aware Face Generation with Generative Adversarial Networks

Recent studies have shown remarkable success in face image generations. However, most of the existing methods only generate face images from random noise, and cannot generate face images according to the specific attributes… In this paper, we focus on the problem of face synthesis from attributes, which aims at generating faces with specific characteristics corresponding to the given attributes. To this end, we propose a novel attributes aware face image generator method with generative adversarial networks called AFGAN. Specifically, we […]

Read more

Bengali Abstractive News Summarization(BANS): A Neural Attention Approach

Abstractive summarization is the process of generating novel sentences based on the information extracted from the original text document while retaining the context. Due to abstractive summarization’s underlying complexities, most of the past research work has been done on the extractive summarization approach… Nevertheless, with the triumph of the sequence-to-sequence (seq2seq) model, abstractive summarization becomes more viable. Although a significant number of notable research has been done in the English language based on abstractive summarization, only a couple of works […]

Read more

DeepVideoMVS: Multi-View Stereo on Video with Recurrent Spatio-Temporal Fusion

We propose an online multi-view depth prediction approach on posed video streams, where the scene geometry information computed in the previous time steps is propagated to the current time step in an efficient and geometrically plausible way. The backbone of our approach is a real-time capable, lightweight encoder-decoder that relies on cost volumes computed from pairs of images… We extend it by placing a ConvLSTM cell at the bottleneck layer, which compresses an arbitrary amount of past information in its […]

Read more

pixelNeRF: Neural Radiance Fields from One or Few Images

We propose pixelNeRF, a learning framework that predicts a continuous neural scene representation conditioned on one or few input images. The existing approach for constructing neural radiance fields involves optimizing the representation to every scene independently, requiring many calibrated views and significant compute time… We take a step towards resolving these shortcomings by introducing an architecture that conditions a NeRF on image inputs in a fully convolutional manner. This allows the network to be trained across multiple scenes to learn […]

Read more

Emergent Complexity and Zero-shot Transfer via Unsupervised Environment Design

A wide range of reinforcement learning (RL) problems – including robustness, transfer learning, unsupervised RL, and emergent complexity – require specifying a distribution of tasks or environments in which a policy will be trained. However, creating a useful distribution of environments is error prone, and takes a significant amount of developer time and effort… We propose Unsupervised Environment Design (UED) as an alternative paradigm, where developers provide environments with unknown parameters, and these parameters are used to automatically produce a […]

Read more
1 700 701 702 703 704 911