VinVL: Advancing the state of the art for vision-language models

Humans understand the world by perceiving and fusing information from multiple channels, such as images viewed by the eyes, voices heard by the ears, and other forms of sensory input. One of the core aspirations in AI is to develop algorithms that endow computers with a similar ability: to effectively learn from multimodal data, such as vision-language data, to make sense of the world around us. For example, vision-language (VL) systems allow searching for the images relevant to a text query (or vice […]

Microsoft DeBERTa surpasses human performance on the SuperGLUE benchmark

Natural language understanding (NLU) is one of the longest-running goals in AI, and SuperGLUE is currently among the most challenging benchmarks for evaluating NLU models. The benchmark consists of a wide range of NLU tasks, including question answering, natural language inference, co-reference resolution, word sense disambiguation, and others. Take the causal reasoning task (COPA in Figure 1) as an example. Given the premise “the child became immune to the disease” and the question “what’s the cause for this?,” the […]

Research at Microsoft 2020: Addressing the present while looking to the future

Microsoft researchers pursue the big questions about what the world will be like in the future and the role technology will play. Not only do they take on the responsibility of exploring the long-term vision of their research, but they must also be ready to react to the immediate needs of the present. This year in particular, they were asked to use their roles as futurists to address pressing societal challenges. In early 2020, as countries began responding to COVID-19 […]

‘Seeing’ on tiny battery-powered microcontrollers with RNNPool

Computer vision has rapidly evolved over the past decade, allowing for such applications as Seeing AI, a camera app that describes aloud a person’s surroundings, helping those who are blind or have low vision; systems that can detect whether a product, such as a computer chip or article of clothing, has been assembled correctly, improving quality control; and services that can convert information from hard-copy documents into a digital format, making it easier to manage personal and business data. All […]

MPNet combines strengths of masked and permuted language modeling for language understanding

Pretrained language models have been a hot research topic in natural language processing. These models, such as BERT, are usually pretrained on large-scale language corpora with carefully designed pretraining objectives and then fine-tuned on downstream tasks to boost accuracy. Masked language modeling (MLM), adopted in BERT, and permuted language modeling (PLM), adopted in XLNet, are two representative pretraining objectives. Each has its own advantages but also suffers from limitations. Therefore, researchers from Microsoft Research Asia, […]

NeurIPS 2020: Moving toward real-world reinforcement learning via batch RL, strategic exploration, and representation learning

As human beings, we encounter unfamiliar situations all the time—learning to drive, living on our own for the first time, starting a new job. And while we can anticipate what to expect based on what others have told us or what we’ve picked up from books and depictions in movies and TV, it isn’t until we’re behind the wheel of a car, maintaining an apartment, or doing a job in a workplace that we’re able to take advantage of one […]

Research Collection – Reinforcement Learning at Microsoft

“Reinforcement learning is about agents taking information from the world and learning a policy for interacting with it, so that they perform better. So, you can imagine a future where, every time you type on the keyboard, the keyboard learns to understand you better. Or every time you interact with some website, it understands better what your preferences are, so the world just starts working better and better at interacting with people.”
John Langford, Partner Research Manager, MSR NYC

Fundamentally, […]

Utilizing consumer cameras for contact-free physiological measurement in telehealth and beyond

Our research is enabling robust and scalable measurement of physiology. Cameras on everyday devices can be used to detect subtle changes in light reflected from the body caused by physiological processes. Machine learning algorithms are then used to process the camera images and recover the underlying pulse and respiration signals that can then be used for health and wellness tracking. According to the CDC WONDER Online Database, heart disease is currently the leading cause of death for both men and […]

A Microsoft custom data type for efficient inference

AI is taking on an increasingly important role in many Microsoft products, such as Bing and Office 365. In some cases, it’s being used to power outward-facing features like semantic search in Microsoft Word or intelligent answers in Bing, and deep neural networks (DNNs) are one key to powering these features. One aspect of DNNs is inference—once these networks are trained, they use inference to make judgments about unknown information based on prior learning. In Bing, for example, DNN inference […]

Adversarial machine learning and instrumental variables for flexible causal modeling

We are going through a new shift in machine learning (ML), where ML models are increasingly being used to automate decision-making in a multitude of domains: what personalized treatment should be administered to a patient, what discount should be offered to an online customer, and other important decisions that can greatly impact people’s lives. The machine learning revolution was primarily driven by problems that are distant from such decision-making scenarios. Those earlier problems include predicting what an image depicts, predicting […]
