Challenges and Opportunities in NLP Benchmarking

In recent years, models in NLP have become much more powerful, driven by advances in transfer learning. A consequence of this drastic increase in performance is that existing benchmarks have been left behind. Recent models “have outpaced the benchmarks to test for them” (AI Index Report 2021), quickly reaching super-human performance on standard benchmarks such as SuperGLUE and SQuAD. Does this mean that we have solved natural language processing? Far from it. However, the traditional practices for evaluating performance […]


DeepSpeed powers 8x larger MoE model training with high performance

Today, we are proud to announce DeepSpeed MoE, a high-performance system that supports massive-scale mixture of experts (MoE) models as part of the DeepSpeed optimization library. MoE models are an emerging class of sparsely activated models whose compute costs are sublinear with respect to their parameter counts. For example, the Switch Transformer consists of 1.6 trillion parameters, while the compute required to train it is approximately equal to that of a 10-billion-parameter dense model. This increase in model size […]
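The sublinear-compute property can be illustrated with a toy sketch of top-1 (“switch”-style) routing: each token is processed by exactly one expert, so per-token compute stays fixed no matter how many experts (and hence parameters) the layer holds. The dimensions and NumPy code below are illustrative assumptions, not DeepSpeed’s implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

d_model, n_experts, n_tokens = 8, 4, 5  # hypothetical toy sizes

# Router weights plus one weight matrix per expert. Adding experts
# grows the parameter count, but not the per-token compute below.
router = rng.standard_normal((d_model, n_experts))
experts = rng.standard_normal((n_experts, d_model, d_model))

tokens = rng.standard_normal((n_tokens, d_model))

# Top-1 routing: each token is sent to its highest-scoring expert.
logits = tokens @ router
choice = logits.argmax(axis=1)
gate = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)

# Each token incurs exactly one d_model x d_model matmul,
# scaled by its (softmax) gate value for the chosen expert.
out = np.stack([
    gate[i, choice[i]] * (tokens[i] @ experts[choice[i]])
    for i in range(n_tokens)
])
print(out.shape)  # (5, 8)
```

Doubling `n_experts` here doubles the parameters in `experts` while the per-token work is unchanged, which is the sense in which MoE compute is sublinear in parameters.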


New Future of Work: How remote and hybrid work will shape workplaces and society with Jaime Teevan and Siddharth Suri

Episode 132 | August 12, 2021 For Microsoft researchers, COVID-19 was a call to action. The reimagining of work practices had long been an area of study, but existing and new questions that needed immediate answers surfaced as companies and their employees quickly adjusted to significantly different working conditions. Teams from across the Microsoft organizational chart pooled their unique […]


Safe program merges at scale: A grand challenge for program repair research

Since the computing world began embracing an open-source approach to programming, building software has become increasingly collaborative. Development teams ranging from two developers to many thousands simultaneously edit different components as they build software systems and keep them functioning optimally, and a three-way merge is the mechanism for integrating changes from these individual contributors. But with so many people independently altering code, it’s unsurprising that updates don’t always synchronize, resulting in bad merges. Bad […]
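The three-way merge mentioned above can be sketched in a few lines. This is a deliberately naive, line-aligned version that assumes all three versions have the same number of lines; real merge tools such as diff3 first align the versions with a diff algorithm and so handle insertions and deletions as well:

```python
def three_way_merge(base, ours, theirs):
    """Naive line-aligned three-way merge (assumes equal line counts)."""
    merged, conflicts = [], []
    for i, (b, o, t) in enumerate(zip(base, ours, theirs)):
        if o == t:        # both sides agree (changed identically or not at all)
            merged.append(o)
        elif o == b:      # only "theirs" changed this line
            merged.append(t)
        elif t == b:      # only "ours" changed this line
            merged.append(o)
        else:             # both changed the same line differently: conflict
            merged.append(o)
            conflicts.append(i)
    return merged, conflicts

base   = ["def greet():", "    print('hi')", "    return 0"]
ours   = ["def greet():", "    print('hello')", "    return 0"]
theirs = ["def greet():", "    print('hi')", "    return 1"]

merged, conflicts = three_way_merge(base, ours, theirs)
print(merged)     # ["def greet():", "    print('hello')", "    return 1"]
print(conflicts)  # []
```

Because each contributor changed a different line relative to the shared base, both edits are integrated cleanly; a “bad merge” in the post’s sense arises when the merged result is wrong even though no textual conflict was reported.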


Make Every feature Binary: A 135B parameter sparse neural network for massively improved search relevance

Recently, Transformer-based deep learning models like GPT-3 have been getting a lot of attention in the machine learning world. These models excel at understanding semantic relationships, and they have contributed to large improvements in Microsoft Bing’s search experience and to surpassing human performance on the SuperGLUE academic benchmark. However, these models can fail to capture more nuanced relationships between query and document terms beyond pure semantics. In this blog post, we are introducing “Make Every feature Binary” (MEB), a large-scale sparse […]
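As a loose sketch of the idea behind binary features (the actual MEB feature set, scale, and training procedure differ), one could hash query-term/document-term pairs into 0/1 features scored by a sparse linear model, so that specific term co-occurrences get their own learned weights rather than being folded into dense semantics:

```python
import hashlib

# Hypothetical setup: a hash bucket count standing in for the
# billions of sparse parameters a production model would use.
N_BUCKETS = 2**20

def feature_ids(query, doc):
    """Binary cross-features: one 0/1 feature per (query term, doc term) pair."""
    ids = set()
    for q in query.split():
        for d in doc.split():
            h = hashlib.md5(f"{q}|{d}".encode()).hexdigest()
            ids.add(int(h, 16) % N_BUCKETS)
    return ids

weights = {}  # sparse weights; absent entries are 0.0

def score(query, doc):
    # All features are binary, so scoring is just a sparse sum
    # of the weights of the active features.
    return sum(weights.get(i, 0.0) for i in feature_ids(query, doc))

# Toy "training": nudge up the weights of features seen in a relevant pair.
for i in feature_ids("python tutorial", "learn python basics"):
    weights[i] = weights.get(i, 0.0) + 0.1

print(score("python tutorial", "learn python basics") > 0)  # True
```

The point of the sketch is that a pair like (“python”, “basics”) gets its own explicit weight, letting the model memorize nuanced term relationships that a purely semantic encoder might smooth over.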


New Future of Work: Redefining workspaces as hybrid and remote work become more prevalent with Jaime Teevan and Ginger Hudson

Episode 131 | August 4, 2021


New Future of Work: Managing IT and security in remote scenarios with Jaime Teevan and Matt Brodsky

Episode 130 | July 29, 2021


On infinitely wide neural networks that exhibit feature learning

In the pursuit of learning about fundamentals of the natural world, scientists have had success with coming at discoveries from both a bottom-up and top-down approach. Neuroscience is a great example of the former. Spanish anatomist Santiago Ramón y Cajal discovered the neuron in the late 19th century. While scientists’ understanding of these building blocks of the brain has grown tremendously in the past century, much about how the brain works on the whole remains an enigma. In contrast, fluid […]

Read more