Make Every feature Binary: A 135B parameter sparse neural network for massively improved search relevance

Recently, Transformer-based deep learning models like GPT-3 have been getting a lot of attention in the machine learning world. These models excel at understanding semantic relationships, and they have contributed to large improvements in Microsoft Bing's search experience and to surpassing human performance on the SuperGLUE academic benchmark. However, these models can fail to capture more nuanced relationships between query and document terms that go beyond pure semantics.

In this blog post, we introduce "Make Every feature Binary" (MEB), a large-scale sparse model that complements our production Transformer models to improve search relevance.
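To make the "every feature binary" idea concrete, here is a minimal sketch of how a sparse model of this kind can work: cross features between query terms and document terms are hashed into a huge binary feature space, and a sparse linear model scores a query-document pair by summing the weights of the active features. The hashing scheme, dimensionality, feature crosses, and weights below are illustrative assumptions, not MEB's actual implementation.

```python
import hashlib

# Illustrative hashed feature-space size; the real model's space is far larger.
DIM = 2**20

def binary_features(query, doc_terms, dim=DIM):
    """Hash each (query term, document term) cross into a binary feature index.

    Every feature is binary: it is either present (1) or absent (0), so an
    example is represented simply as the set of its active indices.
    """
    active = set()
    for q in query.lower().split():
        for d in doc_terms:
            key = f"{q}|{d.lower()}".encode()
            idx = int(hashlib.md5(key).hexdigest(), 16) % dim
            active.add(idx)
    return active

def score(active, weights):
    """Sparse linear model: sum the learned weights of the active features."""
    return sum(weights.get(i, 0.0) for i in active)

feats = binary_features("hotmail sign in", ["outlook", "login", "email"])
w = {i: 0.1 for i in feats}  # toy weights, for illustration only
print(score(feats, w))       # 0.1 per active cross feature
```

Because each feature is a simple binary indicator, the model can memorize very specific term-pair relationships (such as "hotmail" relating to "outlook") that a purely semantic model might smooth over, at the cost of an enormous but sparse parameter space.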
