Understanding BigBird’s Block Sparse Attention
Transformer-based models have proven very useful for many NLP tasks. However, a major limitation of transformer-based models is their O(n^2) time and memory complexity (where n is the sequence length).
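To see where the quadratic cost comes from, here is a minimal sketch (not BigBird's actual implementation): full self-attention scores every query against every key, materializing an n × n matrix whose size grows quadratically with the sequence length.

```python
import numpy as np

def full_attention_scores(q, k):
    """Scaled dot-product scores for full self-attention.

    Every query is compared against every key, so the result is an
    (n, n) matrix -- the source of the O(n^2) time and memory cost.
    """
    d = q.shape[-1]
    return q @ k.T / np.sqrt(d)

n, d = 512, 64  # sequence length, head dimension (illustrative values)
rng = np.random.default_rng(0)
q = rng.standard_normal((n, d))
k = rng.standard_normal((n, d))

scores = full_attention_scores(q, k)
# Doubling n quadruples the score matrix: (512, 512) -> (1024, 1024).
assert scores.shape == (n, n)
```

Block sparse attention avoids materializing this full matrix by restricting each query to a small, structured subset of keys, which is what brings the cost down from quadratic.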