The State of Computer Vision at Hugging Face 🤗
At Hugging Face, we pride ourselves on democratizing the field of artificial intelligence together with the community. As a part of that mission, we began focusing our efforts on computer vision over the last year. What started as a PR for having Vision Transformers (ViT) in 🤗 Transformers has now grown into something much bigger – 8 core vision tasks,
A Dive into Vision-Language Models
Human learning is inherently multi-modal as jointly leveraging multiple senses helps us understand and analyze new information better. Unsurprisingly, recent advances in multi-modal learning take inspiration from the effectiveness of this process to create models that can process and
Introducing ⚔️ AI vs. AI ⚔️ a deep reinforcement learning multi-agents competition system
We’re excited to introduce a new tool we created: ⚔️ AI vs. AI ⚔️, a deep reinforcement learning multi-agents competition system. This tool, hosted on Spaces, allows us to create multi-agent competitions.
Generating Stories: AI for Game Development #5
Welcome to AI for Game Development! In this series, we’ll be using AI tools to create a fully functional farming game in just 5 days. By the end of this series, you will have learned how you can incorporate a variety of AI tools into your game development workflow. I will show you how you can use AI tools for: Art Style
Speech Synthesis, Recognition, and More With SpeechT5
We’re happy to announce that SpeechT5 is now available in 🤗 Transformers, an open-source library that offers easy-to-use implementations of state-of-the-art machine learning models. SpeechT5 was originally described in the paper SpeechT5: Unified-Modal Encoder-Decoder Pre-Training for Spoken Language Processing by Microsoft Research Asia. The official checkpoints published by the paper’s authors are available on the Hugging Face Hub.
🤗 PEFT: Parameter-Efficient Fine-Tuning of Billion-Scale Models on Low-Resource Hardware
Large Language Models (LLMs) based on the transformer architecture, like GPT, T5, and BERT, have achieved state-of-the-art results in various Natural Language Processing (NLP) tasks. They have also started foraying into other domains, such as Computer Vision (CV) (ViT,
Why we’re switching to Hugging Face Inference Endpoints, and maybe you should too
Hugging Face recently launched Inference Endpoints, which, as they put it, solves transformers in production. Inference Endpoints is a managed service that allows you to deploy (almost) any model on the Hugging Face Hub, to any cloud (AWS and Azure, with GCP on the way), on a range of instance types (including GPU). We’re switching some of our Machine Learning (ML) models that
Zero-shot image-to-text generation with BLIP-2
This guide introduces BLIP-2 from Salesforce Research, which enables a suite of state-of-the-art visual-language models that are now available in 🤗 Transformers. We’ll show you how to use it for image captioning, prompted image captioning, visual question-answering, and chat-based