Inference for PROs
Today, we’re introducing Inference for PRO users – a community offering that gives you access to APIs of curated endpoints for some of the most exciting models available, as well as improved rate limits for the usage of free Inference API. Use the following page to subscribe to PRO. Hugging Face PRO users now have access to exclusive API endpoints for a curated list of powerful models that benefit from ultra-fast inference powered by text-generation-inference. This is a benefit on […]
Read moreLlama 2 on Amazon SageMaker a Benchmark
Deploying large language models (LLMs) and other generative AI models can be challenging due to their computational requirements and latency needs. To provide useful recommendations to companies looking to deploy Llama 2 on Amazon SageMaker with the Hugging Face LLM Inference Container, we created a comprehensive benchmark analyzing over 60 different deployment configurations for Llama
Read moreNon-engineers guide: Train a LLaMA 2 chatbot
In this tutorial we will show you how anyone can build their own open-source ChatGPT without ever writing a single line of code! We’ll use the LLaMA 2 base model, fine tune it for chat with an open-source instruction
Read moreFinetune Stable Diffusion Models with DDPO via TRL
Diffusion models (e.g., DALL-E 2, Stable Diffusion) are a class of generative models that are widely successful at generating images most notably of the photorealistic kind. However, the images generated by these models may not always be on par with human preference or human intention. Thus arises the alignment problem i.e. how does one go about making sure that the outputs of a model are aligned with human preferences like “quality” or that outputs are aligned with intent that is […]
Read moreDeploying the AI Comic Factory using the Inference API
We recently announced Inference for PROs, our new offering that makes larger models accessible to a broader audience. This opportunity opens up new possibilities for running end-user applications using Hugging Face as a platform. An example of such an application is the AI Comic Factory – a Space that has proved incredibly popular. Thousands of users have tried it to
Read moreChat Templates
A spectre is haunting chat models – the spectre of incorrect formatting! tl;dr Chat models have been trained with very different formats for converting conversations into a single tokenizable string. Using a format different from the format a model
Read moreAccelerating Stable Diffusion XL Inference with JAX on Cloud TPU v5e
Generative AI models, such as Stable Diffusion XL (SDXL), enable the creation of high-quality, realistic content with wide-ranging applications. However, harnessing the power of such models presents significant challenges and computational costs. SDXL is a large image generation model whose UNet component is about three times as large as the one in the previous version of the model. Deploying a model like this in production is challenging due to the increased memory requirements, as well as increased inference times. Today, […]
Read more