Inference for PROs

Today, we’re introducing Inference for PRO users – a community offering that gives you access to curated API endpoints for some of the most exciting models available, as well as improved rate limits for the free Inference API. You can subscribe to PRO on the following page. Hugging Face PRO users now have access to exclusive API endpoints for a curated list of powerful models that benefit from ultra-fast inference powered by text-generation-inference. This is a benefit on […]
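As a minimal sketch, calling the Inference API boils down to an authenticated POST against a per-model URL. The snippet below only assembles such a request (it does not send one); the model ID and token are placeholders, and the URL follows the documented `api-inference.huggingface.co/models/<model-id>` pattern.

```python
API_URL = "https://api-inference.huggingface.co/models/{model_id}"

def build_request(model_id, prompt, token):
    # Assemble (but do not send) a text-generation request payload.
    return {
        "url": API_URL.format(model_id=model_id),
        "headers": {"Authorization": f"Bearer {token}"},
        "json": {"inputs": prompt},
    }

req = build_request("meta-llama/Llama-2-70b-chat-hf", "Hello!", "hf_xxx")
```

In practice you would pass this to an HTTP client such as `requests.post(req["url"], headers=req["headers"], json=req["json"])`.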

Read more

Llama 2 on Amazon SageMaker: A Benchmark

Deploying large language models (LLMs) and other generative AI models can be challenging due to their computational requirements and latency needs. To provide useful recommendations to companies looking to deploy Llama 2 on Amazon SageMaker with the Hugging Face LLM Inference Container, we created a comprehensive benchmark analyzing over 60 different deployment configurations for Llama […]
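The shape of such a benchmark is a sweep over a grid of deployment configurations, measuring latency for each and picking the best. The sketch below is purely illustrative: the instance types, batch sizes, and latency function are made up and stand in for real timed requests against deployed endpoints.

```python
import itertools

# Hypothetical configuration grid (not the benchmark's actual parameters).
instance_types = ["ml.g5.2xlarge", "ml.g5.12xlarge"]
batch_sizes = [1, 4, 8]

def measure_latency_ms(instance, batch):
    # Stand-in for timing real requests against a deployed endpoint.
    return 50.0 * batch / (2.0 if "12xlarge" in instance else 1.0)

results = [
    (instance, batch, measure_latency_ms(instance, batch))
    for instance, batch in itertools.product(instance_types, batch_sizes)
]
best = min(results, key=lambda r: r[2])
```

A real benchmark would also track throughput and cost per token, and would repeat each measurement to report percentiles rather than a single number.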

Read more

Finetune Stable Diffusion Models with DDPO via TRL

Diffusion models (e.g., DALL-E 2, Stable Diffusion) are a class of generative models that are widely successful at generating images, most notably photorealistic ones. However, the images generated by these models may not always align with human preference or human intention. This gives rise to the alignment problem, i.e., how does one ensure that a model’s outputs match human preferences like “quality”, or that outputs are aligned with intent that is […]

Read more

Deploying the AI Comic Factory using the Inference API

We recently announced Inference for PROs, our new offering that makes larger models accessible to a broader audience. This opportunity opens up new possibilities for running end-user applications using Hugging Face as a platform. An example of such an application is the AI Comic Factory – a Space that has proved incredibly popular. Thousands of users have tried it to […]

Read more

Chat Templates

A spectre is haunting chat models – the spectre of incorrect formatting! TL;DR: chat models have been trained with very different formats for converting conversations into a single tokenizable string. Using a format different from the format a model […]
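To see why this matters, here is a toy sketch: two invented per-message templates render the same conversation into different strings, so a model trained on one format would receive unfamiliar input if served with the other. The template strings are illustrative, not any model's real format.

```python
def render(messages, template):
    """Render a list of {role, content} dicts with a naive per-message template."""
    return "".join(template.format(**m) for m in messages)

messages = [
    {"role": "user", "content": "Hi"},
    {"role": "assistant", "content": "Hello!"},
]

# Two made-up formats: the same conversation becomes different strings.
chatml_like = render(messages, "<|im_start|>{role}\n{content}<|im_end|>\n")
inst_like = render(messages, "[{role}] {content}\n")
```

Chat templates solve this by shipping the correct rendering logic alongside the tokenizer, so callers never have to hand-roll the format.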

Read more

Accelerating Stable Diffusion XL Inference with JAX on Cloud TPU v5e

Generative AI models, such as Stable Diffusion XL (SDXL), enable the creation of high-quality, realistic content with wide-ranging applications. However, harnessing the power of such models presents significant challenges and computational costs. SDXL is a large image generation model whose UNet component is about three times as large as the one in the previous version of the model. Deploying a model like this in production is challenging due to the increased memory requirements, as well as increased inference times. Today, […]

Read more

Gradio-Lite: Serverless Gradio Running Entirely in Your Browser

Gradio is a popular Python library for creating interactive machine learning apps. Traditionally, Gradio applications have relied on server-side infrastructure to run, which can be a hurdle for developers who need to host their applications. Enter Gradio-lite (@gradio/lite): a library that leverages Pyodide to bring Gradio directly to your browser. In this blog post, we’ll explore what @gradio/lite is, go over example code, and discuss the benefits it offers for running Gradio applications. […]
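The basic usage pattern is to load the library from a CDN and embed the app's Python code inside a custom element, which Pyodide then executes in the browser. The fragment below is a hedged sketch following that pattern; the exact CDN URLs and element name should be checked against the Gradio-Lite documentation.

```html
<html>
  <head>
    <!-- Load Gradio-Lite from a CDN (URL is an assumption; verify against the docs). -->
    <script type="module" src="https://cdn.jsdelivr.net/npm/@gradio/lite/dist/lite.js"></script>
    <link rel="stylesheet" href="https://cdn.jsdelivr.net/npm/@gradio/lite/dist/lite.css" />
  </head>
  <body>
    <!-- Python source placed inside the element runs client-side via Pyodide. -->
    <gradio-lite>
      import gradio as gr

      def greet(name):
          return "Hello, " + name + "!"

      gr.Interface(fn=greet, inputs="textbox", outputs="textbox").launch()
    </gradio-lite>
  </body>
</html>
```

Because everything runs client-side, this page can be served as a static file with no Python backend.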

Read more