Training and Finetuning Embedding Models with Sentence Transformers v3

Sentence Transformers is a Python library for using and training embedding models for a wide range of applications, such as retrieval-augmented generation, semantic search, semantic textual similarity, paraphrase mining, and more. Its v3.0 update is the largest since the project’s inception, introducing a new training approach. In this blog post, I’ll show you how to use it to finetune Sentence […]

Read more
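Applications like semantic search and semantic textual similarity ultimately compare embedding vectors, most often via cosine similarity. As a minimal, library-free sketch of that comparison step (the tiny 3-dimensional vectors are toy stand-ins for real model embeddings, not output from Sentence Transformers):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Hypothetical embeddings: a query and two candidate documents.
query = [0.9, 0.1, 0.0]
docs = {
    "doc_a": [0.8, 0.2, 0.1],  # points in roughly the same direction as the query
    "doc_b": [0.0, 0.1, 0.9],  # nearly orthogonal to the query
}

# Rank documents by similarity to the query, most similar first.
ranked = sorted(docs, key=lambda d: cosine_similarity(query, docs[d]), reverse=True)
```

In a real pipeline, the vectors would come from a trained embedding model; the ranking logic stays the same.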

Benchmarking Text Generation Inference

In this blog post we explore Text Generation Inference’s (TGI) little brother, the TGI Benchmarking tool. It helps us profile TGI beyond simple throughput, so we can better understand the trade-offs and decide how to tune a deployment for our needs. If you have ever felt like LLM deployments cost too much or […]

Read more

Space secrets leak disclosure

Earlier this week our team detected unauthorized access to our Spaces platform, specifically related to Spaces secrets. As a consequence, we suspect that a subset of Spaces’ secrets could have been accessed without authorization. As a first step of remediation, we have revoked a number of HF tokens present in those secrets. Users whose tokens have been revoked already received an email […]

Read more

Faster assisted generation support for Intel Gaudi

As model sizes grow, Generative AI implementations require significant inference resources. This not only increases the cost per generation but also the power consumed to serve such requests. Inference optimizations for text generation are essential for reducing latency, infrastructure costs, and power consumption, leading to an improved user experience and greater efficiency in text generation tasks. Assisted decoding is a popular method for speeding up text generation. We adapted and optimized it for Intel Gaudi, which […]

Read more
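Assisted (speculative) decoding works by letting a cheap draft model propose several tokens that the expensive target model then verifies, accepting the agreeing prefix. The following toy sketch illustrates only that accept/reject loop; both "models" are hypothetical lookup tables, not real LLMs:

```python
# Hypothetical next-token tables standing in for real models.
DRAFT = {"the": "cat", "cat": "sat", "sat": "on", "on": "a"}     # fast, sometimes wrong
TARGET = {"the": "cat", "cat": "sat", "sat": "on", "on": "the"}  # slow, authoritative

def assisted_decode(prompt, max_len, k=3):
    tokens = [prompt]
    while len(tokens) < max_len:
        # 1) The draft model speculates up to k tokens cheaply.
        draft, cur = [], tokens[-1]
        for _ in range(k):
            cur = DRAFT.get(cur)
            if cur is None:
                break
            draft.append(cur)
        if not draft:  # draft has no continuation; fall back to the target model
            nxt = TARGET.get(tokens[-1])
            if nxt is None:
                return tokens
            tokens.append(nxt)
            continue
        # 2) The target model checks the speculated tokens in one pass:
        #    accept matches, substitute its own token at the first mismatch.
        cur = tokens[-1]
        for t in draft:
            expected = TARGET.get(cur)
            if expected is None:
                return tokens
            tokens.append(expected)
            cur = expected
            if t != expected:
                break  # speculation diverged; re-draft from here
    return tokens[:max_len]
```

When the draft model agrees with the target, several tokens are committed per verification pass, which is where the speedup comes from in practice.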

Introducing NPC-Playground, a 3D playground to interact with LLM-powered NPCs

AI-powered NPCs (Non-Playable Characters) are one of the most important breakthroughs brought about by the use of LLMs in games. LLMs, or Large Language Models, make it possible to design “intelligent” in-game characters that can engage in realistic conversations with the player, perform complex actions, and follow instructions, dramatically enhancing the player’s experience. AI-powered NPCs represent a huge advancement over rule-based and heuristic systems. Today, we are excited to introduce NPC-Playground, a demo created by Cubzh and Gigax where you […]

Read more

🧨 Diffusers welcomes Stable Diffusion 3

Stable Diffusion 3 (SD3), Stability AI’s latest iteration of the Stable Diffusion family of models, is now available on the Hugging Face Hub and can be used with 🧨 Diffusers. The model released today is Stable Diffusion 3 Medium, with 2B parameters. As part of this release, we have provided models on the Hub, a 🧨 Diffusers integration, and SD3 DreamBooth and LoRA training scripts.

Read more

A Hugging Face Accelerate Story of Multiple Backends: FSDP and DeepSpeed

There are two popular implementations of the ZeRO Redundancy Optimizer (ZeRO) algorithm in the community: one from DeepSpeed and the other from PyTorch. Hugging Face Accelerate exposes both frameworks so end users can train or tune their models. This blog post highlights the differences in how these backends are exposed through Accelerate. To enable users to seamlessly switch between them, we upstreamed a precision-related change and added a concept guide. Are […]

Read more