Train AI models with Unsloth and Hugging Face Jobs for FREE

This blog post covers how to use Unsloth and Hugging Face Jobs for fast LLM fine-tuning (specifically LiquidAI/LFM2.5-1.2B-Instruct) through coding agents like Claude Code and Codex. Unsloth provides ~2x faster training and ~60% less VRAM usage compared to standard methods, so training small models can cost just a few dollars. Why a small model? Small language models like LFM2.5-1.2B-Instruct are ideal candidates for fine-tuning. They are cheap to train, fast to iterate on, and increasingly competitive with much larger […]

Read more
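The post's actual training script isn't shown in the excerpt, but the "cheap to train" claim comes largely from parameter-efficient fine-tuning. As a rough sketch (with hypothetical layer shapes, not the real LFM2.5-1.2B architecture), here is why a LoRA-style adapter of the kind Unsloth trains touches far fewer parameters than a full fine-tune:

```python
# Rough sketch: why LoRA-style fine-tuning is cheap.
# Shapes below are illustrative, not the real LFM2.5-1.2B architecture.

def lora_trainable_params(d_in: int, d_out: int, rank: int) -> int:
    """A LoRA adapter replaces the update to a (d_in x d_out) weight
    with two low-rank factors A (d_in x rank) and B (rank x d_out)."""
    return d_in * rank + rank * d_out

d_model = 2048   # hypothetical hidden size
n_layers = 16    # hypothetical layer count
rank = 16        # a typical LoRA rank

# Adapt just the attention q/v projections in every layer (a common default).
full = n_layers * 2 * d_model * d_model                        # full fine-tune of those weights
lora = n_layers * 2 * lora_trainable_params(d_model, d_model, rank)

print(f"full: {full:,}  lora: {lora:,}  ratio: {full // lora}x")
```

With these toy numbers the adapter trains roughly 1/64th of the weights it modifies, which is the kind of reduction that makes a few-dollar training job plausible.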

GGML and llama.cpp join HF to ensure the long-term progress of Local AI

We are super happy to announce that GGML, creators of llama.cpp, are joining HF to keep future AI open. 🔥 Georgi Gerganov and team are joining HF with the goal of scaling and supporting the community behind ggml and llama.cpp as Local AI continues to make exponential progress in the coming years. We’ve been working with Georgi and team for quite some time (we even have awesome core contributors to llama.cpp like Son and Alek in the team […]

Read more

Mixture of Experts (MoEs) in Transformers

Over the past few years, scaling dense language models has driven most progress in LLMs. From early models like the original ULMFiT (~30M parameters) or GPT-2 (1.5B parameters, which at the time was considered “too dangerous to release” 🧌), and eventually to today’s hundred-billion–parameter systems, the recipe was simple: more data + more parameters gives better performance. Scaling laws reinforced this trend, but dense scaling has practical limits: Training becomes increasingly expensive. Inference latency grows. Deployment requires significant memory and […]

Read more
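The alternative the post is building toward is Mixture of Experts, where a router sends each token to only a few expert FFNs instead of one dense one. As a minimal, hedged sketch (a learned router and batched matmuls replace all of this in a real MoE layer), top-k routing looks like:

```python
# Minimal sketch of MoE top-k routing: each token goes to the k experts
# with the highest router scores, with weights renormalized over those k.
# Illustrative only; real MoE layers use learned routers and batched matmuls.
import math

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def route_top_k(router_logits, k=2):
    """Return (expert_index, weight) pairs for the k highest-scoring experts."""
    probs = softmax(router_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]

# One token's router logits over 4 experts: experts 2 and 0 win.
print(route_top_k([1.0, -2.0, 3.0, 0.5], k=2))
```

Only the selected experts run for that token, which is how MoE models grow total parameters without growing per-token compute at the same rate.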

PRX Part 3 — Training a Text-to-Image Model in 24h!

Welcome back 👋 In the last two posts (Part 1 and Part 2), we explored a wide range of architectural and training tricks for diffusion models. We tried to evaluate each idea in isolation, measuring throughput, convergence speed, and final image quality, and tried to understand what actually moves the needle. In this post, we want to answer a much more practical question: What happens when we combine all the tricks that worked? Instead of optimizing one dimension at a […]

Read more

Introducing Modular Diffusers – Composable Building Blocks for Diffusion Pipelines

Modular Diffusers introduces a new way to build diffusion pipelines by composing reusable blocks. Instead of writing entire pipelines from scratch, you can mix and match blocks to create workflows tailored to your needs! This complements the existing DiffusionPipeline class with a more flexible, composable alternative. In this post, we’ll walk through how Modular Diffusers works — from the familiar API to run a modular pipeline, to building fully custom blocks and composing them into your own workflow. We’ll also […]

Read more
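The excerpt above doesn't show the Modular Diffusers API itself, so here is only a generic sketch of the "composable blocks" idea it describes: each block transforms a shared state, and a pipeline is an ordered composition of blocks. All names below (`compose`, `encode_prompt`, `denoise`, `decode`) are hypothetical stand-ins, not the library's actual classes.

```python
# Generic sketch of composable pipeline blocks (NOT the real
# Modular Diffusers API): each block maps a state dict to a new state,
# and a pipeline is just an ordered composition of blocks.
from typing import Any, Callable, Dict

Block = Callable[[Dict[str, Any]], Dict[str, Any]]

def compose(*blocks: Block) -> Block:
    def pipeline(state: Dict[str, Any]) -> Dict[str, Any]:
        for block in blocks:
            state = block(state)
        return state
    return pipeline

# Hypothetical stand-ins for encode / denoise / decode stages.
def encode_prompt(state):
    state["embedding"] = f"emb({state['prompt']})"
    return state

def denoise(state):
    state["latents"] = f"denoised({state['embedding']})"
    return state

def decode(state):
    state["image"] = f"img({state['latents']})"
    return state

pipe = compose(encode_prompt, denoise, decode)
print(pipe({"prompt": "a cat"})["image"])
```

Swapping or reordering blocks rebuilds the workflow without touching the other stages, which is the flexibility the post contrasts with a monolithic `DiffusionPipeline`.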

Bringing Robotics AI to Embedded Platforms: Dataset Recording, VLA Fine‑Tuning, and On‑Device Optimizations

Authors: Enzo Ruedas, Tess Boivin

Recent advances in Large Language Models have enabled the transition from text-only reasoning to multimodal systems: first with the integration of visual perception in Vision–Language Models (VLMs), and more recently with the generation of robot actions in Vision–Language–Action (VLA) models. Deploying these models on embedded robotic platforms remains a challenge […]

Read more

LeRobot v0.5.0: Scaling Every Dimension

With over 200 merged PRs and over 50 new contributors since v0.4.0, LeRobot v0.5.0 is our biggest release yet — expanding in every direction at once. More robots (including our first humanoid), more policies (including the comeback of autoregressive VLAs), faster datasets, simulation environments you can load straight from the Hub, and a modernized codebase running on Python 3.12 and Transformers v5. Whether you’re training policies in simulation or deploying them on real hardware, v0.5.0 has something for you. TL;DR […]

Read more

Keep the Tokens Flowing: Lessons from 16 Open-Source RL Libraries

TL;DR — For those of you who don’t have time to read 5,000 words about async RL plumbing (we get it, you have models to train): The problem: In synchronous RL (reinforcement learning) training, data generation (model inference to create data samples) dominates wall-clock time — a single batch of 32K-token rollouts on a 32B (32-billion parameter) model can take hours, while the GPUs used for training remain idle. The solution everyone converged on: Disaggregate (separate) inference and training onto […]

Read more
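The disaggregation pattern the excerpt describes can be sketched with a producer/consumer toy: an inference worker keeps filling a rollout queue while a training worker drains it, so neither side sits idle waiting for the other to finish a batch. This is only an illustration of the scheduling idea; real systems shard generation and training across separate GPU pools.

```python
# Toy sketch of disaggregated async RL: inference produces rollouts into a
# queue while training consumes them concurrently. Illustrative only.
import queue
import threading

rollouts: "queue.Queue" = queue.Queue(maxsize=8)
N = 20
consumed = []

def inference_worker():
    for step in range(N):
        rollouts.put(step)   # stand-in for generating one rollout
    rollouts.put(None)       # sentinel: generation finished

def training_worker():
    while True:
        item = rollouts.get()
        if item is None:
            break
        consumed.append(item)  # stand-in for one optimizer update

gen = threading.Thread(target=inference_worker)
train = threading.Thread(target=training_worker)
gen.start(); train.start()
gen.join(); train.join()
print(len(consumed))
```

The bounded queue is the interesting design choice: it lets generation run ahead of training (keeping the tokens flowing) while capping how stale a rollout can get before it is consumed.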