Continuous batching

TL;DR: in this blog post, starting from attention mechanisms and KV caching, we derive continuous batching by optimizing for throughput. If you’ve ever used Qwen, Claude, or any other AI chatbot, you’ve probably noticed something: it takes a while for the first word of the response to appear, and then words appear one-by-one on your screen with (hopefully) a regular and fast-paced frequency. That’s because at the heart of it, all LLMs are just fancy next token predictors. An LLM […]

Read more

Welcome FLUX.2 – BFL’s new open image generation model 🤗

FLUX.2 is the recent series of image generation models from Black Forest Labs, preceded by the Flux.1 series. It is an entirely new model with a new architecture and pre-training done from scratch! In this post, we discuss the key changes introduced in FLUX.2, performing inference with it under various setups, and LoRA fine-tuning. 🚨 FLUX.2 is not meant to be a drop-in replacement of FLUX.1, but a new image generation and editing model. Table of contents

Read more

Transformers v5: Simple model definitions powering the AI ecosystem

Transformers’ version v4.0.0rc-1, the initial release candidate for version 4, was released on November 19th, 2020. Five years later, we now release v5.0.0rc-0. Today, as we launch v5, Transformers is installed more than 3 million times each day via pip – up from 20,000/day in v4 🤯. Altogether, it has now surpassed 1.2 billion installs! The ecosystem has expanded from 40 model architectures in v4 to over 400 today, and the community has contributed more than 750,000 model checkpoints on […]

Read more

DeepMath: A lightweight math reasoning Agent with smolagents

By Intel AI Software Group DeepMath is an aligned math reasoning agent built on Qwen3-4B Thinking and fine-tuned with GRPO (Group Relative Policy Optimization). Instead of verbose text, the model emits tiny Python snippets for intermediate steps, runs them in a secure sandbox, and folds the results back into its reasoning, reducing errors and output length. The agent is implemented using the smolagents library. We evaluate DeepMath on four math datasets: MATH500, AIME, HMMT, and HLE, and    

Read more

CUGA on Hugging Face: Democratizing Configurable AI Agents

Introduction AI agents are rapidly becoming essential for building intelligent applications, but creating robust, adaptable agents that scale across domains remains a challenge. Many existing frameworks struggle with brittleness, tool misuse, and failures when faced with complex workflows. CUGA (Configurable Generalist Agent) was designed to overcome these limitations. It’s an open-source, AI Agent that combines flexibility, reliability, and ease of use for enterprise use cases. By abstracting orchestration complexity, CUGA empowers developers to focus on domain requirements rather    

Read more

The Open Evaluation Standard: Benchmarking NVIDIA Nemotron 3 Nano with NeMo Evaluator

It has become increasingly challenging to assess whether a model’s reported improvements reflect genuine advances or variations in evaluation conditions, dataset composition, or training data that mirrors benchmark tasks. The NVIDIA Nemotron approach to openness addresses this by publishing transparent and reproducible evaluation recipes that make results independently verifiable. NVIDIA released Nemotron 3 Nano 30B A3B with an explicitly open evaluation approach to make that distinction clear. Alongside the model card, we are publishing the complete evaluation recipe used to […]

Read more

Tokenization in Transformers v5: Simpler, Clearer, and More Modular

Transformers v5 redesigns how tokenizers work. The big tokenizers reformat separates tokenizer design from trained vocabulary (much like how PyTorch separates neural network architecture from learned weights). The result is tokenizers you can inspect, customize, and train from scratch with far less friction. TL;DR: This blog explains how tokenization works in Transformers and why v5 is a major redesign, with clearer internals, a clean class hierarchy, and a single fast backend. It’s a practical guide for anyone who wants to […]

Read more
1 67 68 69 70 71 1,023