Keep the Tokens Flowing: Lessons from 16 Open-Source RL Libraries
TL;DR, for those of you who don't have time to read 5,000 words about async RL plumbing (we get it, you have models to train):
The problem: In synchronous RL (reinforcement learning) training, data generation (model inference to create data samples) dominates wall-clock time. A single batch of 32K-token rollouts on a 32B (32-billion-parameter) model can take hours, and the GPUs reserved for training sit idle the whole time.
The solution everyone converged on: Disaggregate (separate) inference and training onto […]
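The sketch below shows the disaggregation idea from the TL;DR in miniature. It is a toy illustration under stated assumptions, not code from any of the surveyed libraries. Python threads in a single process stand in for separate inference and training GPU pools, and `time.sleep` calls stand in for generation and optimizer steps. Every name here (`Rollout`, `PolicyVersion`, `inference_worker`, `run_async`, `max_buffered`) is hypothetical. Generators push finished rollouts into a bounded queue without waiting for the trainer. The trainer pulls whatever is ready, takes a step, and bumps a weight-version counter. For each sample, the code records how many versions behind the current weights it was generated.

```python
import queue
import threading
import time
from dataclasses import dataclass


@dataclass
class Rollout:
    prompt_id: int
    tokens: list
    policy_version: int  # version of the weights that generated this sample


class PolicyVersion:
    """Hypothetical thread-safe counter standing in for 'which weights are live'."""

    def __init__(self) -> None:
        self._version = 0
        self._lock = threading.Lock()

    def get(self) -> int:
        with self._lock:
            return self._version

    def bump(self) -> int:
        with self._lock:
            self._version += 1
            return self._version


def inference_worker(prompt_ids, buffer, policy):
    """Hypothetical generator: produces rollouts without waiting for the trainer."""
    for pid in prompt_ids:
        time.sleep(0.005)  # stand-in for slow autoregressive generation
        buffer.put(Rollout(pid, [pid] * 4, policy.get()))  # blocks only if buffer is full


def run_async(num_workers=2, prompts_per_worker=8, batch_size=4, max_buffered=8):
    buffer = queue.Queue(maxsize=max_buffered)  # bounded: caps how far generation runs ahead
    policy = PolicyVersion()
    workers = [
        threading.Thread(
            target=inference_worker,
            args=(range(w * prompts_per_worker, (w + 1) * prompts_per_worker), buffer, policy),
        )
        for w in range(num_workers)
    ]
    for t in workers:
        t.start()

    total = num_workers * prompts_per_worker
    trained, staleness = [], []
    while len(trained) < total:
        n = min(batch_size, total - len(trained))
        batch = [buffer.get() for _ in range(n)]  # trainer consumes whatever is ready
        current = policy.get()
        # How many weight versions behind the trainer each sample was generated.
        staleness.extend(current - r.policy_version for r in batch)
        time.sleep(0.002)  # stand-in for the optimizer step
        policy.bump()  # stand-in for a weight sync: later rollouts see the new version
        trained.extend(batch)

    for t in workers:
        t.join()
    return trained, staleness, policy.get()


trained, staleness, final_version = run_async()
print(len(trained), final_version, max(staleness))
```

The bounded queue is the main design knob. A larger `max_buffered` keeps generators busy when the trainer is slow, but it also allows samples to be produced by weights that are more steps out of date. The `staleness` list records exactly that lag. Because training data no longer comes purely from the current policy, an asynchronous setup has to either tolerate this off-policy lag or correct for it.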