Tiny Agents in Python: an MCP-powered agent in ~70 lines of code

NEW: tiny-agents now supports the AGENTS.md standard. 🥳 Inspired by Tiny Agents in JS, we ported the idea to Python 🐍 and extended the huggingface_hub client SDK to act as an MCP Client, so it can pull tools from MCP servers and pass them to the LLM during inference. MCP (Model Context Protocol) is an open protocol that standardizes how Large Language Models (LLMs) interact with external tools and APIs. Essentially, it removes the need to write custom integrations for each […]
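The pattern MCP standardizes can be illustrated with a minimal sketch. This is not the huggingface_hub or MCP API; the tool registry and `dispatch` helper below are hypothetical, showing only the core idea: each tool is exposed as a name plus a JSON schema, the LLM sees only the schema, and the client routes the model's tool calls to the implementation.

```python
import json

# Hypothetical illustration of what MCP standardizes: a tool is described
# by a name, a description, and a JSON schema for its arguments. The MCP
# client gathers these descriptions from servers, forwards them to the
# LLM, and dispatches the tool calls the model emits.
TOOLS = {
    "get_weather": {
        "description": "Return the weather for a city.",
        "input_schema": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
        # The callable lives on the "server" side; the LLM never sees it.
        "fn": lambda args: f"Sunny in {args['city']}",
    }
}

def dispatch(tool_call: str) -> str:
    """Route a model-emitted tool call (a JSON string) to the matching tool."""
    call = json.loads(tool_call)
    tool = TOOLS[call["name"]]
    return tool["fn"](call["arguments"])

result = dispatch('{"name": "get_weather", "arguments": {"city": "Paris"}}')
# → "Sunny in Paris"
```

Because every server describes its tools in this one format, the same client loop works against any MCP server, which is exactly what removes the per-integration glue code.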

Read more

🐯 Liger GRPO meets TRL

Thank you for your great work. I tested the Liger loss with DeepSpeed ZeRO-3 using Qwen/Qwen2.5-0.5B-Instruct in bf16, and hit a shape mismatch, as shown below:

[rank0]: Traceback (most recent call last):
[rank0]:   File "/workspace/temp.py", line 22, in
[rank0]:     trainer.train()
[rank0]:   File "/usr/local/lib/python3.11/dist-packages/transformers/trainer.py", line 2238, in train
[rank0]:     return inner_training_loop(
[rank0]:            ^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/usr/local/lib/python3.11/dist-packages/transformers/trainer.py", line 2553, in _inner_training_loop
[rank0]:     tr_loss_step = self.training_step(model, inputs, num_items_in_batch)
[rank0]:                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/usr/local/lib/python3.11/dist-packages/transformers/trainer.py", line 3730, in training_step
[rank0]:     loss = self.compute_loss(model, inputs, […]

Read more

No GPU left behind: Unlocking Efficiency with Co-located vLLM in TRL

TRL supports training LLMs using GRPO, an online learning algorithm recently introduced in the DeepSeekMath paper. In GRPO, the model learns from its own outputs: it generates responses during training, receives feedback, and uses that feedback to improve itself over time. This makes generation a critical step in the training loop — and also a major bottleneck. To speed up generation, TRL integrates with vLLM. This combination lets you train powerful models more efficiently in a GRPO setup. However, there’s a […]
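The "learns from its own outputs" step can be sketched numerically. This is not TRL's implementation, just the group-relative advantage at the heart of GRPO: for each prompt the model samples a group of completions, and each completion's reward is normalized against the group's own mean and standard deviation, so no separate value model is needed.

```python
# Sketch of GRPO's group-relative advantage (illustrative, not TRL code):
# rewards for one prompt's group of completions are standardized within
# the group, so "better than the group average" becomes the signal.
def group_relative_advantages(rewards: list[float], eps: float = 1e-4) -> list[float]:
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = var ** 0.5
    return [(r - mean) / (std + eps) for r in rewards]

# Completions scored above the group mean get positive advantages,
# those below get negative ones; average completions get ~0.
adv = group_relative_advantages([1.0, 0.0, 0.5, 0.5])
```

Every completion in the group must be generated before any advantage can be computed, which is why fast generation (hence vLLM) matters so much for the training loop.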

Read more

SmolVLA: Efficient Vision-Language-Action Model trained on Lerobot Community Data

Today, we introduce SmolVLA, a compact (450M-parameter), open-source Vision-Language-Action model for robotics that runs on consumer hardware. It is pretrained only on compatibly licensed, open-source, community-shared datasets under the lerobot tag. SmolVLA-450M outperforms much larger VLAs and strong baselines such as ACT on simulation (LIBERO, Meta-World) and real-world tasks (SO100, SO101), and supports asynchronous inference for 30% faster response and 2× task throughput.
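The asynchronous-inference idea can be sketched with a toy producer/consumer loop. This is not SmolVLA's actual code; `predict_chunk` and `run_async` are hypothetical names, illustrating only the scheduling trick: while the robot executes the current chunk of actions, the policy is already predicting the next chunk in the background, so execution never stalls waiting on the model.

```python
import queue
import threading

def predict_chunk(obs: int) -> list[str]:
    """Stand-in for the policy: predict a short chunk of actions."""
    return [f"action_{obs}_{i}" for i in range(2)]

def run_async(n_chunks: int) -> list[str]:
    # maxsize=1 keeps the predictor exactly one chunk ahead of execution.
    chunks: queue.Queue = queue.Queue(maxsize=1)
    executed: list[str] = []

    def predictor() -> None:
        for obs in range(n_chunks):
            chunks.put(predict_chunk(obs))  # overlaps with execution below
        chunks.put(None)  # sentinel: no more chunks

    threading.Thread(target=predictor, daemon=True).start()
    while (chunk := chunks.get()) is not None:
        executed.extend(chunk)  # "execute" while the next chunk is predicted
    return executed

actions = run_async(3)
```

Overlapping prediction with execution is what yields the faster response and higher task throughput reported above: model latency is hidden behind the time the actions themselves take to run.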

Read more

KV Cache from scratch in nanoVLM

We have implemented KV Caching from scratch in our nanoVLM repository (a small codebase for training your own Vision Language Model in pure PyTorch). This gave us a 38% speedup in generation. In this blog post we cover KV Caching and everything we learned while implementing it. The lessons are general and apply to any autoregressive language model generation. Implementing from scratch in a small codebase is a great learning experience; come along for the ride!
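A minimal sketch of the idea (illustrative, not the nanoVLM code, and using scalar "embeddings" for brevity): instead of recomputing keys and values for the entire prefix at every generation step, we append each new token's key and value to a cache and attend over the cache, so per-step work stops growing with recomputation.

```python
import math

def attend(q: float, keys: list[float], values: list[float]) -> float:
    """Softmax attention of one query over cached keys/values (scalar toy)."""
    scores = [q * k for k in keys]
    m = max(scores)  # subtract max for numerical stability
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    return sum(w * v for w, v in zip(weights, values)) / total

def generate_with_cache(qs: list[float], ks: list[float], vs: list[float]) -> list[float]:
    k_cache: list[float] = []
    v_cache: list[float] = []
    outputs: list[float] = []
    for q, k, v in zip(qs, ks, vs):
        k_cache.append(k)  # O(1) append per step instead of rebuilding K, V
        v_cache.append(v)
        outputs.append(attend(q, k_cache, v_cache))
    return outputs

# Without a cache, step t would recompute keys/values for all t prefix
# tokens; the cached version produces identical outputs with less work.
out = generate_with_cache([1.0, 0.5], [0.2, 0.8], [1.0, 2.0])
```

The real implementation caches per-layer key/value tensors of shape (batch, heads, seq, head_dim), but the bookkeeping is exactly this: append, then attend over everything cached so far.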

Read more

Introducing Training Cluster as a Service – a new collaboration with NVIDIA

Today at GTC Paris, we are excited to announce Training Cluster as a Service, in collaboration with NVIDIA, to make large GPU clusters more easily accessible to research organizations all over the world, so they can train the foundational models of tomorrow in every domain. Making GPU Clusters Accessible Many gigawatt-scale GPU supercluster projects are being built to train the next generation of AI models. This can make it seem that the compute gap between the “GPU […]

Read more