Vision Language Model Alignment in TRL ⚡️

Vision Language Models (VLMs) are getting stronger, but aligning them to human preferences still matters. In TRL, we already showed how to post-train VLMs with Supervised Fine-Tuning (SFT) and Direct Preference Optimization (DPO). This time, we’re going further.

tl;dr Here’s what’s new in TRL:

- Mixed Preference Optimization (MPO)
- Group Relative Policy Optimization (GRPO)
- Group Sequence Policy Optimization (GSPO), a variant of GRPO

These go beyond pairwise DPO, extracting richer signals from preference data and scaling better with modern VLMs. We’ve […]
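The group-relative idea at the heart of GRPO can be illustrated with a toy calculation (a minimal sketch in plain Python, not TRL's implementation; the helper name is ours): each sampled completion's advantage is its reward, normalized by the mean and standard deviation of the rewards in its group.

```python
from statistics import mean, pstdev

def group_relative_advantages(rewards):
    """GRPO-style advantages: normalize each reward against its group's
    mean and (population) standard deviation."""
    mu = mean(rewards)
    sigma = pstdev(rewards) or 1.0  # guard against a zero-variance group
    return [(r - mu) / sigma for r in rewards]

# Four completions sampled for one prompt, scored by a reward model.
advantages = group_relative_advantages([1.0, 0.0, 0.5, 0.5])
print(advantages)  # ≈ [1.414, -1.414, 0.0, 0.0]
```

Because the normalization happens within each group of completions for the same prompt, no separate value model is needed to estimate a baseline.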

Read more

Accelerate ND-Parallel: A Guide to Efficient Multi-GPU Training

Training large models across multiple GPUs can be challenging due to the complexities of different parallelism strategies. In Accelerate, together with Axolotl, we have integrated a quick and easy way to use any combination of parallelism strategies in your training script! Here is how to add it to your training script:

```python
from transformers import AutoModelForCausalLM
from accelerate import Accelerator
from accelerate.parallelism_config import ParallelismConfig
from accelerate.utils import FullyShardedDataParallelPlugin

pc = ParallelismConfig(
    dp_shard_size=2,
    dp_replicate_size=2,
    cp_size=2,
    tp_size=2,
)
fsdp_plugin = FullyShardedDataParallelPlugin(
    fsdp_version=2,
    auto_wrap_policy="transformer_based_wrap",
    […]
```
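The parallelism degrees in such a config compose multiplicatively, so the example above (2 × 2 × 2 × 2) targets a 16-GPU run. A quick sanity check in plain Python (the helper name is ours for illustration, not an Accelerate API):

```python
import math

def required_world_size(dp_shard: int, dp_replicate: int, cp: int, tp: int) -> int:
    """Total number of processes implied by a set of parallelism degrees:
    the dimensions of the device mesh multiply together."""
    return math.prod([dp_shard, dp_replicate, cp, tp])

# Matches the ParallelismConfig above: 2 * 2 * 2 * 2 GPUs.
print(required_world_size(2, 2, 2, 2))  # → 16
```

Launching with a world size that does not equal this product is a common source of mesh-construction errors, so it is worth checking before submitting a job.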

Read more

Introducing AI Sheets: a tool to work with datasets using open AI models!

🧭 TL;DR: Hugging Face AI Sheets is a new, open-source tool for building, enriching, and transforming datasets using AI models with no code. The tool can be deployed locally or on the Hub. It lets you use thousands of open models from the Hugging Face Hub via Inference Providers or local models, including gpt-oss from OpenAI!

Useful links:
- Try the tool for free (no installation required): https://huggingface.co/spaces/aisheets/sheets
- Install and run locally: https://github.com/huggingface/sheets

Read more

🇵🇭 FilBench – Can LLMs Understand and Generate Filipino?

As large language models (LLMs) become increasingly integrated into our lives, it becomes crucial to assess whether they reflect the nuances and capabilities of specific language communities. For example, Filipinos are among the most active ChatGPT users globally, ranking fourth in ChatGPT traffic (behind the United States, India, and Brazil [1] [2]), but despite this strong usage, we lack a clear understanding of how LLMs perform for their languages, such as Tagalog and Cebuano. Most of the existing evidence is […]

Read more

Kimina-Prover-RL

A slimmed-down training pipeline from Kimina Prover, with core features and full compatibility with verl. We are happy to introduce kimina-prover-rl, an open-source training pipeline for formal theorem proving in Lean 4, based on a structured reasoning-then-generation paradigm inspired by DeepSeek-R1. This training pipeline is a simplified version of the system we used to train Kimina Prover, preserving its key components and offering full compatibility with the open-source verl framework. It is released as part of a […]

Read more

MCP for Research: How to Connect AI to Research Tools

Academic research involves frequent discovery work: finding papers, code, and related models and datasets. This typically means switching between platforms like arXiv, GitHub, and Hugging Face, manually piecing together connections. The Model Context Protocol (MCP) is a standard that allows agentic models to communicate with external tools and data sources. For research discovery, this means AI can […]

Read more