Announcing NeurIPS 2025 E2LM Competition: Early Training Evaluation of Language Models

Join us in building benchmarks that capture early-stage reasoning & scientific knowledge in LLMs! The development of Large Language Models (LLMs) typically begins with a series of ablation experiments, wherein various model architectures, data mixtures, and training hyperparameters are systematically evaluated. This phase is commonly referred to as the early stages of training. During this period, researchers primarily monitor two key metrics: the training loss curve and evaluation scores. However, existing evaluation benchmarks often fail to provide meaningful or discriminative […]

Read more

Efficient MultiModal Data Pipeline

You’ve got everything ready – data, model, a beefy GPU setup. You hit “run” and… wait. And wait some more. Your GPUs are barely breaking a sweat while your wallet’s getting lighter by the hour. Sound familiar? We’ve been there. After some detective work on our nanoVLM project, we discovered that the real culprit wasn’t our model or hardware; it was our data pipeline, which was incredibly wasteful. Here’s what we found: Idle GPUs: Our model was literally waiting around for data […]
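The idle-GPU symptom described above can be diagnosed with a simple timing harness before reaching for bigger hardware. Below is a minimal, hedged sketch (not code from nanoVLM; `slow_loader` and `train_step` are illustrative stand-ins) that measures what fraction of each training step is spent blocked on data:

```python
import time

def slow_loader(n_batches, load_time=0.02):
    # Stand-in for a dataloader that blocks while decoding/augmenting.
    for _ in range(n_batches):
        time.sleep(load_time)
        yield object()

def train_step(batch, compute_time=0.01):
    # Stand-in for the forward/backward pass on the GPU.
    time.sleep(compute_time)

def data_wait_fraction(loader, step):
    # Time spent waiting on next(loader) vs. time spent computing.
    wait = compute = 0.0
    it = iter(loader)
    while True:
        t0 = time.perf_counter()
        try:
            batch = next(it)   # blocked here = GPU sitting idle
        except StopIteration:
            break
        t1 = time.perf_counter()
        step(batch)            # useful work
        t2 = time.perf_counter()
        wait += t1 - t0
        compute += t2 - t1
    return wait / (wait + compute)

frac = data_wait_fraction(slow_loader(20), train_step)
print(f"{frac:.0%} of step time spent waiting for data")
```

If this fraction is large, the fix usually lives in the pipeline (parallel decoding, prefetching, caching), not the model.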

Read more

SmolLM3: smol, multilingual, long-context reasoner

Small language models are becoming increasingly important as users seek capable models that can be deployed efficiently. The community has produced a fascinating range of capable small models, each pushing the boundaries of what’s possible at this scale. With SmolLM3, we’re excited to contribute a new, competitive, fully open 3B model. SmolLM3 sits in the efficiency sweet spot: our 3B model outperforms Llama-3.2-3B and Qwen2.5-3B while staying competitive with larger 4B alternatives (Qwen3 & Gemma3). Beyond the performance numbers, we’re […]

Read more

Upskill your LLMs with Gradio MCP Servers

Have you ever wanted your favorite Large Language Model (LLM) to do more than just answer questions? What if it could edit images for you, browse the web, or organize your email inbox? Well, now it can! In this blog post, I’ll show you: what the MCP protocol is and how it works similarly to […]

Read more

Building the Hugging Face MCP Server

TL;DR: The Hugging Face Official MCP Server offers unique customization options for AI Assistants accessing the Hub, along with access to thousands of AI applications through one simple URL. We used MCP’s “Streamable HTTP” transport for deployment, and examine in detail the trade-offs that server developers face. We’ve learned many things about building a useful MCP server in the last month – we’ll describe our journey here. Introduction: The Model Context Protocol (MCP) is […]

Read more

Kimina-Prover: Applying Test-time RL Search on Large Formal Reasoning Models

Numina & Kimi Team

Figure 1: Performance comparison of theorem proving models on the miniF2F-test dataset.

We’re excited to announce the release of Kimina-Prover-72B, our state-of-the-art theorem-proving model trained with the Kimi k1.5 [1] RL pipeline based on Qwen2.5-72B [2]. Alongside it, we are also releasing two distilled variants: Kimina-Prover-Distill-8B and Kimina-Prover-Distill-1.7B (based on Qwen3-8B and Qwen3-1.7B [3], respectively). Our key innovations include Test-Time Reinforcement Learning Search, a trainable agentic proving framework that enables the model to recursively discover, combine and […]

Read more

Migrating the Hub from Git LFS to Xet

In January of this year, Hugging Face’s Xet Team deployed a new storage backend, and shortly after shifted ~6% of Hub downloads through this infrastructure. This represented a significant milestone, but it was just the beginning. In six months, 500,000 repositories holding 20 PB have joined the move to Xet as the Hub outgrows Git LFS and transitions to a storage system that scales with the workloads of AI builders. Today, more than 1 million people on the Hub are using […]
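A key reason Xet scales better than Git LFS is that it deduplicates at the level of content-defined chunks rather than whole files, so a small edit to a large file only uploads the chunks near the change. The sketch below is a toy illustration of that idea, not Xet’s actual implementation (its chunker, hash, and parameters differ): chunk boundaries are chosen from the content itself, so re-uploading an edited file adds only a few new chunks to the store.

```python
import hashlib
import random

def chunk_boundaries(data, window=16, mask=0x3F):
    # Toy content-defined chunking: cut wherever a hash of the trailing
    # `window` bytes satisfies a boundary condition. Because cuts depend
    # on local content, an insertion at the front only shifts boundaries
    # until the chunker re-syncs, instead of shifting every chunk.
    boundaries, start = [], 0
    for i in range(window, len(data)):
        h = int.from_bytes(hashlib.sha256(data[i - window:i]).digest()[:4], "big")
        if h & mask == 0:
            boundaries.append((start, i))
            start = i
    boundaries.append((start, len(data)))
    return boundaries

def dedup_store(data, store):
    # Store each chunk once, keyed by its content hash; return the refs.
    refs = []
    for lo, hi in chunk_boundaries(data):
        chunk = data[lo:hi]
        key = hashlib.sha256(chunk).hexdigest()
        store.setdefault(key, chunk)
        refs.append(key)
    return refs

store = {}
v1 = random.Random(0).randbytes(4096)
dedup_store(v1, store)
chunks_v1 = len(store)
dedup_store(b"new header " + v1, store)  # small edit at the front
print(f"{chunks_v1} chunks for v1, {len(store) - chunks_v1} new chunks for v2")
```

With fixed-offset chunking (as in naive file splitting), the same front-of-file edit would invalidate every chunk; content-defined boundaries are what make chunk-level dedup robust to edits.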

Read more