The Open Arabic LLM Leaderboard 2

Current status of Arabic LLMs leaderboards

The growing availability of LLMs supporting Arabic, both as monolingual and multilingual models, prompted the community to create dedicated Arabic language leaderboards. Previously, Arabic-focused leaderboards were typically confined to narrow benchmarks introduced by specific authors, often as demos for their work. In these cases, the authors would set up leaderboards to demonstrate how models performed on a particular task or dataset. Alternatively, other leaderboards required users to run evaluations on their own


Open R1: Update #2

We are now two weeks into the Open R1 project, which aims to reconstruct the missing pieces of DeepSeek R1, specifically the training pipeline and synthetic data. In this post, we are happy to share the construction of OpenR1-Math-220k: our first large-scale dataset for mathematical reasoning! We also take a look at some exciting developments from the community towards curating small, high-quality datasets for fine-tuning, along with insights into how to control the length of the chain-of-thought from reasoning models at […]


Build awesome datasets for video generation

(This post was authored by hlky and Sayak) Tooling for image generation datasets is well established, with img2dataset being a fundamental tool used for large scale dataset preparation, and complemented with various community guides, scripts and UIs that cover smaller scale initiatives. Our ambition is to make tooling for video generation datasets equally established, by creating open video    


From Chunks to Blocks: Accelerating Uploads and Downloads on the Hub

Content-defined chunking (CDC) plays a central role in enabling deduplication within a Xet-backed repository. The idea is straightforward: break each file’s data into chunks, store only unique ones, reap the benefits. In practice, it’s more complex. If we focused solely on maximizing deduplication, the design would call for the smallest possible chunk size. By doing that, we’d create significant overheads for the infrastructure and the builders on the Hub. On Hugging Face’s Xet team, we’re bringing CDC from theory to […]


1 Billion Classifications

You’ve optimized your model. Your pipeline is running smoothly. But now, your cloud bill has skyrocketed. Running 1B+ classifications or embeddings per day isn’t just a technical challenge—it’s a financial one. How do you process at this scale without blowing your budget? Whether you’re running large-scale document classification or bulk embedding pipelines for Retrieval-Augmented Generation (RAG), you need cost-efficient, high-throughput inference to    


Fixing Open LLM Leaderboard with Math-Verify

Three weeks ago, we showed how hard it is to correctly evaluate LLM performance on math problems, and introduced Math-Verify, a better solution to validate models on math (read more in the announcement)! Today, we’re thrilled to share that we’ve used Math-Verify to thoroughly re-evaluate all 3,751 models ever submitted to the Open LLM Leaderboard, for even fairer and more robust model comparisons!

Why math evaluation on the Open LLM Leaderboard was broken

The


PaliGemma 2 Mix – New Instruction Vision Language Models by Google

Last December, Google released PaliGemma 2: a new family of pre-trained (pt) PaliGemma vision language models (VLMs) based on SigLIP and Gemma 2. The models come in three different sizes (3B, 10B, 28B) and three different resolutions (224×224, 448×448, 896×896). Today, Google is releasing PaliGemma 2 mix: fine-tuned on a mix of vision language tasks, including OCR, long and short captioning and more. PaliGemma 2 pretrained (pt) variants are great vision language models to transfer on a given task at […]


SmolVLM2: Bringing Video Understanding to Every Device

SmolVLM2 represents a fundamental shift in how we think about video understanding – moving from massive models that require substantial computing resources to efficient models that can run anywhere. Our goal is simple: make video understanding accessible across all devices and use cases, from phones to servers. We are releasing models in three sizes (2.2B, 500M and 256M), MLX ready (Python and Swift APIs) from day zero. We’ve made all models and demos available in this collection. Want to try […]


SigLIP 2: A better multilingual vision language encoder

Today Google releases a new and better family of multilingual vision-language encoders, SigLIP 2. The authors have extended the training objective of SigLIP (sigmoid loss) with additional objectives for improved semantic understanding, localization, and dense features. SigLIP 2 models outperform the older SigLIP ones at all model scales in core capabilities, including zero-shot classification, image-text retrieval, and transfer performance when extracting visual representations for Vision-Language Models (VLMs). A cherry on top is the dynamic resolution (naflex) variant. This is useful […]
