Falcon-Edge: A series of powerful, universal, fine-tunable 1.58bit language models.

In this blogpost, we present the key highlights and the rationale behind the Falcon-Edge series – a collection of powerful, universal, and fine-tunable language models available in ternary format, based on the BitNet architecture. Drawing from our experience with BitNet, Falcon-Edge introduces and validates a new pre-training paradigm that delivers a full-scope output from a single training process, simultaneously yielding both non-quantized and quantized model variants. This comprehensive approach produces a non-BitNet model in bfloat16 format, the native BitNet model, and […]
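To make "ternary format" concrete: BitNet-style models constrain each weight to one of three values, {-1, 0, +1}, which is where the "1.58 bit" figure comes from (log2(3) ≈ 1.58). A minimal pure-Python sketch of absmean ternary quantization in this spirit — illustrative only, not Falcon-Edge's actual training-time kernel:

```python
def absmean_ternary_quantize(weights, eps=1e-6):
    """Map a list of float weights to {-1, 0, +1} plus one shared scale.

    Each weight is divided by the mean absolute value of the tensor,
    rounded, and clipped to the ternary range. Dequantization is just
    `q * scale`, so storage drops to ~1.58 bits of information per weight.
    """
    scale = sum(abs(w) for w in weights) / len(weights) + eps
    quantized = [max(-1, min(1, round(w / scale))) for w in weights]
    return quantized, scale

# Small weights collapse to 0, large ones saturate at +/-1:
q, scale = absmean_ternary_quantize([0.9, -1.2, 0.05, 2.0])
# q == [1, -1, 0, 1]
```

The single float `scale` is what lets a ternary matrix still approximate the original bfloat16 weights at inference time.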

Read more

Microsoft and Hugging Face expand collaboration to make open models easy to use on Azure

Today at the Microsoft Build conference, Satya Nadella announced an expanded collaboration with Hugging Face to make its wide diversity of open models easy to deploy on Azure's secure infrastructure. If you head over to Azure AI Foundry today, you will find a vastly expanded collection of 10,000+ Hugging Face models you can deploy in a couple of clicks to power AI applications working with text, audio, and images. And we're just getting started!

Read more

nanoVLM: The simplest repository to train your VLM in pure PyTorch

nanoVLM is the simplest way to get started with training your very own Vision Language Model (VLM) using pure PyTorch. It is a lightweight toolkit that lets you launch a VLM training run on a free-tier Colab notebook. We were inspired by Andrej Karpathy’s nanoGPT, and provide a similar project for the vision domain. At its heart, nanoVLM is a toolkit that helps you build and train a model that can understand both images and text, and then generate text […]
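The core idea behind a VLM of this shape is that a ViT-style vision encoder chops the image into fixed-size patches, and each patch becomes one "visual token" fed to the language model alongside the text tokens. A quick sketch of that arithmetic — the 224/16 numbers are a common ViT configuration used here for illustration, not necessarily nanoVLM's defaults:

```python
def num_image_tokens(image_size: int, patch_size: int) -> int:
    """Count the visual tokens a ViT-style encoder produces for a square
    image: one token per non-overlapping patch_size x patch_size patch."""
    assert image_size % patch_size == 0, "image must divide evenly into patches"
    patches_per_side = image_size // patch_size
    return patches_per_side * patches_per_side

# A 224x224 image with 16x16 patches yields a 14x14 grid:
# num_image_tokens(224, 16) == 196 visual tokens
```

Those 196 embeddings are then projected into the language model's hidden size and prepended to the text sequence, which is why image inputs consume context length just like text does.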

Read more

Exploring Quantization Backends in Diffusers

Large diffusion models like Flux (a flow-based text-to-image generation model) can create stunning images, but their size can be a hurdle, demanding significant memory and compute resources. Quantization offers a powerful solution, shrinking these models to make them more accessible without drastically compromising performance. But the big question always is: can you actually tell the difference in the final image? Before we dive into the technical details of how various quantization backends in Hugging Face Diffusers work, why not test […]
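The shrinking the post refers to comes from storing weights in fewer bits. As a self-contained illustration of the idea (a generic symmetric absmax int8 scheme, not the internals of any specific Diffusers backend):

```python
def quantize_int8(values):
    """Symmetric absmax quantization: scale so the largest magnitude
    maps to 127, then round each value to a signed 8-bit integer."""
    scale = max(abs(v) for v in values) / 127.0
    quantized = [round(v / scale) for v in values]
    return quantized, scale

def dequantize(quantized, scale):
    """Recover approximate floats; the rounding error is at most scale/2."""
    return [q * scale for q in quantized]

weights = [0.5, -1.0, 0.25]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
# Each value now needs 1 byte instead of 4 (float32): a ~4x memory saving,
# at the cost of a small reconstruction error bounded by scale/2.
```

Backends like bitsandbytes refine this basic recipe (per-block scales, outlier handling, 4-bit formats), which is exactly what the image-quality comparison in the post probes.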

Read more

Falcon-Arabic: A Breakthrough in Arabic Language Models

Check out our official blogpost (EN, AR) We are excited to introduce Falcon-Arabic, a 7B parameter Language Model that sets a new benchmark for Arabic NLP. Built on the Falcon 3 architecture, Falcon-Arabic is a multilingual model that supports Arabic, English, and several other languages. It excels in general knowledge, Arabic grammar, mathematical reasoning, complex problem solving, and understanding the rich diversity of Arabic dialects. Falcon-Arabic supports a context length of 32,000 tokens, allowing it to handle long documents and […]

Read more

Falcon-H1: A Family of Hybrid-Head Language Models Redefining Efficiency and Performance

Introduction: check out our official blogpost as well. Today, we are proud to introduce the Falcon-H1 series, a collection of six open-source models ranging from 0.5B to 34B parameters, each available in both base and instruction-tuned variants. At the core of these models lies a hybrid architecture that combines the strengths of the classical Transformer-based attention mechanism with the State Space Model (SSM), known […]
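What an SSM buys you over attention is easiest to see in its recurrence: the whole history is compressed into a fixed-size state, so a sequence of length T is processed in O(T) time with O(1) memory, versus attention's O(T²) pairwise scores. A toy scalar-state sketch of that recurrence (an illustration of the SSM idea, not Falcon-H1's actual layer, which uses learned matrices and selective parameterizations):

```python
def ssm_scan(inputs, a, b, c):
    """Minimal linear state-space recurrence with a scalar state:
        h_t = a * h_{t-1} + b * x_t     (state update)
        y_t = c * h_t                   (readout)
    One pass over the sequence; the state h is all that is carried forward.
    """
    h, outputs = 0.0, []
    for x_t in inputs:
        h = a * h + b * x_t
        outputs.append(c * h)
    return outputs

# An impulse input decays geometrically through the state (a = 0.5):
# ssm_scan([1, 0, 0], a=0.5, b=1.0, c=2.0) -> [2.0, 1.0, 0.5]
```

A hybrid-head design runs attention and SSM paths side by side, so the model keeps attention's precise token-to-token recall while the SSM path handles long-range context cheaply.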

Read more

Tiny Agents in Python: an MCP-powered agent in ~70 lines of code

NEW: tiny-agents now supports the AGENTS.md standard. 🥳 Inspired by Tiny Agents in JS, we ported the idea to Python 🐍 and extended the huggingface_hub client SDK to act as an MCP client, so it can pull tools from MCP servers and pass them to the LLM during inference. MCP (Model Context Protocol) is an open protocol that standardizes how Large Language Models (LLMs) interact with external tools and APIs. Essentially, it removes the need to write custom integrations for each […]
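The pattern an MCP client implements is simple at its core: servers advertise tools as a name plus a JSON Schema for the arguments, the LLM emits a tool call as JSON, and the client routes it to the right implementation. A stripped-down sketch of that dispatch loop — the `get_weather` tool and the registry shape are hypothetical stand-ins, not the MCP wire format or the huggingface_hub API:

```python
import json

# Hypothetical local registry, mimicking how an MCP server advertises a
# tool: a name, a human-readable description, and an argument schema.
TOOLS = {
    "get_weather": {
        "description": "Return the weather for a city.",
        "inputSchema": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
        "fn": lambda city: f"Sunny in {city}",
    },
}

def dispatch(tool_call_json: str) -> str:
    """Route a model-emitted tool call (tool name + JSON arguments) to the
    matching function, the way an MCP client does after inference."""
    call = json.loads(tool_call_json)
    tool = TOOLS[call["name"]]
    return tool["fn"](**call["arguments"])

# dispatch('{"name": "get_weather", "arguments": {"city": "Paris"}}')
# returns "Sunny in Paris"
```

Because every tool is described the same way, the same loop works for any server's tools: that uniformity is what makes a ~70-line agent possible.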

Read more

🐯 Liger GRPO meets TRL

Thank you for your great work. Anyway, I tested the Liger loss with DeepSpeed ZeRO-3 using Qwen/Qwen2.5-0.5B-Instruct in bf16. I hit a shape mismatch, as shown below:

[rank0]: Traceback (most recent call last):
[rank0]:   File "/workspace/temp.py", line 22, in <module>
[rank0]:     trainer.train()
[rank0]:   File "/usr/local/lib/python3.11/dist-packages/transformers/trainer.py", line 2238, in train
[rank0]:     return inner_training_loop(
[rank0]:            ^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/usr/local/lib/python3.11/dist-packages/transformers/trainer.py", line 2553, in _inner_training_loop
[rank0]:     tr_loss_step = self.training_step(model, inputs, num_items_in_batch)
[rank0]:                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/usr/local/lib/python3.11/dist-packages/transformers/trainer.py", line 3730, in training_step
[rank0]:     loss = self.compute_loss(model, inputs, […]

Read more