Timm ❤️ Transformers: Use any timm model with transformers

Get lightning-fast inference, quick quantization, torch.compile boosts, and effortless fine-tuning for any timm model, all within the friendly 🤗 transformers ecosystem. Enter TimmWrapper, a simple yet powerful tool that unlocks this potential. In this post, we’ll cover:

- How the timm integration works and why it’s a game-changer.
- How to integrate timm models with 🤗 transformers.
- Practical examples: pipelines, quantization, fine-tuning, and more.

To follow along with this blog post, install the latest versions of transformers and timm by running: pip install -Uq […]

Read more

State of open video generation models in Diffusers

OpenAI’s Sora demo marked a striking advance in AI-generated video last year and gave us a glimpse of the potential capabilities of video generation models. The impact was immediate, and since that demo the video generation space has become increasingly competitive, with major players and startups producing their own highly capable models such as Google’s Veo 2, MiniMax’s Hailuo, Runway’s Gen3 Alpha, Kling, Pika, and Luma Labs’ Dream Machine. The open-source community has also had its own surge of video generation models with […]

Read more

Open-R1: a fully open reproduction of DeepSeek-R1

If you’ve ever struggled with a tough math problem, you know how useful it is to think a little longer and work through it carefully. OpenAI’s o1 model showed that when LLMs are trained to do the same—by using more compute during inference—they get significantly better at solving reasoning tasks like mathematics, coding, and logic. However, the recipe behind OpenAI’s reasoning models has been a well-kept secret. That is, until last week, when DeepSeek released their DeepSeek-R1 model and […]

Read more

How to deploy and fine-tune DeepSeek models on AWS

A running document to showcase how to deploy and fine-tune DeepSeek R1 models with Hugging Face on AWS. What is DeepSeek-R1? If you’ve ever struggled with a tough math problem, you know how useful it is to think a little longer and work through it carefully. OpenAI’s o1 model showed that when LLMs are trained to do the same—by using more compute during inference—they get significantly better at solving reasoning tasks like mathematics, coding, and logic.

Read more

Open-R1: Update #1

It’s been two weeks since the release of DeepSeek R1 and just a week since we started the open-r1 project to replicate the missing pieces, namely the training pipeline and the synthetic data. This post summarizes:

- the progress of Open-R1 to replicate the DeepSeek-R1 pipeline and dataset
- what we learned about DeepSeek-R1 and the discussions around it
- cool projects the community has built since the release of DeepSeek-R1

It should serve both as an update on the project and as a […]

Read more