We just gave sight to smolagents
"You hypocrite, first take the log out of your own eye, and then you will see clearly to take the speck out of your brother’s eye." (Matthew 7:3-5)
OpenAI’s Sora demo marked a striking advance in AI-generated video last year and gave us a glimpse of the potential capabilities of video generation models. The impact was immediate: since that demo, the video generation space has become increasingly competitive, with major players and startups producing their own highly capable models, such as Google’s Veo 2, Hailuo’s MiniMax, Runway’s Gen-3 Alpha, Kling, Pika, and Luma Labs’ Dream Machine. Open source has also had its own surge of video generation models with […]
If you’ve ever struggled with a tough math problem, you know how useful it is to think a little longer and work through it carefully. OpenAI’s o1 model showed that when LLMs are trained to do the same, by using more compute during inference, they get significantly better at solving reasoning tasks like mathematics, coding, and logic. However, the recipe behind OpenAI’s reasoning models has been a well-kept secret. That is, until last week, when DeepSeek released their DeepSeek-R1 model and […]
A running document showcasing how to deploy and fine-tune DeepSeek-R1 models with Hugging Face on AWS. What is DeepSeek-R1? If you’ve ever struggled with a tough math problem, you know how useful it is to think a little longer and work through it carefully. OpenAI’s o1 model showed that when LLMs are trained to do the same, by using more compute during inference, they get significantly better at solving reasoning tasks like mathematics, coding, and logic.
First issue 🎉 The AI space is moving so fast it’s hard to believe that a year ago we still struggled to generate people with the correct number of fingers 😂. The last couple of years have […]
This post was written by Philipp Schmid and originally posted on philschmid.de; the code can be found here. The release of DeepSeek-R1 shocked the industry. Why? Well, DeepSeek-R1 is an open model that rivals OpenAI’s o1 in complex reasoning tasks, introduced using Group Relative Policy Optimization (GRPO) and an RL-focused multi-stage training approach. They not only released the […]
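The core idea behind GRPO mentioned above is to score a group of sampled completions for the same prompt and normalize each reward against the group’s statistics, rather than training a separate value model. A minimal sketch of that group-relative advantage computation is below; the function name and the example reward values are illustrative, not from the post.

```python
from statistics import mean, pstdev

def grpo_advantages(rewards):
    """Group-relative advantages (GRPO's core idea): normalize each
    completion's reward by the mean and standard deviation of the
    rewards within its group of samples for the same prompt."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    if sigma == 0:
        # All completions scored identically: no learning signal.
        return [0.0 for _ in rewards]
    return [(r - mu) / sigma for r in rewards]

# Example: 4 sampled completions for one prompt, scored 0/1 by a verifier.
advs = grpo_advantages([1.0, 0.0, 0.0, 1.0])  # → [1.0, -1.0, -1.0, 1.0]
```

Completions that beat the group average get positive advantages and are reinforced; those below it get negative advantages, all without a learned critic.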
It’s been two weeks since the release of DeepSeek-R1 and just a week since we started the open-r1 project to replicate the missing pieces, namely the training pipeline and the synthetic data. This post summarizes: (1) the progress of Open-R1 in replicating the DeepSeek-R1 pipeline and dataset, (2) what we learned about DeepSeek-R1 and discussions around it, and (3) cool projects the community has built since the release of DeepSeek-R1. It should serve both as an update on the project and as a […]
Language models are becoming increasingly capable and can solve tasks autonomously as agents. There are many exciting use cases, especially at the intersection of reasoning, code, and data. However, proper evaluation benchmarks on real-world problems are lacking and hinder progress in the field. To tackle this challenge, Adyen and Hugging Face built the Data Agent Benchmark for Multi-step Reasoning (DABstep) together. DABstep consists of over 450 data analysis tasks designed to evaluate the capabilities of state-of-the-art LLMs and AI agents. […]
Yesterday, OpenAI released Deep Research, a system that browses the web to summarize content and answer questions based on the summary. The system is impressive and blew our minds when we tried it for the first time. One of the main results in the blog post is a strong improvement in performance on the General AI Assistants benchmark (GAIA), a benchmark we’ve been playing with recently as well, on which they reached nearly 67% correct answers on average in a 1-shot setting, […]
Current status of Arabic LLM leaderboards. The growing availability of LLMs supporting Arabic, both as monolingual and multilingual models, prompted the community to create dedicated Arabic language leaderboards. Previously, Arabic-focused leaderboards were typically confined to narrow benchmarks introduced by specific authors, often as demos for their work. In these cases, the authors would set up leaderboards to demonstrate how models performed on a particular task or dataset. Alternatively, other leaderboards required users to run evaluations on their own […]