Personal Copilot: Train Your Own Coding Assistant
In the ever-evolving landscape of programming and software development, the quest for efficiency and productivity has led to remarkable innovations. One such innovation is the emergence of code generation models such as Codex, StarCoder, and […]
Introducing Storage Regions on the Hub
As part of our Enterprise Hub plan, we recently released support for Storage Regions. Regions […]
Comparing the Performance of LLMs: A Deep Dive into RoBERTa, Llama 2, and Mistral for Disaster Tweets Analysis with LoRA
In the fast-moving world of Natural Language Processing (NLP), we often find ourselves comparing different language models to see which one works best for specific tasks. This blog post is all about comparing three models: RoBERTa, Mistral-7b, and Llama-2-7b. We used them to tackle a common problem – classifying tweets about disasters. It is important to note that Mistral and Llama 2 […]
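The comparison hinges on LoRA: instead of fine-tuning all of a 7B model's weights, only small low-rank adapter matrices are trained. Below is a minimal sketch of such a setup with the `peft` library; the model id, rank, and target modules are illustrative assumptions, not the exact configuration from the post.

```python
# Minimal LoRA setup for sequence classification with peft (assumed values).
from peft import LoraConfig, TaskType, get_peft_model
from transformers import AutoModelForSequenceClassification

# Binary classification: disaster tweet vs. not (model id is illustrative).
model = AutoModelForSequenceClassification.from_pretrained(
    "mistralai/Mistral-7B-v0.1", num_labels=2
)

lora_config = LoraConfig(
    task_type=TaskType.SEQ_CLS,           # keeps the classification head trainable
    r=8,                                  # low-rank dimension (assumed)
    lora_alpha=16,
    lora_dropout=0.1,
    target_modules=["q_proj", "v_proj"],  # attention projections (assumed)
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only a small fraction of weights train
```

The same adapter recipe applies to all three models, which is what makes a side-by-side comparison on one task practical.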
Introducing Prodigy-HF
Prodigy is an annotation tool made by Explosion, a company well known […]
SDXL in 4 steps with Latent Consistency LoRAs
Latent Consistency Models (LCMs) are a way to decrease the number of steps required to generate an image with Stable Diffusion (or SDXL) by distilling the original model into another version that requires fewer steps (4 to 8 instead of the original 25 to 50). Distillation is a type of training procedure that attempts to replicate the outputs from a source model using a new one. The distilled model may be designed to be smaller (that's the case for DistilBERT […]
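In practice, the LCM distillation is shipped as a LoRA adapter loaded on top of the regular SDXL pipeline. Here is a short sketch using the public `diffusers` API; the prompt and generation settings are illustrative.

```python
# 4-step SDXL generation with an LCM LoRA adapter (sketch).
import torch
from diffusers import DiffusionPipeline, LCMScheduler

pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
    variant="fp16",
).to("cuda")

# Swap in the LCM scheduler and load the distilled LCM LoRA.
pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)
pipe.load_lora_weights("latent-consistency/lcm-lora-sdxl")

# 4 steps instead of the usual 25-50; LCM works best with low guidance.
image = pipe(
    "a photo of an astronaut riding a horse",
    num_inference_steps=4,
    guidance_scale=1.0,
).images[0]
```

Note the low `guidance_scale`: the distilled model no longer needs the strong classifier-free guidance that standard SDXL sampling relies on.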
Open LLM Leaderboard: DROP deep dive
Recently, three new benchmarks were added to the Open LLM Leaderboard: Winogrande, GSM8k, and DROP, using the original implementations reproduced in the EleutherAI Harness. A cursory look at the DROP scores revealed something strange: the overwhelming majority of models scored less than 10 out of 100 on F1! We did a deep dive to understand what was going on; come along to see what we found. Initial […]
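For context, DROP is scored with a token-level F1 between the predicted and gold answers. The snippet below is a deliberately simplified sketch of that idea (the real implementation also normalizes numbers, articles, and punctuation); it illustrates how a tiny generation artifact can zero out an otherwise correct answer.

```python
# Simplified bag-of-words token F1, in the spirit of DROP scoring.
from collections import Counter

def token_f1(prediction: str, gold: str) -> float:
    pred_tokens = prediction.lower().split()
    gold_tokens = gold.lower().split()
    common = Counter(pred_tokens) & Counter(gold_tokens)  # per-token overlap
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)

print(token_f1("10", "10"))    # 1.0: exact match
print(token_f1("10.", "10"))   # 0.0: a stray period breaks the token match
```

A systematic formatting quirk like that trailing period, applied across thousands of answers, is exactly the kind of thing that can drag a whole leaderboard column toward zero.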
Goodbye cold boot – how we made LoRA Inference 300% faster
tl;dr: We swap Stable Diffusion LoRA adapters per user request while keeping the base model warm, allowing fast LoRA inference across multiple users. You can experience this by browsing our LoRA catalogue and playing with the inference widget. In this post we go into detail about how we achieved that. We've […]
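The core idea can be sketched in a few lines of `diffusers`: load the base pipeline once, then attach and detach adapters per request. The repo ids below are placeholders, and a real service would add request batching, adapter caching, and eviction on top.

```python
# Hot-swapping LoRA adapters over a warm base model (sketch).
import torch
from diffusers import DiffusionPipeline

# Loaded once at startup and kept warm on the GPU.
pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

def generate(prompt: str, lora_repo: str):
    # Attach the requested adapter to the warm base model...
    pipe.load_lora_weights(lora_repo)
    image = pipe(prompt, num_inference_steps=25).images[0]
    # ...then detach it so the next request starts from a clean base.
    pipe.unload_lora_weights()
    return image

# e.g. generate("pixel art castle", "nerijs/pixel-art-xl")  # placeholder repo
```

The win comes from never reloading the multi-gigabyte base model: only the small adapter weights move per request.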
Optimum-NVIDIA on Hugging Face enables blazingly fast LLM inference in just 1 line of code
Large Language Models (LLMs) have revolutionized natural language processing and are increasingly deployed to solve complex problems at scale. Achieving optimal performance with these models is notoriously challenging due to their unique and intense computational demands. Optimized performance of […]
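The "1 line of code" refers to swapping the `transformers` import for `optimum.nvidia`, which exposes a drop-in `AutoModelForCausalLM` backed by TensorRT-LLM. A sketch under that assumption follows; the model id is illustrative, and treat the exact keyword arguments (e.g. `use_fp8`) as assumptions about the advertised FP8 path.

```python
# Drop-in replacement: swap one import to use the TensorRT-LLM backend.
# from transformers import AutoModelForCausalLM   # before
from optimum.nvidia import AutoModelForCausalLM   # after

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-chat-hf")
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-chat-hf",
    use_fp8=True,  # FP8 inference path (assumed flag)
)

inputs = tokenizer(
    "Why is LLM inference hard to optimize?", return_tensors="pt"
).to("cuda")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Everything downstream of the import stays the same, which is what makes the speedup essentially free to adopt for existing `transformers` code.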