Finally, a Replacement for BERT

This blog post introduces ModernBERT, a family of state-of-the-art encoder-only models that improves on previous-generation encoders across the board, with an 8,192-token sequence length, better downstream performance, and much faster processing. ModernBERT is available as a slot-in replacement for any BERT-like model, in both a base (149M params) and a large (395M params) size. ModernBERT will be included in v4.48.0 of transformers. Until then, it requires installing transformers from […]

Read more

Visualize and understand GPU memory in PyTorch

You must be familiar with this message 🤬: RuntimeError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 7.93 GiB total capacity; 6.00 GiB already allocated; 14.88 MiB free; 6.00 GiB reserved in total by PyTorch) While it’s easy to see that GPU memory is full, understanding why and how to fix it can be more challenging. In […]
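One subtlety in that message is the distinction between memory *reserved* by PyTorch's caching allocator and memory actively *allocated* by live tensors; an allocation fails when neither the cache nor the remaining free GPU memory can serve the request. A toy accounting sketch of the numbers in the message above (illustrative arithmetic only, not PyTorch's actual allocator logic):

```python
# Toy accounting mirroring the fields in the CUDA OOM message.
# The quantities come from the error text; the logic is a simplification.
GiB = 1024**3
MiB = 1024**2

total_capacity = int(7.93 * GiB)  # GPU 0 total memory
reserved = 6 * GiB                # held by PyTorch's caching allocator
allocated = 6 * GiB               # actually used by live tensors
free = int(14.88 * MiB)           # left on the device for new reservations

cached_but_unused = reserved - allocated  # reusable without asking the driver

request = 20 * MiB
# The 20 MiB allocation fails: the cache has no spare room,
# and the free device memory is smaller than the request.
assert cached_but_unused < request and free < request
```

When `reserved` is much larger than `allocated`, the gap is cached blocks PyTorch can reuse; here the cache is fully occupied, so the request has nowhere to go.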

Read more

Introducing smolagents, a simple library to build agents

Today we are launching smolagents, a very simple library that unlocks agentic capabilities for language models. Here’s a glimpse: from smolagents import CodeAgent, DuckDuckGoSearchTool, HfApiModel agent = CodeAgent(tools=[DuckDuckGoSearchTool()], model=HfApiModel()) agent.run("How many seconds would it take for a leopard at full speed to run through Pont des Arts?") […]

Read more

CO₂ Emissions and Models Performance: Insights from the Open LLM Leaderboard

Since June 2024, we have evaluated more than 3,000 models on the Open LLM Leaderboard, a worldwide ranking of open language models by performance. Even though we try to run evaluations without wasting resources (we use the spare cycles of our cluster, i.e. the GPUs that are active but waiting between jobs), this still represents a significant amount of energy spent on model inference! In the last year, people have become more and more aware that using large […]

Read more

AI Agents Are Here. What Now?

The sudden, rapid advancement of LLM capabilities – such as writing fluent sentences and achieving increasingly high scores on benchmarks – has led AI developers and businesses alike to look towards what comes next: what game-changing technology is just over the horizon? One technology taking off very recently is “AI agents”, systems that can take actions in the digital world aligned with a deployer’s goals. Most of today’s AI agents […]

Read more

Train 400x faster Static Embedding Models with Sentence Transformers

This blog post introduces a method to train static embedding models that run 100x to 400x faster on CPU than state-of-the-art embedding models, while retaining most of their quality. This unlocks many exciting use cases, including on-device and in-browser execution, edge computing, and low-power embedded applications. We apply this recipe to train two extremely efficient embedding models: sentence-transformers/static-retrieval-mrl-en-v1 […]
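The excerpt doesn't show the training recipe, but the core idea behind the speed of a static embedding model can be sketched in a few lines: every token has a fixed, precomputed vector, and a sentence embedding is just the mean of its token vectors, so no transformer forward pass is needed at inference time. A toy illustration with made-up vocabulary and vectors (not the actual sentence-transformers model):

```python
# Toy sketch of a *static* embedding model: each token maps to a fixed
# vector, and embedding a sentence is a single mean-pooling pass.
# Vocabulary and vectors are invented here purely for illustration.
import random

random.seed(0)
DIM = 8
vocab = ["a", "the", "leopard", "runs", "fast", "bridge"]
token_vectors = {tok: [random.uniform(-1, 1) for _ in range(DIM)] for tok in vocab}

def embed(sentence: str) -> list[float]:
    """Mean-pool the static vectors of the known tokens in the sentence."""
    tokens = [t for t in sentence.lower().split() if t in token_vectors]
    if not tokens:
        return [0.0] * DIM
    return [sum(token_vectors[t][i] for t in tokens) / len(tokens) for i in range(DIM)]

vec = embed("The leopard runs fast")
```

Because lookup plus averaging is all that happens, inference cost grows only with sentence length, which is why such models run orders of magnitude faster on CPU than transformer encoders.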

Read more