Use Hugging Face models with Amazon Bedrock

We are excited to announce that popular open models from Hugging Face are now available on Amazon Bedrock in the new Bedrock Marketplace! AWS customers can now deploy 83 open models with Bedrock Marketplace to build their generative AI applications. Under the hood, Bedrock Marketplace model endpoints are managed by Amazon SageMaker JumpStart. With Bedrock Marketplace, you can now combine the ease of use of SageMaker JumpStart with the fully managed infrastructure of Amazon Bedrock, including compatibility with high-level APIs […]

Read more

LeMaterial: an open source initiative to accelerate materials discovery and research

Today, we are thrilled to announce the launch of LeMaterial, an open-source collaborative project led by Entalpic and Hugging Face. LeMaterial aims to simplify and accelerate materials research, making it easier to train ML models, identify novel materials and explore chemical spaces. ⚛️🤗 As a first step, we are releasing a dataset called LeMat-Bulk, which unifies, cleans and standardizes the most prominent material datasets, including Materials Project, Alexandria and OQMD — giving rise to a single harmonized data format with […]

Read more
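The unification step LeMat-Bulk performs can be pictured with a toy sketch. This is not LeMaterial's actual code, and the field names are hypothetical, but it shows the idea of mapping records from differently-shaped source datasets onto one harmonized schema:

```python
# Toy sketch (hypothetical field names, not LeMaterial's real schema):
# map source-specific records onto a single shared format, the way
# LeMat-Bulk unifies entries from Materials Project, Alexandria and OQMD.

def harmonize(record, source):
    """Return the record rewritten in one shared schema."""
    if source == "materials_project":
        return {"formula": record["pretty_formula"],
                "energy_ev": record["energy"],
                "source": source}
    if source == "oqmd":
        return {"formula": record["composition"],
                "energy_ev": record["total_energy"],
                "source": source}
    raise ValueError(f"unknown source: {source}")

entries = [
    harmonize({"pretty_formula": "NaCl", "energy": -27.1}, "materials_project"),
    harmonize({"composition": "NaCl", "total_energy": -27.0}, "oqmd"),
]
# Both records now share one schema and can be compared or deduplicated.
print(entries[0]["formula"], entries[1]["formula"])
```

Once every source is expressed in the same schema, downstream steps such as cleaning, deduplication and ML training can treat the combined corpus as a single dataset.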

Introducing the Synthetic Data Generator – Build Datasets with Natural Language

Introducing the Synthetic Data Generator, a user-friendly application that takes a no-code approach to creating custom datasets with Large Language Models (LLMs). The best part: a simple step-by-step process makes dataset creation a non-technical breeze, allowing anyone to create datasets and models in minutes without writing any code. A short demo video shows the workflow. What is synthetic data and why is it useful? Synthetic data is artificially generated information that mimics real-world data. It allows overcoming data limitations by expanding or enhancing […]

Read more
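To make "artificially generated information that mimics real-world data" concrete, here is a minimal illustration (not the Synthetic Data Generator's implementation, which uses LLMs): drawing artificial records whose statistical shape matches an assumed real distribution.

```python
# Minimal synthetic-data illustration (assumed parameters, not the
# app's LLM-based approach): generate artificial "age" records that
# mimic a real population with a given mean and spread.
import random

def synthesize_ages(n, mean=35, stddev=10, seed=0):
    """Draw n non-negative integer ages from a Gaussian around `mean`."""
    rng = random.Random(seed)
    return [max(0, round(rng.gauss(mean, stddev))) for _ in range(n)]

sample = synthesize_ages(1000)
print(len(sample), sum(sample) / len(sample))
```

The LLM-based approach in the post generalizes this idea from numeric columns to free text, but the goal is the same: expand or enhance a dataset when real examples are scarce or sensitive.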

Benchmarking Language Model Performance on 5th Gen Xeon at GCP

TL;DR: We benchmark 2 representative agentic AI workload components, text embedding and text generation, on two Google Cloud Compute Engine Xeon-based CPU instances, namely N2 and C4. The results consistently show that C4 delivers 10x to 24x higher throughput than N2 in text embedding and 2.3x to 3.6x higher throughput than N2 in text generation. Taking price into consideration, C4's hourly price is about 1.3x that of N2, so C4 keeps a 7x ~ 19x TCO (Total Cost of Ownership) advantage […]

Read more
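The TCO figure follows directly from the quoted ratios: performance-per-dollar advantage is the throughput speedup divided by the price ratio. A quick check with the numbers from the post:

```python
# Worked arithmetic behind the TCO claim, using only the ratios
# quoted in the post (illustrative, not the benchmark code).
price_ratio = 1.3  # C4 hourly price relative to N2

speedups = {"text embedding": (10, 24),
            "text generation": (2.3, 3.6)}

for task, (low, high) in speedups.items():
    tco_low, tco_high = low / price_ratio, high / price_ratio
    print(f"{task}: {tco_low:.1f}x - {tco_high:.1f}x better perf per dollar")
```

For text embedding this gives roughly 7.7x to 18.5x, matching the 7x ~ 19x TCO advantage in the summary.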

Welcome to the Falcon 3 Family of Open Models!

We introduce Falcon3, a family of decoder-only large language models under 10 billion parameters, developed by the Technology Innovation Institute (TII) in Abu Dhabi. By pushing the boundaries of performance and training efficiency, this release reflects our ongoing commitment to advancing open and accessible large foundation models. Falcon3 represents a natural evolution from previous releases, […]

Read more

Bamba: Inference-Efficient Hybrid Mamba2 Model 🐍

We introduce Bamba-9B, an inference-efficient Hybrid Mamba2 model trained by IBM, Princeton, CMU, and UIUC on completely open data. At inference time, the model demonstrates 2.5x throughput improvement and 2x latency speedup compared to standard transformers in vLLM. To foster community experimentation, the model is immediately available to use in transformers, vLLM, TRL, and llama.cpp. We also release tuning, training, and extended pretraining recipes with a stateful data loader, and invite the community to further improve this model. Let’s overcome […]

Read more

Finally, a Replacement for BERT

This blog post introduces ModernBERT, a family of state-of-the-art encoder-only models representing improvements over older generation encoders across the board, with an 8192 sequence length, better downstream performance and much faster processing. ModernBERT is available as a slot-in replacement for any BERT-like models, with both a base (149M params) and large (395M params) model size. Click to see how to use these models with transformers. ModernBERT will be included in v4.48.0 of transformers. Until then, it requires installing transformers from […]

Read more

Visualize and understand GPU memory in PyTorch

You must be familiar with this message 🤬: RuntimeError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 7.93 GiB total capacity; 6.00 GiB already allocated; 14.88 MiB free; 6.00 GiB reserved in total by PyTorch) While it’s easy to see that GPU memory is full, understanding why and how to fix it can be more challenging. In […]

Read more
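The numbers in that error message can be reconciled by hand. PyTorch's caching allocator reserves blocks from CUDA and sub-allocates tensors from them, so "reserved" is at least "allocated"; the request fails when neither the reserved pool nor the remaining free device memory can satisfy it. A quick accounting sketch (pure arithmetic, no GPU needed):

```python
# Reading the OOM message: reconcile the figures it reports.
GIB = 1024**3
MIB = 1024**2

total     = 7.93 * GIB   # GPU 0 total capacity
reserved  = 6.00 * GIB   # held by PyTorch's caching allocator
allocated = 6.00 * GIB   # live tensors inside the reserved pool
free      = 14.88 * MIB  # left on the device outside PyTorch
request   = 20.00 * MIB  # the allocation that failed

# Memory neither reserved by PyTorch nor free on the device:
# CUDA context, other processes, driver overhead.
other = total - reserved - free
print(f"used outside PyTorch: {other / GIB:.2f} GiB")

# reserved == allocated, so the cached pool has no slack either;
# the 20 MiB request exceeds the 14.88 MiB still free -> OOM.
print("request fits in free memory:", request <= free)
```

Here the pool is fully occupied (allocated equals reserved) and the device has less free memory than the request, which is exactly the situation the error describes.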