Deploy Meta Llama 3.1 405B on Google Cloud Vertex AI

Meta Llama 3.1 is the latest open LLM from Meta, released in July 2024. Meta Llama 3.1 comes in three sizes: 8B for efficient deployment and development on consumer-size GPUs, 70B for large-scale AI-native applications, and 405B for synthetic data generation, LLM-as-a-Judge, or distillation, among other use cases. Some of its key features include: a large context length of 128K tokens (vs. the original 8K), multilingual capabilities, tool-usage capabilities, and a more permissive license. In this blog […]

Read more

The 5 Most Under-Rated Tools on Hugging Face

tl;dr: The Hugging Face Hub has a number of tools and integrations that are often overlooked but can make it easier to build many types of AI solutions. The Hub boasts over 850K public models, with ~50K new ones added every month, and that number just keeps climbing. We also offer an Enterprise Hub subscription […]

Read more

Scaling robotics datasets with video encoding

Over the past few years, text- and image-based models have seen dramatic performance improvements, primarily due to scaling up model weights and dataset sizes. While the internet provides an extensive database of text and images for LLMs and image-generation models, robotics lacks a comparably vast and diverse source of high-quality data, as well as efficient data formats. Despite efforts like Open X, we are still far from achieving the scale and diversity seen with Large Language Models. Additionally, we lack the necessary […]

Read more

Hugging Face partners with TruffleHog to Scan for Secrets

We’re excited to announce our partnership and integration with Truffle Security, bringing TruffleHog’s powerful secret scanning features to our platform as part of our ongoing commitment to security. TruffleHog is an open-source tool that detects and verifies secret leaks in code. With a wide range of detectors for popular SaaS and cloud providers, it scans files and repositories for […]

Read more

Accelerate 1.0.0

3.5 years ago, Accelerate was a simple framework aimed at making training on multi-GPU and TPU systems easier through a low-level abstraction that simplified a raw PyTorch training loop. Since then, Accelerate has expanded into a multi-faceted library aimed at tackling many common problems with large-scale training and large models, in an age where 405 billion parameters (Llama) is the new language model size. This involves: a flexible low-level training API, allowing for training on six different hardware accelerators […]

Read more

Fine-tuning LLMs to 1.58bit: extreme quantization made easy

As Large Language Models (LLMs) grow in size and complexity, finding ways to reduce their computational and energy costs has become a critical challenge. One popular solution is quantization, where the precision of parameters is reduced from the standard 16-bit floating-point (FP16) or 32-bit floating-point (FP32) to lower-bit formats like 8-bit or 4-bit. While this approach significantly cuts down on memory usage and speeds up computation, it often comes at the expense of accuracy. Reducing the precision too much can […]
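As a rough illustration of the trade-off described above, here is a minimal pure-Python sketch of absmax (symmetric) 8-bit quantization. This is a simplified assumption, not the post's 1.58-bit method: real frameworks operate on tensors and typically use per-channel or per-block scales rather than one scale per list.

```python
# Sketch of absmax (symmetric) int8 quantization: scale floats so the
# largest magnitude maps to 127, round to integers, then dequantize.
# The gap between original and recovered values is the quantization error.

def quantize_absmax_int8(weights):
    """Map floats to int8 values in [-127, 127] using the max |w| as scale."""
    scale = max(abs(w) for w in weights) / 127.0
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate floats from the quantized integers."""
    return [v * scale for v in q]

weights = [0.42, -1.27, 0.081, 0.9]
q, scale = quantize_absmax_int8(weights)
approx = dequantize(q, scale)
error = max(abs(a - b) for a, b in zip(weights, approx))
# For in-range values, the error is bounded by scale / 2 —
# shrinking the bit width grows the scale, and with it the error.
```

Dropping to 4-bit (range [-7, 7]) roughly multiplies the scale, and hence the worst-case error, by about 18x, which is why aggressive quantization usually needs extra tricks to preserve accuracy.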

Read more

Optimize and deploy models with Optimum-Intel and OpenVINO GenAI

Deploying Transformers models at the edge or client-side requires careful consideration of performance and compatibility. Python, though powerful, is not always ideal for such deployments, especially in environments dominated by C++. This blog will guide you through optimizing and deploying Hugging Face Transformers models using Optimum-Intel and OpenVINO™ GenAI, ensuring efficient AI inference with minimal dependencies. Table of Contents: Why Use OpenVINO™ for Edge Deployment; Step 1: Setting Up the Environment; Step 2: Exporting Models to OpenVINO IR […]

Read more

Exploring the Daily Papers Page on Hugging Face

In the fast-paced world of research, staying up-to-date with the latest advancements is crucial. To help developers and researchers keep a pulse on the cutting edge of AI, Hugging Face introduced the Daily Papers page. Since its launch, Daily Papers has featured high-quality research selected by AK and researchers from the community. Over the past year, more than 3,700 papers have […]

Read more

Llama can now see and run on your device – welcome Llama 3.2

Llama 3.2 is out! Today, we welcome the next iteration of the Llama collection to Hugging Face. This time, we’re excited to collaborate with Meta on the release of multimodal and small models. Ten open-weight models (5 multimodal and 5 text-only) are available on the Hub. Llama 3.2 Vision comes in two sizes: 11B for efficient deployment and development on consumer-size GPUs, and 90B for large-scale applications. Both versions come in base and instruction-tuned variants. In addition to […]

Read more