Very Large Language Models and How to Evaluate Them

Large language models can now be evaluated on zero-shot classification tasks with Evaluation on the Hub! Zero-shot evaluation is a popular way for researchers to measure the performance of large language models, as they have been shown to learn capabilities during training without explicitly being shown labeled examples. The Inverse Scaling Prize is an example of a recent community effort to conduct large-scale zero-shot evaluation across model sizes and families to discover tasks on which larger models may perform worse […]

Read more
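At its core, zero-shot classification with a language model reduces to scoring each candidate label against the input and picking the highest-scoring one. A minimal, model-free sketch of that loop (the `score` function here is a stand-in word-overlap heuristic, not a real LM; an actual evaluation would use the model's log-likelihood of each label template):

```python
def score(text: str, label: str) -> float:
    """Stand-in scorer: word overlap between text and label.
    A real zero-shot evaluation would compare the LM's
    log-likelihoods of the candidate labels given the text."""
    text_words = set(text.lower().split())
    label_words = set(label.lower().split())
    return len(text_words & label_words) / max(len(label_words), 1)

def zero_shot_classify(text: str, candidate_labels: list[str]) -> str:
    """Pick the candidate label the scorer ranks highest."""
    return max(candidate_labels, key=lambda label: score(text, label))

prediction = zero_shot_classify(
    "The movie was a thrilling science fiction epic",
    ["science fiction", "romance", "documentary"],
)
```

The key property is that no labeled training examples are needed: the label set is supplied at inference time, which is what makes large-scale sweeps across model sizes and families practical.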

Introducing DOI: the Digital Object Identifier to Datasets and Models

Our mission at Hugging Face is to democratize good machine learning. That includes best practices that make ML models and datasets more reproducible, better documented, and easier to use and share. To solve this challenge, we’re excited to announce that you can now generate a DOI for your model or dataset directly from the Hub! DOIs can be generated directly from your repo settings, and anyone will then be able to cite your work by clicking “Cite this model/dataset” on […]

Read more

Optimization story: Bloom inference

This article gives you the behind-the-scenes of how we made an efficient inference server that powers https://huggingface.co/bigscience/bloom. We achieved a 5x latency reduction over several weeks (and 50x more throughput). We wanted to share all the struggles and epic wins we went through to achieve such speed improvements. A lot of different people were involved […]

Read more

MTEB: Massive Text Embedding Benchmark

MTEB is a massive benchmark for measuring the performance of text embedding models on diverse embedding tasks. The 🥇 leaderboard provides a holistic view of the best text embedding models out there on a variety of tasks. The 📝 paper gives background on the tasks and datasets in MTEB and analyzes leaderboard results! The 💻 GitHub […]

Read more
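Many MTEB tasks (retrieval, semantic textual similarity, clustering) ultimately compare embeddings by cosine similarity. A minimal stdlib sketch of that core operation (the 3-dimensional vectors here are toy placeholders, not real model embeddings):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy "embeddings": a good embedding model should place
# semantically related texts closer together than unrelated ones.
query = [1.0, 0.5, 0.0]
relevant = [0.9, 0.6, 0.1]
unrelated = [0.0, 0.1, 1.0]
```

A benchmark like MTEB then aggregates task-specific metrics (e.g. nDCG for retrieval, Spearman correlation for STS) built on top of exactly this kind of pairwise comparison.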

From PyTorch DDP to Accelerate to Trainer, mastery of distributed training with ease

This tutorial assumes you have a basic understanding of PyTorch and how to train a simple model. It will showcase training on multiple GPUs through a process called Distributed Data Parallelism (DDP), using three different levels of increasing abstraction: native PyTorch DDP through the torch.distributed module; 🤗 Accelerate's light wrapper around torch.distributed, which also helps ensure the code can be run […]

Read more
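The core idea behind DDP is simple: each worker computes gradients on its own shard of the data, then the gradients are averaged (all-reduced) across workers before every optimizer step. A toy single-process sketch of that cycle in plain Python (the sharding, gradient, and all-reduce functions below are illustrative stand-ins, not the torch.distributed API):

```python
def shard_data(data, num_workers):
    """Round-robin split of the dataset across workers."""
    return [data[rank::num_workers] for rank in range(num_workers)]

def local_gradient(shard, weight):
    """Toy per-worker gradient for the loss 0.5 * (weight - x)^2,
    averaged over the worker's shard."""
    return sum(weight - x for x in shard) / len(shard)

def all_reduce_mean(values):
    """Stand-in for the all-reduce collective: average across workers."""
    return sum(values) / len(values)

data = [1.0, 2.0, 3.0, 4.0]
weight = 0.0
shards = shard_data(data, num_workers=2)
grads = [local_gradient(shard, weight) for shard in shards]
weight -= 0.1 * all_reduce_mean(grads)  # one synchronized SGD step
```

Because every worker applies the same averaged gradient, all replicas stay in sync; the three levels in the tutorial differ only in how much of this bookkeeping (process groups, device placement, gradient synchronization) is handled for you.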

Evaluating Language Model Bias with 🤗 Evaluate

While the size and capabilities of large language models have drastically increased over the past couple of years, so too has the concern around biases imprinted into these models and their training data. In fact, many popular language models have been found to be biased against specific religions and genders, which can result in the promotion of discriminatory ideas and the perpetuation of harms against marginalized groups. To help the community explore these kinds of biases and strengthen our understanding […]

Read more
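One common way to probe for such bias is counterfactual templating: fill the same prompt templates with terms referring to different groups and compare how a model-based scorer reacts. A minimal sketch of that comparison (the templates, groups, and `score_fn` here are hypothetical illustrations, not the 🤗 Evaluate API):

```python
def fill_template(template: str, term: str) -> str:
    """Insert an identity term into a probe template."""
    return template.format(term=term)

# Hypothetical probe setup for illustration only.
TEMPLATES = ["{term} worked as a", "{term} was described as"]
GROUPS = {"group_a": ["he"], "group_b": ["she"]}

def bias_gap(score_fn, templates, groups):
    """Average score difference between two groups on identical
    templates; a persistent nonzero gap flags asymmetric model
    behavior. `score_fn` stands in for a real model-based scorer
    (e.g. a toxicity or regard measurement)."""
    means = []
    for terms in groups.values():
        scores = [score_fn(fill_template(t, term))
                  for t in templates for term in terms]
        means.append(sum(scores) / len(scores))
    return means[0] - means[1]
```

With a scorer that treats both groups identically the gap is zero; a real evaluation would aggregate such gaps over many templates and terms before drawing conclusions.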