How to train a Language Model with Megatron-LM

Training large language models in PyTorch requires more than a simple training loop. Training is usually distributed across multiple devices, with many optimization techniques needed for stable and efficient training. The Hugging Face 🤗 Accelerate library was created to support distributed training across GPUs and TPUs with very easy integration into existing training loops. 🤗 Transformers also supports distributed […]

Read more

Ethics and Society Newsletter #1

Hello, world! Originating as an open-source company, Hugging Face was founded on some key ethical values in tech: collaboration, responsibility, and transparency. To code in an open environment means having your code – and the choices within – viewable to the world, associated with your account and available for others to critique and add to. As the research community began using […]

Read more

SetFit: Efficient Few-Shot Learning Without Prompts

SetFit is significantly more sample efficient and robust to noise than standard fine-tuning. Few-shot learning with pretrained language models has emerged as a promising solution to every data scientist’s nightmare: dealing with data that has few to no labels 😱. Together with our research partners at Intel Labs and the UKP Lab, Hugging Face is excited to introduce SetFit: an efficient framework for few-shot fine-tuning of Sentence Transformers. SetFit achieves high accuracy with little labeled data – for example, with […]

Read more

Very Large Language Models and How to Evaluate Them

Large language models can now be evaluated on zero-shot classification tasks with Evaluation on the Hub! Zero-shot evaluation is a popular way for researchers to measure the performance of large language models, as they have been shown to learn capabilities during training without explicitly being shown labeled examples. The Inverse Scaling Prize is an example of a recent community effort to conduct large-scale zero-shot evaluation across model sizes and families to discover tasks on which larger models may perform worse […]
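The kind of zero-shot classification being evaluated can be illustrated locally with the 🤗 Transformers pipeline. This is a hedged sketch, separate from the Evaluation on the Hub service itself: the input sentence and candidate labels are made up, and the NLI checkpoint is one common choice, downloaded from the Hub on first use.

```python
# Minimal sketch of zero-shot classification: an NLI model scores a text
# against candidate labels it was never explicitly trained on (assumes
# `transformers` is installed; model name and labels are illustrative).
from transformers import pipeline

classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

result = classifier(
    "The team released a new open-source language model today.",
    candidate_labels=["technology", "sports", "politics"],
)

# result["labels"] is sorted by score, highest first
print(result["labels"][0], result["scores"][0])
```

Evaluation on the Hub runs this style of task at scale across models and datasets, which is how efforts like the Inverse Scaling Prize compare behavior across model sizes.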

Read more