Fine-tuning 20B LLMs with RLHF on a 24GB consumer GPU

We are excited to officially announce the integration of trl with peft, making Large Language Model (LLM) fine-tuning with Reinforcement Learning accessible to everyone! In this post, we explain why this is a competitive alternative to existing fine-tuning approaches. Note that peft is a general tool that can be applied to many ML use cases, but it's particularly interesting for RLHF, as this method is especially memory-hungry! If you want to dive directly into the code, check out the […]
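As a taste of why quantization plus parameter-efficient fine-tuning matters here, a back-of-the-envelope sketch of the memory arithmetic behind the title (rough estimates for the weights alone, ignoring activations, gradients, and optimizer state):

```python
# Why a 20B-parameter model only approaches a 24GB consumer card when its
# weights are quantized to 8-bit: full-precision weights alone already blow
# the budget. Numbers are illustrative estimates, not from the post.
GIB = 1024**3
N_PARAMS = 20e9

def weights_gib(n_params: float, bytes_per_param: float) -> float:
    """Memory needed just to hold the weights, in GiB."""
    return n_params * bytes_per_param / GIB

fp32 = weights_gib(N_PARAMS, 4)  # ~74.5 GiB: far beyond any consumer GPU
fp16 = weights_gib(N_PARAMS, 2)  # ~37.3 GiB: still too large for 24 GB
int8 = weights_gib(N_PARAMS, 1)  # ~18.6 GiB: fits, leaving headroom for
                                 # small trainable adapters (the peft idea)
print(f"fp32 {fp32:.1f} GiB | fp16 {fp16:.1f} GiB | int8 {int8:.1f} GiB")
```

The remaining headroom is what makes training adapters (rather than all 20B weights) feasible on such a card.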

Read more

Multivariate Probabilistic Time Series Forecasting with Informer

A few months ago we introduced the Time Series Transformer, which is the vanilla Transformer (Vaswani et al., 2017) applied to forecasting, and showed an example for the univariate probabilistic forecasting task (i.e. predicting each time series' 1-d distribution individually). In this post we introduce the Informer model (Zhou, Haoyi, et al., 2021), the AAAI 2021 best paper, which is now available in 🤗 Transformers. We will show how to use the Informer model for the multivariate probabilistic forecasting task, i.e., predicting […]
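A minimal configuration sketch of the distinction being drawn: in 🤗 Transformers, setting `input_size` above 1 on the Informer config is what switches from modeling each series' 1-d distribution to modeling the series jointly. The specific sizes below are illustrative assumptions, not values from the post:

```python
# Hedged sketch: configuring Informer for *multivariate* probabilistic
# forecasting. input_size=7 means each timestep is a 7-dimensional vector,
# so the model learns one joint distribution over all seven series.
from transformers import InformerConfig

config = InformerConfig(
    input_size=7,                # number of target series modeled jointly
    prediction_length=24,        # forecast horizon
    context_length=48,           # conditioning window fed to the encoder
    lags_sequence=[1, 2, 3, 7],  # lagged values appended as extra features
)
print(config.input_size, config.prediction_length)
```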

Read more

Jupyter X Hugging Face

We’re excited to announce improved support for Jupyter notebooks hosted on the Hugging Face Hub! From serving as an essential learning resource to being a key tool for model development, Jupyter notebooks have become a staple across many areas of machine learning. Notebooks’ interactive and visual nature lets you get feedback quickly as you develop models, datasets, and demos. For many, their first exposure to training machine learning models is via a Jupyter notebook, and many practitioners use […]

Read more

Train your ControlNet with diffusers 🧨

ControlNet is a neural network structure that allows fine-grained control of diffusion models by adding extra conditions. The technique debuted with the paper Adding Conditional Control to Text-to-Image Diffusion Models, and quickly took over the open-source diffusion community thanks to the author’s release of 8 different conditions to control Stable Diffusion v1-5, including pose estimations, depth maps, canny edges, […]
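To give a flavor of the workflow, a hedged sketch of launching diffusers' example ControlNet training script. The paths, dataset, column names, and hyperparameters below are illustrative placeholders, not the post's exact recipe:

```shell
# Illustrative invocation of diffusers' train_controlnet.py example script.
# Assumes diffusers (with its training examples) and accelerate are installed;
# the dataset and hyperparameters are placeholders for demonstration only.
accelerate launch train_controlnet.py \
  --pretrained_model_name_or_path="runwayml/stable-diffusion-v1-5" \
  --output_dir="controlnet-out" \
  --dataset_name="fusing/fill50k" \
  --resolution=512 \
  --learning_rate=1e-5 \
  --train_batch_size=4
```

The script trains only the ControlNet branch while the Stable Diffusion weights stay frozen, which is what makes adding a new condition tractable.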

Read more

Ethics and Society Newsletter #3: Ethical Openness at Hugging Face

In our mission to democratize good machine learning (ML), we examine how supporting ML community work also empowers examining and preventing possible harms. Open development and science decentralizes power so that many people can collectively work on AI that reflects their needs and values. While openness enables broader perspectives to contribute to research and AI overall, it faces the tension of less risk control. Moderating ML artifacts presents unique challenges due to the dynamic and rapidly evolving nature of these […]

Read more

StackLLaMA: A hands-on guide to train LLaMA with RLHF

Models such as ChatGPT, GPT-4, and Claude are powerful language models that have been fine-tuned using a method called Reinforcement Learning from Human Feedback (RLHF) to be better aligned with how we expect them to behave and would like to use them. In this blog post, we show all the steps involved in training a LLaMA model to answer questions on Stack Exchange with RLHF through a combination of: Supervised Fine-tuning (SFT), Reward / preference modeling (RM), and Reinforcement Learning from […]
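Of the stages listed, the reward-modeling (RM) step has the most compact core idea: train on preference pairs so the preferred answer scores higher. A minimal sketch of its loss, assuming scalar rewards for a chosen and a rejected answer (the Bradley–Terry negative log-likelihood, -log σ(r_chosen − r_rejected)):

```python
# Minimal sketch of the pairwise preference loss used in reward modeling.
# The function name and scalar inputs are illustrative; in practice the
# rewards come from a model scoring full (question, answer) pairs.
import math

def preference_loss(r_chosen: float, r_rejected: float) -> float:
    """-log sigmoid(r_chosen - r_rejected): small when the reward model
    already scores the human-preferred answer higher."""
    margin = r_chosen - r_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

print(preference_loss(2.0, -1.0))  # ≈ 0.049: correct ranking, low loss
print(preference_loss(-1.0, 2.0)) # ≈ 3.049: wrong ranking, high loss
```

Minimizing this loss over many preference pairs is what turns human rankings into the scalar reward signal the RL stage then optimizes.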

Read more