Large-scale Near-deduplication Behind BigCode
For people who are interested in document-level near-deduplication at a large scale and have some understanding of hashing, graphs, and text processing. Motivations: it is important to take care of our data before feeding it to the model, at least for Large Language […]
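The hashing idea behind near-deduplication can be sketched with a minimal MinHash in plain Python. This is an illustrative toy, not the BigCode pipeline itself, which additionally involves graph processing to group duplicate documents; the function names and parameters here are hypothetical:

```python
import hashlib

def minhash_signature(text, num_perm=64, shingle_size=3):
    """Compute a MinHash signature over word shingles of a document.

    Each of the num_perm slots keeps the minimum hash of all shingles
    under a differently seeded hash function.
    """
    words = text.lower().split()
    shingles = {
        " ".join(words[i:i + shingle_size])
        for i in range(max(1, len(words) - shingle_size + 1))
    }
    signature = []
    for seed in range(num_perm):
        signature.append(min(
            int.from_bytes(
                hashlib.blake2b(f"{seed}:{s}".encode(), digest_size=8).digest(),
                "big",
            )
            for s in shingles
        ))
    return signature

def estimate_jaccard(sig_a, sig_b):
    """The fraction of matching slots approximates Jaccard similarity."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)
```

Near-duplicate documents share most shingles, so their signatures agree in most slots; unrelated documents agree in almost none, which lets a pipeline compare cheap fixed-size signatures instead of full texts.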
Instruction-tuning Stable Diffusion with InstructPix2Pix
This post explores instruction-tuning to teach Stable Diffusion to follow instructions to translate or process input images. With this method, we can prompt Stable Diffusion using an input image and an "instruction", such as "Apply a cartoon filter to the natural image." Figure 1: We explore the instruction-tuning […]
Audit shows that safetensors is safe and ready to become the default
Hugging Face, in close collaboration with EleutherAI and Stability AI, commissioned an external security audit of the safetensors library, the results of which allow all three organizations to move toward making the library […]
Hugging Face Collaborates with Microsoft to launch Hugging Face Model Catalog on Azure
Today, we are thrilled to announce that Hugging Face is expanding its collaboration with Microsoft to bring open-source models from the Hugging Face Hub to Azure Machine Learning. Together, we built a new Hugging Face Hub Model Catalog, available directly within Azure Machine Learning Studio and filled with thousands of the most popular Transformers models from the Hugging Face Hub. With this new integration, you can now deploy Hugging Face models in just a few clicks on managed endpoints, running on secure […]
Making LLMs even more accessible with bitsandbytes, 4-bit quantization and QLoRA
LLMs are known to be large, and running or training them on consumer hardware is a huge challenge for users and accessibility. Our LLM.int8 blog post showed how the techniques in the LLM.int8 paper were integrated into transformers using the bitsandbytes library. As we strive to make models even more accessible to anyone, we decided to collaborate with bitsandbytes again to allow users to run models in 4-bit precision. This includes a large majority of HF models, in any modality (text, […]
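To give a feel for what reduced-precision weights mean, here is a toy absmax quantizer in plain Python that maps floats to signed 4-bit integers and back. This is an illustrative sketch only, not the bitsandbytes implementation, whose 4-bit schemes use a learned codebook (e.g. NF4) and blockwise scaling; the function names here are hypothetical:

```python
def quantize_4bit(weights):
    """Absmax-quantize floats to signed 4-bit integers in [-7, 7].

    Each weight is divided by a per-tensor scale so the largest
    magnitude maps to 7, then rounded to the nearest integer.
    """
    absmax = max(abs(w) for w in weights) or 1.0
    scale = absmax / 7.0
    return [round(w / scale) for w in weights], scale

def dequantize_4bit(quantized, scale):
    """Map the 4-bit integers back to approximate float weights."""
    return [v * scale for v in quantized]
```

Storing each weight in 4 bits instead of 16 or 32 is where the memory savings come from; the price is a rounding error bounded by half the scale per weight.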
Optimizing Stable Diffusion for Intel CPUs with NNCF and 🤗 Optimum
Latent Diffusion models are game changers when it comes to solving text-to-image generation problems. Stable Diffusion is one of the most famous examples and has seen wide adoption in the community and industry. The idea behind the Stable Diffusion model is simple and compelling: you generate an image from a noise vector in multiple small steps, refining the noise into a latent image representation. This approach works very well, but it can take a long time to generate an image if […]
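The multi-step refinement idea can be illustrated with a toy loop (purely conceptual; in the real model a neural network predicts the noise, a scheduler controls step sizes, and the loop runs in latent space before a decoder produces the image):

```python
import random

def toy_denoising_loop(target, steps=50, step_fraction=0.2, seed=0):
    """Conceptual sketch of iterative denoising: start from pure random
    noise and remove a small fraction of the (here, oracle-known) noise
    at each step, gradually converging toward the target vector."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in target]  # start from pure noise
    for _ in range(steps):
        # a stand-in "noise predictor": the residual between x and target
        predicted_noise = [xi - ti for xi, ti in zip(x, target)]
        # remove only a fraction of the predicted noise per step
        x = [xi - step_fraction * ni for xi, ni in zip(x, predicted_noise)]
    return x
```

Because each step removes only part of the remaining noise, many small steps are needed, which is exactly why sampling is slow and why optimizing the per-step cost (as this post does with NNCF and 🤗 Optimum) matters.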
Introducing BERTopic Integration with the Hugging Face Hub
Topic  Representation                                             Count  Name
-1     language – models – model – data – based                   20     -1_language_models_model_data
0      dialogue – dialog – response – responses – intent          14247  0_dialogue_dialog_response_responses
1      speech – asr – speech recognition – recognition – end      1833   1_speech_asr_speech recognition_recognition
2      tuning – tasks – prompt – models – language                1369   2_tuning_tasks_prompt_models
3      summarization – summaries – summary – abstractive – document  1109   3_summarization_summaries_summary_abstractive
4      question – answer – qa – answering – question answering    893    4_question_answer_qa_answering
5      sentiment – sentiment analysis […]
Introducing the Hugging Face LLM Inference Container for Amazon SageMaker
This is an example of how to deploy open-source LLMs, such as BLOOM, to Amazon SageMaker for inference using the new Hugging Face LLM Inference Container. We will deploy the 12B Pythia Open Assistant Model, an open-source chat LLM trained with the Open Assistant dataset. The example covers: setting up the development environment; retrieving the new Hugging Face […]