Deploy models on AWS Inferentia2 from Hugging Face
AWS Inferentia2 is the latest AWS machine learning chip available through the Amazon EC2 Inf2 instances on Amazon Web Services. Designed from the ground up for AI workloads, Inf2 instances
Read moreDeep Learning, NLP, NMT, AI, ML
AWS Inferentia2 is the latest AWS machine learning chip available through the Amazon EC2 Inf2 instances on Amazon Web Services. Designed from the ground up for AI workloads, Inf2 instances
Read moreWith the speed at which the generative AI space is moving, we believe an open approach is an important way to bring the ecosystem together and mitigate potential risks of Large Language Models (LLMs). Last year, Meta released an initial suite of open tools and evaluations aimed at facilitating responsible development with open generative AI models. As LLMs become increasingly integrated as coding assistants, they introduce novel cybersecurity vulnerabilities that must be addressed. To tackle this challenge, comprehensive benchmarks are […]
Read moreThe Falcon 2 Models TII is launching a new generation of models, Falcon 2, focused on providing the open-source community with a series of smaller models with enhanced performance and multi-modal support. Our goal is to enable cheaper inference and encourage the development of more downstream applications with improved usability. The first generation of Falcon models, featuring Falcon-40B and Falcon-180B, made a significant contribution to the open-source community, promoting the
Read moreSentence Transformers is a Python library for using and training embedding models for a wide range of applications, such as retrieval augmented generation, semantic search, semantic textual similarity, paraphrase mining, and more. Its v3.0 update is the largest since the project’s inception, introducing a new training approach. In this blogpost, I’ll show you how to use it to finetune Sentence
Read moreIn this blog we will be exploring Text Generation Inference’s (TGI) little brother, the TGI Benchmarking tool. It will help us understand how to profile TGI beyond simple throughput to better understand the tradeoffs to make decisions on how to tune your deployment for your needs. If you have ever felt like LLM deployments cost too much or
Read moreEarlier this week our team detected unauthorized access to our Spaces platform, specifically related to Spaces secrets. As a consequence, we have suspicions that a subset of Spaces’ secrets could have been accessed without authorization. As a first step of remediation, we have revoked a number of HF tokens present in those secrets. Users whose tokens have been revoked already received an email
Read moreAs model sizes grow, Generative AI implementations require significant inference resources. This not only increases the cost per generation, but also increases the power consumption used to serve such requests. Inference optimizations for text generation are essential for reducing latency, infrastructure costs, and power consumption. This can lead to an improved user experience and increased efficiency in text generation tasks. Assisted decoding is a popular method for speeding up text generation. We adapted and optimized it for Intel Gaudi, which […]
Read moreAI-powered NPCs (Non-Playable Characters) are one of the most important breakthroughs brought about by the use of LLMs in games. LLMs, or Large Language Models, make it possible to design “intelligent” in-game characters that can engage in realistic conversations with the player, perform complex actions and follow instructions, dramatically enhancing the player’s experience. AI-powered NPCs represent a huge advancement vs rule-based and heuristics systems. Today, we are excited to introduce NPC-Playground, a demo created by Cubzh and Gigax where you […]
Read moreIn two short years since the advent of diffusion-based image generators, AI image models have achieved near-photographic quality. How do these models compare? Are the open-source alternatives on par with their proprietary counterparts? The Artificial Analysis Text
Read moreWe are excited to announce that the new Hugging Face Embedding Container for Amazon SageMaker is now generally available (GA). AWS customers can now efficiently deploy embedding models on SageMaker to build Generative AI applications, including Retrieval-Augmented Generation (RAG)
Read more