Introducing the Enterprise Scenarios Leaderboard: a Leaderboard for Real World Use Cases

Today, the Patronus team is excited to announce the new Enterprise Scenarios Leaderboard, built using the Hugging Face Leaderboard Template in collaboration with their teams. The leaderboard aims to evaluate the performance of language models on real-world enterprise use cases. We currently support 6 diverse tasks – FinanceBench, Legal Confidentiality, Creative Writing, Customer Support Dialogue, Toxicity, and Enterprise PII. We measure the performance of models on metrics like accuracy, engagingness, toxicity, relevance, and Enterprise PII. Why do    

Read more

Patch Time Series Transformer in Hugging Face – Getting Started

In this blog, we provide examples of how to get started with PatchTST. We first demonstrate the forecasting capability of PatchTST on the Electricity data. We will then demonstrate the transfer learning capability of PatchTST by using the previously trained model to do zero-shot forecasting on the electrical transformer (ETTh1) dataset. The zero-shot forecasting performance will denote the test performance of the model in the target domain, without any training on the target domain. Subsequently, we will do linear probing […]

Read more

Constitutional AI with Open LLMs

Since the launch of ChatGPT in 2022, we have seen tremendous progress in LLMs, ranging from the release of powerful pretrained models like Llama 2 and Mixtral, to the development of new alignment techniques like Direct Preference Optimization. However, deploying LLMs in consumer applications poses several challenges, including the need to add guardrails that prevent the model from generating undesirable responses. For example, if you are building an AI tutor for children, then you don’t want it to generate toxic […]

Read more

NPHardEval Leaderboard: Unveiling the Reasoning Abilities of Large Language Models through Complexity Classes and Dynamic Updates

We’re happy to introduce the NPHardEval leaderboard, using NPHardEval, a cutting-edge benchmark developed by researchers from the University of Michigan and Rutgers University. NPHardEval introduces a dynamic, complexity-based framework for assessing Large Language Models’ (LLMs) reasoning abilities. It poses 900 algorithmic questions spanning the NP-Hard complexity class and lower, designed to rigorously test LLMs, and is updated on a monthly basis to prevent overfitting! A Unique Approach to LLM Evaluation NPHardEval stands apart    

Read more

SegMoE: Segmind Mixture of Diffusion Experts

SegMoE is an exciting framework for creating Mixture-of-Experts Diffusion models from scratch! SegMoE is comprehensively integrated within the Hugging Face ecosystem and comes supported with diffusers 🔥! Among the features and integrations being released today: Table of Contents What is SegMoE? SegMoE models follow the same architecture as Stable Diffusion. Like Mixtral 8x7b, a SegMoE model    

Read more

From OpenAI to Open LLMs with Messages API on Hugging Face

We are excited to introduce the Messages API to provide OpenAI compatibility with Text Generation Inference (TGI) and Inference Endpoints. Starting with version 1.4.0, TGI offers an API compatible with the OpenAI Chat Completion API. The new Messages API allows customers and users to transition seamlessly from OpenAI models to open LLMs. The API can be directly used with OpenAI’s client libraries or third-party tools, like LangChain or LlamaIndex. “The new Messages API with OpenAI compatibility makes it easy   […]

Read more
1 37 38 39 40 41 1,026