Introducing HELMET: Holistically Evaluating Long-context Language Models

Contact: hyen@cs.princeton.edu
Paper: https://arxiv.org/abs/2410.02694
Website: https://princeton-nlp.github.io/HELMET
Code & Data: https://github.com/princeton-nlp/HELMET

Since we first released HELMET last October, there has been more development on long-context language models than ever before, and we are thrilled to see the adoption of HELMET by the community, such as Microsoft’s Phi-4 and AI21’s Jamba 1.6. After the initial release, we have added more models to our evaluation suite and conducted additional analyses. We are excited to share our new results and present HELMET at ICLR […]

Read more

Cohere on Hugging Face Inference Providers 🔥

We’re thrilled to share that Cohere is now a supported Inference Provider on the HF Hub! This also makes Cohere the first model creator to share and serve its models directly on the Hub. Cohere is committed to building and serving models purpose-built for enterprise use cases. Its comprehensive suite of secure AI solutions, from cutting-edge generative AI to powerful embedding and ranking models, is designed to tackle real-world business challenges. Additionally, Cohere Labs, Cohere’s in-house research lab, supports fundamental research and […]

Read more

PipelineRL

We are excited to open-source PipelineRL, an experimental RL implementation that tackles a fundamental challenge in large-scale reinforcement learning with LLMs: the trade-off between inference throughput and on-policy data collection. PipelineRL’s key innovation is inflight weight updates during RL training (see Figure 1 below). This allows PipelineRL to sustain consistently high inference throughput while minimizing the lag between the weights used for rollouts and the most recently updated model weights. The result: fast and stable RL training for large language […]
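To make the idea concrete, here is a deliberately simplified, single-threaded sketch (our own toy illustration, not the PipelineRL codebase) of what inflight weight updates buy you: the generator keeps producing tokens while the trainer publishes new weight versions, so later tokens in the same rollout are sampled with fresher weights instead of a single stale snapshot.

```python
# Toy sketch of inflight weight updates (hypothetical names; the real
# PipelineRL runs inference and training concurrently across processes).

class WeightStore:
    """Shared weights; the trainer bumps the version, workers read it inflight."""
    def __init__(self):
        self.version = 0

    def publish_update(self):
        self.version += 1


def generate_rollout(store, num_tokens, train_step_every):
    """Interleave token generation with weight updates (single-threaded sketch)."""
    token_versions = []
    for t in range(num_tokens):
        token_versions.append(store.version)  # token sampled with current weights
        if (t + 1) % train_step_every == 0:
            store.publish_update()            # a training step lands mid-generation
    return token_versions


store = WeightStore()
versions = generate_rollout(store, num_tokens=8, train_step_every=3)
print(versions)  # → [0, 0, 0, 1, 1, 1, 2, 2]
```

In a synchronous setup the whole rollout would carry version 0; here the rollout straddles successive weight versions, which is exactly the lag reduction the post describes.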

Read more

What is AutoRound?

As large language models (LLMs) and vision-language models (VLMs) continue to grow in size and complexity, deploying them efficiently becomes increasingly challenging. Quantization offers a solution by reducing model size and inference latency. Intel’s AutoRound emerges as a cutting-edge quantization tool that balances accuracy, efficiency, and compatibility. AutoRound is a weight-only post-training quantization (PTQ) method developed by Intel. It uses signed gradient descent to jointly optimize weight rounding and clipping ranges, enabling accurate low-bit quantization (e.g., INT2–INT8) with […]
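To give a flavor of the signed-gradient-descent idea, here is a rough NumPy sketch under heavy simplifying assumptions (a single linear layer, one per-tensor scale, squared output error, and no clipping-range learning, all of which the real AutoRound handles more carefully): a learnable offset shifts each weight's rounding decision and is updated using only the sign of a straight-through gradient.

```python
import numpy as np

# Hedged toy sketch of AutoRound-style learned rounding (our illustration,
# not Intel's implementation): optimize per-weight rounding offsets V with
# signed gradient descent against the layer's output reconstruction error.

def fake_quant(W, scale, V, qmin=-8, qmax=7):
    """Round to the INT4 grid with a learnable offset, then dequantize."""
    return np.clip(np.round(W / scale + V), qmin, qmax) * scale


rng = np.random.default_rng(0)
W = rng.normal(size=(16, 16))
X = rng.normal(size=(128, 16))          # calibration activations
scale = np.abs(W).max() / 7             # simple symmetric INT4 scale
Y = X @ W                               # full-precision layer output

V = np.zeros_like(W)
best_V = V.copy()
best_loss = np.square(X @ fake_quant(W, scale, V) - Y).mean()
for _ in range(300):
    err = X @ fake_quant(W, scale, V) - Y
    # straight-through gradient of the squared output error w.r.t. V,
    # reduced to its sign: the "signed gradient descent" step
    V = np.clip(V - 0.005 * np.sign(X.T @ err * scale), -0.5, 0.5)
    loss = np.square(X @ fake_quant(W, scale, V) - Y).mean()
    if loss < best_loss:
        best_V, best_loss = V.copy(), loss

plain = np.square(X @ fake_quant(W, scale, np.zeros_like(W)) - Y).mean()
print(f"output MSE — nearest rounding: {plain:.5f}, learned rounding: {best_loss:.5f}")
```

Because the offsets are clipped to [-0.5, 0.5], each weight can only move to an adjacent grid point, which keeps the quantized weights close to their nearest-rounding values while letting the calibration data steer individual rounding decisions.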

Read more