The Open Evaluation Standard: Benchmarking NVIDIA Nemotron 3 Nano with NeMo Evaluator

It has become increasingly challenging to assess whether a model’s reported improvements reflect genuine advances or variations in evaluation conditions, dataset composition, or training data that mirrors benchmark tasks. The NVIDIA Nemotron approach to openness addresses this by publishing transparent and reproducible evaluation recipes that make results independently verifiable. NVIDIA released Nemotron 3 Nano 30B A3B with an explicitly open evaluation approach to make that distinction clear. Alongside the model card, we are publishing the complete evaluation recipe used to […]

Read more

Tokenization in Transformers v5: Simpler, Clearer, and More Modular

Transformers v5 redesigns how tokenizers work. The big tokenizers reformat separates tokenizer design from trained vocabulary (much like how PyTorch separates neural network architecture from learned weights). The result is tokenizers you can inspect, customize, and train from scratch with far less friction. TL;DR: This blog explains how tokenization works in Transformers and why v5 is a major redesign, with clearer internals, a clean class hierarchy, and a single fast backend. It’s a practical guide for anyone who wants to […]

Read more

AprielGuard: A Guardrail for Safety and Adversarial Robustness in Modern LLM Systems

Large Language Models (LLMs) have rapidly evolved from text-only assistants into complex agentic systems capable of performing multi-step reasoning, calling external tools, retrieving memory, and executing code. With this evolution comes an increasingly sophisticated threat landscape: not only traditional content safety risks, but also multi-turn jailbreaks, prompt injections, memory hijacking, and tool manipulation. In this work, we introduce AprielGuard, an 8B parameter    

Read more

NVIDIA brings agents to life with DGX Spark and Reachy Mini

Today at CES 2026, NVIDIA unveiled a world of new open models to enable the future of agents, online and in the real world. From the recently released NVIDIA Nemotron reasoning LLMs to the new NVIDIA Isaac GR00T N1.6 open reasoning VLA and NVIDIA Cosmos world foundation models, all the building blocks are here today for AI Builders to build their own agents. But what if you could bring your own agent to life, right at    

Read more

Introducing Falcon-H1-Arabic: Pushing the Boundaries of Arabic Language AI with Hybrid Architecture

Discover more in our official blogpost, featuring an interactive experience The journey of building world-class Arabic language models has been one of continuous learning and iteration. Today, we’re excited to announce Falcon-H1-Arabic, our most advanced Arabic language model family to date, representing a significant leap forward in both architecture and capabilities. This release embodies months of research, community feedback, and technical innovation, culminating in three powerful models that set new standards for Arabic natural language processing.

Read more

AssetOpsBench: Bridging the Gap Between AI Agent Benchmarks and Industrial Reality

AssetOpsBench is a comprehensive benchmark and evaluation system with six qualitative dimensions that bridges the gap for agentic AI in domain-specific settings, starting with industrial Asset Lifecycle Management. Introduction While existing AI benchmarks excel at isolated tasks such as coding or web navigation, they often fail to capture the complexity of real-world industrial operations. To bridge this gap, we introduce AssetOpsBench, a framework specifically designed to evaluate agent    

Read more
1 67 68 69 70 71 1,022