NVIDIA Launches Nemotron 3 Super, Their New Open Model for Agents

NVIDIA has introduced Nemotron 3 Super, a new open model in the Nemotron family focused on agentic workloads, long-context reasoning, tool use, and high-volume enterprise deployments. The company describes it as a hybrid Mamba-Transformer MoE model with 120 billion total parameters, 12 billion of which are active during inference, and a context window of up to 1 million tokens. The announcement rolled out on March 10 and 11 across NVIDIA’s research website, its developer blog, NIM, and Hugging Face.

The announcement matters because it arrives at a moment when the open AI market is no longer just about chatbots or general-purpose models, but about systems that can plan, call tools, maintain context across long sessions, and operate as core components of agents. That is where NVIDIA aims to position Nemotron 3 Super: not as just another competitor in the LLM race, but as a component designed specifically for complex agent flows, RAG, ticket automation, programming, and extended reasoning.

A model designed for long context and real efficiency

One of the most striking features of Nemotron 3 Super is its architecture. NVIDIA explains that the model uses a LatentMoE approach that combines Mamba-2 layers, MoE layers, and a small number of attention layers, along with Multi-Token Prediction (MTP) to speed up generation. The company claims this combination improves memory and compute efficiency while preserving advanced reasoning capabilities and support for very long contexts. In its official documentation, NVIDIA also emphasizes that this is the first “Super” version of the Nemotron 3 family to incorporate LatentMoE, MTP, and NVFP4 pretraining.
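The 120B-total / 12B-active figure follows from how MoE layers work: a router sends each token to only a few experts, so most parameters sit idle on any given forward pass. The sketch below illustrates generic top-k expert routing in NumPy; it is an illustration of the general technique, not NVIDIA’s LatentMoE, and all shapes, weights, and the router itself are toy values.

```python
# Minimal sketch of top-k expert routing, the core mechanism of an MoE layer.
# Generic illustration only; not NVIDIA's LatentMoE implementation.
import numpy as np

def moe_forward(x, router_w, expert_ws, k=2):
    """Route a token vector x to its top-k experts and mix their outputs.

    x:         (d,) token representation
    router_w:  (d, n_experts) router projection
    expert_ws: list of (d, d) expert weight matrices
    """
    logits = x @ router_w                    # one score per expert
    top_k = np.argsort(logits)[-k:]          # indices of the k highest-scoring experts
    gates = np.exp(logits[top_k])
    gates /= gates.sum()                     # softmax over the selected experts only
    # Only the selected experts run: this is why a 120B-total / 12B-active
    # model can be far cheaper per token than a dense 120B model.
    return sum(g * (x @ expert_ws[i]) for g, i in zip(gates, top_k))

rng = np.random.default_rng(0)
d, n_experts = 8, 4
x = rng.standard_normal(d)
router_w = rng.standard_normal((d, n_experts))
expert_ws = [rng.standard_normal((d, d)) for _ in range(n_experts)]
y = moe_forward(x, router_w, expert_ws, k=2)
print(y.shape)  # (8,)
```

With k=2 of 4 experts, only half the expert parameters touch each token, which is the same ratio-of-active-to-total idea behind the headline figures, scaled down to toy size.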

The one-million-token figure is probably the most headline-grabbing. NVIDIA asserts that Nemotron 3 Super supports up to 1M tokens of context and, in long-context tests such as RULER, surpasses open models like GPT-OSS-120B and Qwen3.5-122B at that scale. Meanwhile, the developer blog states that this wide window is meant to prevent state loss in agent systems and reduce so-called “goal drift,” a common issue when an agent must maintain a complex task across many interactions.

This positioning is not trivial. In practice, a context window this large can be especially useful for agents that need to work with extensive documentation, long histories, multiple tools, or prolonged chains of reasoning. It doesn’t necessarily mean the model will automatically be better in every scenario, but it shows NVIDIA is targeting one of the most visible bottlenecks in current agent software: holding relevant information without degrading performance or driving up operating costs.

What NVIDIA promises in performance and deployment

NVIDIA hasn’t limited itself to architecture alone. It has also published performance and efficiency comparisons to bolster the launch. On its official research page, NVIDIA states that Nemotron 3 Super attains up to 2.2 times the inference throughput of GPT-OSS-120B and up to 7.5 times that of Qwen3.5-122B in a configuration with 8K input tokens and 16K output tokens, with comparable or superior accuracy on several benchmarks. These figures come from NVIDIA and should be read as manufacturer results rather than independent validation. Even so, they help paint a picture of how NVIDIA envisions positioning the model in the market: as an open, efficient option for large-scale agent workloads.

The official specs also include practical details. NVIDIA offers variants such as BF16 and FP8, each with different hardware requirements. For example, the BF16 version requires at least 8 H100 GPUs with 80 GB each, while the FP8 variant lowers the threshold to 2 H100 GPUs with 80 GB each. NVIDIA also notes that the model supports multiple languages, including Spanish, and is designed for agent workflows, long-context reasoning, tool integration, and RAG systems.
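The stated minimums line up with simple back-of-envelope arithmetic on the published figures alone (120B parameters, 2 bytes per parameter in BF16 versus 1 byte in FP8, 80 GB per H100). The gap between weight size and available memory is plausibly headroom for the KV cache and activations, which at million-token contexts is substantial; NVIDIA does not break this down publicly, so the split below is an assumption.

```python
# Back-of-envelope memory check using only the figures quoted in the article:
# 120B total parameters, BF16 = 2 bytes/param, FP8 = 1 byte/param, 80 GB per H100.
TOTAL_PARAMS = 120e9
GPU_MEM_GB = 80

bf16_weights_gb = TOTAL_PARAMS * 2 / 1e9   # weight footprint in BF16
fp8_weights_gb = TOTAL_PARAMS * 1 / 1e9    # weight footprint in FP8

bf16_capacity_gb = 8 * GPU_MEM_GB          # stated minimum: 8x H100-80GB
fp8_capacity_gb = 2 * GPU_MEM_GB           # stated minimum: 2x H100-80GB

# Whatever memory is left over weights is headroom for KV cache and
# activations (an assumption; NVIDIA publishes no breakdown).
print(bf16_weights_gb, bf16_capacity_gb)   # 240.0 640
print(fp8_weights_gb, fp8_capacity_gb)     # 120.0 160
```

In both variants the weights fit with room to spare, which is consistent with the minimums being sized for real serving (long contexts, batching) rather than for merely holding the parameters.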

Another detail of interest for product and infrastructure teams is the licensing. NVIDIA presents it as an “open” model under the NVIDIA Nemotron Open Model License, and both NIM and Hugging Face emphasize that it’s ready for commercial use within those terms. This combination—an open model, very long context capabilities, agent-focused, and deployable at scale—helps explain why Nemotron 3 Super could appeal to both companies and open source projects seeking alternatives to closed, large-context models.

Why it might appeal to open assistants and agent frameworks

Although public discussion around the launch has linked it to tools like OpenClaw, the more reasonable fit is not with a single application but with a type of system. According to its official repository, OpenClaw is an open-source personal assistant capable of operating across multiple channels and devices. A model like Nemotron 3 Super, with its focus on agents, tool use, and extensive context, aligns well on paper with this kind of architecture, as well as with other open assistants, complex RAG systems, or multi-agent workflows. This interpretation is a reasonable inference from the model’s public capabilities and the product description of OpenClaw.
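The reason a model optimized for tool use and long context maps onto assistants of this kind is the control loop they all share: the model proposes a tool call, the runtime executes it, and the result is fed back into the context until the model produces a final answer. The sketch below is purely illustrative; the loop shape is generic, and the toy “model” and tool registry are made-up stand-ins, not OpenClaw’s or NVIDIA’s code.

```python
# Minimal sketch of a generic agent loop: model proposes a tool call,
# the runtime executes it, and the result is appended back to the context.
# All names here are hypothetical; real frameworks use an actual LLM and
# structured tool schemas.

def run_agent(model, tools, goal, max_steps=5):
    context = [("user", goal)]
    for _ in range(max_steps):
        action = model(context)              # model decides: call a tool or answer
        if action["type"] == "final":
            return action["text"]
        name, args = action["tool"], action["args"]
        result = tools[name](**args)         # runtime executes the chosen tool
        context.append(("tool", f"{name} -> {result}"))  # result feeds back in
    return "step budget exhausted"

# Toy stand-ins so the sketch runs end to end.
def fake_model(context):
    if any(role == "tool" for role, _ in context):
        return {"type": "final", "text": context[-1][1]}
    return {"type": "tool", "tool": "add", "args": {"a": 2, "b": 3}}

answer = run_agent(fake_model, {"add": lambda a, b: a + b}, "what is 2+3?")
print(answer)  # add -> 5
```

The longer this loop runs, the more the accumulated context grows, which is exactly where a large window and resistance to goal drift pay off.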

However, it’s important not to overstate the case. A large context window or an efficient architecture doesn’t automatically make a model the best choice for every agent. In real deployments, factors such as fine-tuning quality, latency, token costs, tool-invocation stability, ease of serving, and performance on specific tasks are critical. What is clear is that NVIDIA is trying to occupy a space previously dominated by other labs: high-end open models specifically tailored for agents.

Overall, Nemotron 3 Super is more than a catalog iteration. It signals that the open AI market is entering a new phase, in which raw size no longer suffices and efficiency, long context, tool integration, and the ability to serve as the backbone of complex agents matter far more. In this arena, NVIDIA wants to make clear that it is not only selling GPUs but also competing with a model of its own.

Frequently Asked Questions

What is NVIDIA Nemotron 3 Super?

An open NVIDIA model oriented towards agent reasoning, tool use, RAG, and long-context tasks, with 120B total parameters, 12B active, and support for up to 1 million tokens of context.

What architecture does Nemotron 3 Super use?

NVIDIA states it uses a hybrid LatentMoE architecture combining Mamba-2 layers, MoE, attention layers, and Multi-Token Prediction to speed up inference.

How many GPUs does Nemotron 3 Super need?

It depends on the variant. The official specs indicate 8× H100-80GB for BF16 and 2× H100-80GB for FP8 as minimum requirements.

Can Nemotron 3 Super be used in open assistants like OpenClaw?

On paper, yes. OpenClaw is an open-source assistant and Nemotron 3 Super is optimized for agents, tool use, and extended context. The final choice, however, depends on actual performance, latency, cost, and system integration.
