The SC25 conference, held in St. Louis, Missouri, confirmed something many in the industry had already sensed: supercomputing is no longer just about faster processing, but about completely rethinking how data centers are designed and operated in the age of artificial intelligence. NVIDIA used the event to showcase a true technological arsenal touching every level of the stack: GPUs, CPUs, DPUs, photonic networking, quantum computing, energy efficiency, and even developer desktop setups.
The core message is clear: “AI factories”—systems capable of training and deploying models with trillions of parameters—require a coherent, efficient, and increasingly intelligent architecture. And NVIDIA aims to be the de facto operating system for these factories.
DGX Spark: the desktop-sized supercomputer
One of the highlights of SC25 was NVIDIA DGX Spark, dubbed the “world’s smallest AI supercomputer.” It’s a desktop system offering 1 petaflop of AI performance and 128 GB of unified memory, enough to run inference on models with up to 200 billion parameters and fine-tune large models locally.
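A quick back-of-envelope check makes the 200-billion-parameter claim plausible, assuming 4-bit quantized weights (the article does not specify the format, so the 4-bit figure is an assumption):

```python
# Back-of-envelope memory check (assumption: 4-bit quantized weights;
# the exact inference format is not specified in the article).
PARAMS = 200e9          # 200-billion-parameter model
BYTES_PER_PARAM = 0.5   # 4 bits = half a byte per parameter

weights_gb = PARAMS * BYTES_PER_PARAM / 1e9
print(f"Weights alone: {weights_gb:.0f} GB")   # 100 GB

# Leaves ~28 GB of the 128 GB unified pool for KV cache, activations,
# and the OS -- tight but workable for local inference.
headroom_gb = 128 - weights_gb
print(f"Headroom: {headroom_gb:.0f} GB")
```

At 8-bit precision the weights alone would need 200 GB and no longer fit, which is why low-precision formats matter for a machine in this class.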
The machine is built on the Grace Blackwell architecture, combining CPU, GPU, networking, and the NVIDIA AI software stack in a compact form factor. The result is a developer workstation that brings what once required an entire rack down to a desktop.
A key aspect of DGX Spark is its use of NVLink-C2C, which provides up to 5 times more bandwidth than PCIe Gen5 between CPU and GPU. This translates into:
- Reduced bottlenecks when transferring data between CPU and GPU memory.
- Faster and more efficient inference on massive models.
- Much more agile fine-tuning and experimentation workflows, without always relying on a central cluster.
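The practical effect of that bandwidth gap can be illustrated with a simple transfer-time estimate (PCIe Gen5 x16 at roughly 64 GB/s per direction is a known figure; the 5x multiplier is the article's claim, and the 100 GB payload is a hypothetical example):

```python
# Illustrative transfer-time comparison. PCIe Gen5 x16 delivers roughly
# 64 GB/s per direction; per the article, NVLink-C2C offers up to 5x that
# between CPU and GPU.
PCIE_GEN5_GBPS = 64.0
NVLINK_C2C_GBPS = 5 * PCIE_GEN5_GBPS   # ~320 GB/s under the 5x claim

payload_gb = 100.0   # hypothetical: stream 100 GB of weights CPU -> GPU
t_pcie = payload_gb / PCIE_GEN5_GBPS
t_c2c = payload_gb / NVLINK_C2C_GBPS
print(f"PCIe Gen5: {t_pcie:.2f} s, NVLink-C2C: {t_c2c:.2f} s")
```

For workflows that repeatedly shuttle weights or activations between the unified memory pool and the GPU, shaving each transfer by a factor of five compounds quickly.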
It’s symbolic that Jensen Huang, NVIDIA’s founder and CEO, surprised attendees at SC25 by giving away several DGX Spark units and joking that they “will look great under the Christmas tree.” It’s a gesture, but also a statement: AI supercomputing is no longer a distant resource, but a tool that’s starting to come down to the individual developer’s workstation.
BlueField-4: the DPU acting as the “operating system” of AI factories
As AI clusters grow, the challenge isn’t just adding more GPUs, but properly powering and orchestrating those resources. Enter NVIDIA BlueField-4, the next-generation Data Processing Unit (DPU) designed to serve as the “operating system” controller for AI factories.
BlueField-4 combines:
- A 64-core NVIDIA Grace CPU.
- NVIDIA ConnectX-9 for high-speed networking.
- Native integration with NVIDIA DOCA, the microservices framework for networking, storage, and security.
Its role is to offload infrastructure tasks from the CPU and GPU, including:
- Networking: massive east-west traffic, real-time telemetry, network virtualization.
- Storage: parallel access to structured, unstructured, and “AI-native” data (checkpoints, embeddings…).
- Security: segmentation, zero-trust, encryption, and tenant isolation.
Major players in AI and HPC storage, such as DDN, VAST Data, and WEKA, are already building on BlueField-4 to bring data services as close as possible to GPUs:
- DDN aims to maximize GPU utilization by accelerating data pipelines for large-scale training and simulations.
- VAST Data focuses on smart, real-time data movement across large AI clusters.
- WEKA is launching its NeuralMesh architecture on BlueField-4, running storage services directly on the DPU to simplify and speed up AI infrastructure.
In practice, BlueField-4 transforms storage and networking into performance multipliers, rather than mere “bottlenecks” that compute hardware must wait for.
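The economics behind that claim can be sketched with a toy model (the stall fractions and prices below are hypothetical illustrations, not NVIDIA figures): every moment a GPU waits on the host for networking, storage, or security work is a moment of wasted capacity.

```python
# Toy model, hypothetical numbers: if infrastructure tasks stall the data
# pipeline some fraction of the time, GPU utilization drops accordingly;
# offloading those tasks to a DPU shrinks the stall.
def gpu_utilization(stall_fraction: float) -> float:
    """Fraction of time the GPU does useful work."""
    return 1.0 - stall_fraction

without_dpu = gpu_utilization(0.25)   # assumed 25% pipeline stall on host
with_dpu = gpu_utilization(0.05)      # assumed residual 5% after offload

# Wasted capacity across a hypothetical 1,000-GPU cluster over a year:
GPUS, HOURS_PER_YEAR = 1_000, 8_760
idle_hours_saved = GPUS * HOURS_PER_YEAR * (with_dpu - without_dpu)
print(f"Utilization: {without_dpu:.0%} -> {with_dpu:.0%}, "
      f"{idle_hours_saved:,.0f} GPU-hours/year recovered")
```

Even modest reductions in pipeline stalls translate into millions of recovered GPU-hours at cluster scale, which is the "performance multiplier" framing in practice.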
Quantum-X Photonic Networks: less energy, more resilience at 800 Gb/s
If GPUs are the engines of AI factories, networks are the circulatory system. NVIDIA’s focus here is on two key challenges: energy efficiency and reliability.
The new NVIDIA Quantum-X Photonics InfiniBand CPO switches integrate optics directly into the switch (co-packaged optics), eliminating traditional pluggable transceivers—one of the recurrent weaknesses in large deployments.
NVIDIA claims that this optical integration offers:
- Up to 3.5 times better energy efficiency.
- 10 times greater resilience, dramatically reducing optical link failures.
- Continuous operation up to 5 times longer without interruption in sensitive applications.
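Why the resilience number matters becomes clearer at fleet scale. A rough illustration, with an assumed baseline failure rate (the 2% per-link figure below is hypothetical; only the 10x multiplier comes from the article):

```python
# Illustrative reliability math. Baseline failure rate is an assumption,
# not an NVIDIA figure; the 10x improvement is the article's claim.
LINKS = 10_000                          # optical links in a large fabric
BASE_ANNUAL_FAILURES_PER_LINK = 0.02    # hypothetical 2% per link per year

pluggable_failures = LINKS * BASE_ANNUAL_FAILURES_PER_LINK
cpo_failures = pluggable_failures / 10  # 10x resilience claim
print(f"Expected link failures/year: "
      f"{pluggable_failures:.0f} (pluggable) vs {cpo_failures:.0f} (CPO)")
```

With tens of thousands of links, each failure can interrupt a long-running training job, so cutting expected failures by an order of magnitude directly extends uninterrupted runtimes.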
Centers like TACC (Texas Advanced Computing Center), Lambda, and CoreWeave have announced plans to incorporate Quantum-X Photonics into their future systems, aiming to support massive AI workloads with lower operational costs and increased stability.
This complements the NVIDIA Quantum-X800 InfiniBand switches, capable of delivering 800 Gb/s throughput end-to-end, with innovations like SHARPv4 (in-network reduction) and FP8 support for training models with trillions of parameters while minimizing data traffic between nodes.
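The traffic savings from in-network reduction can be sketched with standard all-reduce arithmetic (the gradient size and node count below are hypothetical; the ring all-reduce formula is the well-known 2·(N−1)/N cost per node):

```python
# Why SHARP-style in-network reduction cuts traffic: a host-based ring
# all-reduce moves ~2*(N-1)/N of the gradient size per node, while reducing
# inside the switch lets each node send its gradients roughly once.
def ring_traffic_per_node(size_gb: float, n: int) -> float:
    return 2 * (n - 1) / n * size_gb

def in_network_traffic_per_node(size_gb: float) -> float:
    return size_gb   # one pass up the reduction tree, result comes back

grad_gb, nodes = 10.0, 1024   # hypothetical: 10 GB of gradients, 1024 nodes
print(f"Ring: {ring_traffic_per_node(grad_gb, nodes):.2f} GB/node, "
      f"in-network: {in_network_traffic_per_node(grad_gb):.2f} GB/node")
```

Roughly halving per-node traffic on every gradient synchronization is significant when training runs perform that exchange thousands of times per hour.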
NVIDIA Apollo and Warp: physics enters the AI era
Supercomputing in the AI age isn’t just “more GPUs”; it’s also new models and frameworks to simulate the physical world faster and more accurately.
NVIDIA Apollo: open models for physics-based AI
At SC25, NVIDIA showcased NVIDIA Apollo, a family of open physics AI models designed to accelerate simulations in areas such as:
- Electronic design automation and semiconductors.
- Computational fluid dynamics (CFD).
- Structural mechanics and electromagnetism.
- Climate, weather, and complex phenomena modeling.
Apollo combines cutting-edge machine learning architectures—neural operators, transformers, diffusion methods—with domain-specific knowledge. NVIDIA will provide pretrained checkpoints and reference workflows for training, inference, and benchmarking, enabling companies to adapt models to their specific needs.
Major industry players such as Applied Materials, Cadence, LAM Research, Luminary Cloud, KLA, PhysicsX, Rescale, Siemens, and Synopsys are already integrating Apollo into their design and simulation pipelines.
NVIDIA Warp: CUDA performance meets Python productivity
Complementing these models, NVIDIA Warp positions itself as an open-source Python framework to accelerate computational physics and AI workloads by up to 245 times on GPUs.
Warp enables:
- Writing simulation kernels in Python with a high-level syntax.
- Compiling those kernels into highly optimized CUDA code.
- Easily integrating simulations with pipelines like PyTorch, JAX, NVIDIA PhysicsNeMo, and NVIDIA Omniverse.
Companies such as Siemens, Neural Concept, and Luminary Cloud are already using Warp to build GPU-accelerated 3D simulation workflows that generate large-scale training and validation data for AI models. The goal is to lower the entry barrier to high-performance simulation for engineers and scientists working in Python, without requiring CUDA expertise.
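The programming model resembles the plain-Python sketch below. To keep it self-contained, a simple loop stands in for the GPU thread grid; real Warp code decorates the kernel with `@wp.kernel`, takes `wp.array` arguments, reads its index from `wp.tid()`, and is launched with `wp.launch`, compiling down to optimized CUDA.

```python
# Plain-Python stand-in for a Warp-style SAXPY kernel (illustrative only;
# actual Warp uses @wp.kernel, wp.array, wp.tid(), and wp.launch).
def saxpy_kernel(i: int, a: float, x: list, y: list):
    y[i] = a * x[i] + y[i]       # one "thread" handles one element

def launch(kernel, dim, *args):
    for i in range(dim):          # on the GPU, these run in parallel
        kernel(i, *args)

x = [1.0, 2.0, 3.0]
y = [10.0, 20.0, 30.0]
launch(saxpy_kernel, len(x), 2.0, x, y)
print(y)   # [12.0, 24.0, 36.0]
```

The appeal is exactly this shape: an engineer writes the element-wise logic in Python, and the framework handles parallelization and CUDA code generation.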
NVQLink: bridging supercomputers and quantum processors
Another strategic highlight is NVQLink, a universal interconnect linking NVIDIA’s GPUs with quantum processors to build hybrid quantum-classical systems.
NVQLink promises:
- Up to 40 petaflops of AI performance at FP4 precision in hybrid workflows.
- Latencies on the order of microseconds, critical for quantum error correction and real-time control.
- An open architecture based on CUDA-Q, allowing developers and supercomputing centers to integrate different QPUs under a unified programming model.
The most notable example is Quantinuum, whose new Helios QPU has been integrated with NVIDIA GPUs via NVQLink to achieve:
- Real-time decoding of large-scale qLDPC error correction codes.
- Fidelity close to 99%, compared to ~95% without correction.
- Reaction time of 60 microseconds, 16 times better than Helios’s own 1-millisecond requirement.
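The "16 times better" figure checks out directly from the numbers above:

```python
# Checking the article's numbers: 60 microseconds against the 1 ms budget.
reaction_us = 60
budget_us = 1_000                 # Helios's 1-millisecond requirement
margin = budget_us / reaction_us
print(f"{margin:.1f}x faster than required")   # ~16.7x, reported as "16 times"
```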
Numerous supercomputing centers across Europe, the US, and Asia-Pacific—including JSC, CINECA, AIST G-QuAT, RIKEN, KISTI, NCHC, Pawsey—have announced plans to adopt NVQLink to advance practical quantum computing research.
Japan, Arm, and the energy efficiency race
NVIDIA’s vision isn’t limited to hardware and software; it also considers the geopolitical landscape of AI and the emerging bottleneck: energy.
RIKEN and Japan: AI for science and sovereign quantum computing
NVIDIA and RIKEN are constructing two new GPU-accelerated supercomputers in Japan with a total of 2,140 Blackwell GPUs connected via GB200 NVL4 and Quantum-X800 networks.
- One system with 1,600 GPUs will focus on AI for science (life sciences, materials, climate, manufacturing, lab automation).
- Another with 540 GPUs will target quantum computing, hybrid algorithms, and simulation.
These systems complement the FugakuNEXT project, co-developed by RIKEN, Fujitsu, and NVIDIA, which aims to deliver 100 times more performance than the current Fugaku and integrate production quantum computers before 2030.
Arm + NVLink Fusion: scaling CPU-GPU integration
Meanwhile, Arm is adopting NVIDIA NVLink Fusion, a high-bandwidth, coherent interconnect born from Grace Blackwell.
The goal is to connect Arm Neoverse CPUs with GPUs and other accelerators in a unified rack-scale architecture, eliminating the memory and bandwidth bottlenecks that limit AI performance.
Since all major providers—AWS, Google, Microsoft, Oracle, Meta—are already building on Neoverse, the combination with NVLink Fusion is poised to become the de facto standard for energy-efficient AI infrastructure in the coming years.
Domain Power Service: energy as an orchestrable resource
Finally, NVIDIA highlighted a topic that’s becoming as crucial as performance: how to power AI factories without blowing up electricity bills or overloading the grid.
The new Domain Power Service (DPS) treats electrical power as a dynamic resource that can be modeled and orchestrated just like CPU, GPU, or memory workloads. Operating as a service on Kubernetes, it can:
- Model power consumption from the rack level up to the entire facility.
- Smartly adjust power limits to extract more performance per megawatt.
- Coordinate with NVIDIA Omniverse DSX Blueprint, Power Reservation Steering, and Workload Power Profile within the DSX Boost suite to balance loads and efficiency.
DPS also exposes APIs to the electrical grid, enabling automated mechanisms for demand response and load reduction during peak times. The vision is for AI data centers to shift from being a system strain to active participants in grid stabilization.
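The kind of policy DPS automates can be sketched as follows. Everything in this snippet is invented for illustration (function names, rack names, and numbers are hypothetical; the real service runs on Kubernetes and exposes its own APIs): curtail per-rack power caps proportionally so a facility stays under a grid-imposed budget during a demand-response event.

```python
# Hypothetical sketch of a demand-response policy (names and numbers are
# invented for illustration, not the real DPS API): scale per-rack power
# caps so the facility stays under a grid-imposed budget.
def scale_power_caps(rack_demands_kw: dict, facility_budget_kw: float) -> dict:
    total = sum(rack_demands_kw.values())
    if total <= facility_budget_kw:
        return dict(rack_demands_kw)          # no curtailment needed
    factor = facility_budget_kw / total       # proportional curtailment
    return {rack: kw * factor for rack, kw in rack_demands_kw.items()}

demands = {"rack-a": 120.0, "rack-b": 90.0, "rack-c": 90.0}  # kW, hypothetical
caps = scale_power_caps(demands, facility_budget_kw=240.0)
print(caps)   # every rack curtailed by the same 0.8 factor
```

A production system would weight curtailment by workload priority rather than curtail uniformly, but the principle is the same: power becomes a schedulable resource, just like GPU time.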
FAQs on NVIDIA’s new supercomputing era
1. What is NVIDIA DGX Spark, and who is it for?
DGX Spark is a desktop-form-factor AI supercomputer offering 1 petaflop of performance and 128 GB of unified memory. It’s designed for developers, research teams, and companies that need to experiment, perform inference, and fine-tune models with up to 200 billion parameters locally, without always relying on a central large cluster.
2. What does NVIDIA BlueField-4 DPU bring to an “AI factory”?
BlueField-4 offloads network, storage, and security tasks from CPUs and GPUs onto a specialized DPU with a Grace CPU and ConnectX-9 networking. This frees up compute resources for training and inference, enhances security (zero-trust), and allows providers like DDN, VAST Data, and WEKA to bring data services closer to the compute layer—reducing latency and increasing GPU utilization.
3. How do NVIDIA Quantum-X Photonics switches differ from traditional InfiniBand switches?
Quantum-X Photonics integrates optics directly into the switch via co-packaged optics, eliminating pluggable transceivers. This reduces optical link failures, improves energy efficiency by up to 3.5 times, and increases resilience (10 times more), enabling large-scale AI applications to run longer with less power per transmitted bit.
4. Why is NVQLink important for practical quantum computing?
NVQLink provides a low-latency bridge between quantum processors (QPU) and NVIDIA GPUs, enabling real-time error correction and hybrid workflows. For example, Quantinuum’s Helios QPU, connected via NVQLink, achieves real-time decoding of error correction codes with fidelity near 99% and reaction times of just 60 microseconds—making practical quantum computing more feasible outside the lab.
Sources:
- NVIDIA Blog – “Accelerated Computing, Networking Drive Supercomputing in Age of AI”
- Public NVIDIA communications and materials on DGX Spark, BlueField-4, Quantum-X, Apollo, Warp, NVQLink, and Domain Power Service
via: blogs.nvidia

