NVIDIA Reimagines AI Inference: Large Clusters, Photonic Silicon, and Ultra-Efficient Networks for a Future Dominated by Reasoning Models

The company is betting on high-performance, centralized infrastructure in response to the rise of advanced generative models and growing inference workloads, while its co-packaged optics technology pushes the boundaries of energy efficiency in the data center.

For years, the recipe for building an AI cluster has been clear: gather as many GPUs as possible, connect them with ultra-fast networks, and feed them vast amounts of data. However, the shift of generative artificial intelligence from the training phase to massive deployment in inference is changing the fundamental principles of data center design. And NVIDIA wants to lead this transition.

In statements to DataCenterDynamics, Kevin Deierling, senior vice president of networking at NVIDIA, explains that the era of “lightweight” inference has come to an end. Instead, the new generation of models, especially reasoning models and agent-based workloads, is driving a recentralization of infrastructure around massive and increasingly efficient clusters.


Inference No Longer Means Low Consumption: Test-Time Scaling and Complex Reasoning

According to Deierling, the market has evolved into three major phases:

  1. Pre-training, focused on building foundation models from large initial volumes of data.
  2. Post-training, where models are refined with additional data running into hundreds of petabytes, while model sizes grow into the trillions of parameters.
  3. Test-time scaling, where the already trained model deploys additional computational resources during inference to simulate multiple possible outcomes and select the best response.

This last step marks a paradigm shift: inference is no longer a straightforward query and response, but an iterative, computationally intensive process. Models like DeepSeek R1 (671 billion parameters) require dozens of GPUs working in parallel even for inference, ruling out running them at the edge or on individual devices.
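
To make the idea concrete, here is a minimal sketch of one common form of test-time scaling, best-of-N sampling, written in Python. The generate_candidate and score_candidate functions are hypothetical stand-ins for a model and a verifier, not NVIDIA or DeepSeek APIs.

    import random

    # Minimal sketch of best-of-N test-time scaling: the model spends extra
    # compute at inference generating several candidate answers, and a scorer
    # keeps the best one. Both helpers below are hypothetical stand-ins.

    def generate_candidate(prompt: str, seed: int) -> str:
        """Stand-in for one sampled model completion."""
        rng = random.Random(seed)
        return f"answer-{rng.randint(0, 999)} to: {prompt}"

    def score_candidate(prompt: str, answer: str) -> float:
        """Stand-in for a reward model or verifier that rates an answer."""
        return random.Random(prompt + answer).random()

    def best_of_n(prompt: str, n: int = 8) -> str:
        """Run n independent inference passes and return the highest-scoring answer."""
        candidates = [generate_candidate(prompt, seed) for seed in range(n)]
        return max(candidates, key=lambda c: score_candidate(prompt, c))

    # Each extra candidate is another full forward pass, which is why
    # reasoning-style inference multiplies GPU demand rather than shrinking it.
    print(best_of_n("Plan a three-step data migration", n=8))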


Training Clusters Reused for Inference

NVIDIA has identified a clear trend among its most advanced customers: reusing training clusters for inference. While it was initially assumed that inference would run on isolated machines, it is now evident that the most economically valuable models, such as those behind autonomous agents or multimodal search engines, require complex, high-density network architectures.


Co-Packaged Optics (CPO): NVIDIA’s Energy Bet

With data centers now easily housing hundreds of thousands of GPUs, the main limit to scaling is no longer hardware cost but the energy budget. To address this, NVIDIA has invested in co-packaged optics (CPO): switches with photonics integrated directly on the silicon.

Key Advantages of CPO:

  • Up to 50% less energy consumption in interconnection.
  • Massive reduction in transceivers: hundreds of thousands of external optical components are eliminated.
  • Increased operational reliability: fewer discrete pluggable parts to install and fail, reducing the risk of human error in high-density environments.
  • Increased capacity per rack, freeing up space and power for more GPUs.
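
How much that 50% matters at cluster scale can be seen with a rough back-of-the-envelope calculation. The per-port wattages and port count below are illustrative assumptions for the sake of the arithmetic, not NVIDIA figures.

    # Back-of-the-envelope estimate of interconnect power saved by co-packaged
    # optics. All numbers are illustrative assumptions, not published specs.

    pluggable_watts = 15.0    # assumed draw of one external pluggable optical transceiver
    cpo_watts = 7.5           # assumed per-port draw with photonics on the switch package (about half)
    optical_ports = 400_000   # assumed optical ports in a very large GPU cluster

    pluggable_mw = optical_ports * pluggable_watts / 1e6
    cpo_mw = optical_ports * cpo_watts / 1e6

    print(f"Pluggable optics:   {pluggable_mw:.1f} MW")
    print(f"Co-packaged optics: {cpo_mw:.1f} MW")
    print(f"Power freed up:     {pluggable_mw - cpo_mw:.1f} MW")

Under these assumptions, the interconnect alone frees up roughly 3 MW of the facility's power budget, headroom that can instead be spent on additional GPUs.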

Optical Networks Between Data Centers and Ultra-Low Latency

NVIDIA’s vision is not limited to optimizing individual racks. In the largest data centers in the world, optical interconnections between entire campuses are already being deployed, connecting multiple buildings to execute multi-cluster training tasks and distributed inference loads.

While the impact of latency on human users is limited (200 ms is tolerable), the same is not true for agent-based inference, where multiple autonomous models interact in real time. Here, sub-millisecond latency is critical, and can only be ensured within the same data center or through very low-latency optical links.
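
A simple latency budget shows why. The hop count and per-hop delays below are assumptions chosen only to make the arithmetic concrete.

    # Illustrative end-to-end latency for an agent pipeline in which models
    # call each other sequentially. Hop count and per-hop delays are assumptions.

    hops = 40                   # assumed model-to-model calls in one agent workflow
    human_tolerable_ms = 200.0  # per-hop delay a human user would still tolerate
    intra_dc_ms = 0.5           # assumed per-hop delay over a low-latency optical fabric

    print(f"At human-tolerable latency: {hops * human_tolerable_ms / 1000:.0f} s end to end")
    print(f"Inside one facility:        {hops * intra_dc_ms:.0f} ms end to end")

Forty hops at 200 ms each already add up to 8 seconds, while the same pipeline stays around 20 ms when every hop runs over a sub-millisecond fabric.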


Beyond Hardware: An Architecture for the New AI

The transition from fast, simple inference to complex, distributed reasoning requires rethinking the entire infrastructure stack: from networking to power consumption, from optical packaging to the physical placement of racks. According to NVIDIA, future architectures will not hinge on separating edge from cloud, but on managing compute, networking, and energy as a whole.

Source: artificial intelligence news outlets and DataCenterDynamics (DCD).
