For years, the narrative of artificial intelligence infrastructure seemed to have a single protagonist: the GPU. But Arm is pushing a different (and, for many, inevitable) idea: the only way to truly scale AI is through complete system design, where the CPU—and increasingly also the DPU—becomes the glue that makes accelerators deliver real value.
Arm reads the announcement of NVIDIA Rubin at CES 2026 as implicit validation of this argument: the industry is shifting toward co-designed racks and superclusters (compute, networking, storage, and security conceived as a single product), and in that shift Arm-based CPUs are gaining prominence as the layer for orchestration, coordination, and control.
From “more GPUs” to “converged data center”
Arm summarizes the change with a compelling phrase: accelerators do the calculations, but it is the CPUs that turn that power into usable systems, managing data flow, synchronization, isolation, and reliability at scale. In a world of larger models and, above all, more agentic workloads (AI that plans, reasons, and acts by chaining tools), the bottleneck is no longer just FLOPs: it is feeding, coordinating, and protecting the AI factory.
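To make "feeding and coordinating" concrete, here is a minimal, purely illustrative Python sketch of the host-side pattern Arm is describing; it is not tied to any Arm or NVIDIA API, and `prepare_batch` and `accelerator_compute` are hypothetical stand-ins for CPU-side data preparation and an accelerator kernel.

```python
# Illustrative only: the host CPU overlaps data preparation (I/O, decoding,
# batching) with accelerator work so the accelerator never waits for input.
from concurrent.futures import ThreadPoolExecutor
import time

def prepare_batch(i: int) -> str:
    """Hypothetical CPU-side work: load, decode, and batch input data."""
    time.sleep(0.01)  # simulated I/O and preprocessing
    return f"batch-{i}"

def accelerator_compute(batch: str) -> str:
    """Hypothetical stand-in for a kernel that would run on a GPU/accelerator."""
    time.sleep(0.05)  # simulated device compute
    return f"result({batch})"

def run(num_batches: int = 8, prefetch: int = 4) -> None:
    # The thread pool prepares upcoming batches in the background while the
    # main loop feeds each finished batch to the (simulated) accelerator.
    with ThreadPoolExecutor(max_workers=prefetch) as pool:
        pending = [pool.submit(prepare_batch, i) for i in range(num_batches)]
        for fut in pending:
            print(accelerator_compute(fut.result()))

if __name__ == "__main__":
    run()
```

If `prepare_batch` cannot keep up, the accelerator idles: that, in miniature, is the bottleneck Arm is pointing at.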
This is where the concept of a “converged AI data center” fits: dense, modular, highly integrated infrastructures that maximize computation per square meter while also aiming to contain energy and operational costs.
Rubin: six chips, a “supercomputer” in platform format
NVIDIA presents Rubin as a platform built on "extreme co-design" across six components: the Vera CPU, Rubin GPU, NVLink 6 Switch, ConnectX-9 SuperNIC, BlueField-4 DPU, and Spectrum-6 Ethernet Switch. The goal is not just more raw performance but lower time and cost for training and inference when scaled to the rack level.
In its summary, NVIDIA mentions:
- Up to 10× lower cost per token in inference compared to Blackwell (a rough back-of-the-envelope illustration follows this list).
- Up to 4× fewer GPUs to train Mixture-of-Experts (MoE) models compared to the previous generation.
- An additional push with Ethernet Photonics in Spectrum-X to improve energy efficiency and availability.
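As a way to read a claim like "10× lower cost per token", here is a hedged back-of-the-envelope calculation; the hourly costs and throughputs below are invented purely for illustration and do not come from NVIDIA, Arm, or the announcement.

```python
# Back-of-the-envelope cost-per-token comparison. All figures are
# illustrative placeholders, not vendor numbers.
def cost_per_million_tokens(hourly_cost_usd: float, tokens_per_second: float) -> float:
    """Cost in USD to generate one million tokens at a given throughput."""
    tokens_per_hour = tokens_per_second * 3600
    return hourly_cost_usd / tokens_per_hour * 1_000_000

# Hypothetical previous-generation slice of a rack.
baseline = cost_per_million_tokens(hourly_cost_usd=98.0, tokens_per_second=20_000)

# A "10x" improvement can come from higher throughput, lower cost, or both.
improved = cost_per_million_tokens(hourly_cost_usd=120.0, tokens_per_second=250_000)

print(f"baseline: ${baseline:.2f} per 1M tokens")
print(f"improved: ${improved:.2f} per 1M tokens")
print(f"ratio:    {baseline / improved:.1f}x")
```

The point is that the ratio, not any single number, is what rack-level co-design is trying to move.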
Furthermore, the announcement already points to concrete deployments: from Microsoft's "Fairwater" AI superfactories (built on Vera Rubin NVL72 systems) to providers such as CoreWeave, which aims to be among the first to bring Rubin into production.
The key shift: the DPU as an "infrastructure processor" (and storage as a competitive weapon)
An interesting point in Arm's framing is that it is not limited to the host CPU. It highlights the leap represented by BlueField-4: more than just a network card, it acts as an infrastructure processor capable of offloading critical functions from the host.
NVIDIA, for its part, has given this idea a concrete name with a dedicated AI storage platform: NVIDIA AI Inference Context Memory (AICON), designed to increase tokens per second and energy efficiency, with BlueField-4 as a core building block.
The implicit message is clear: if reasoning models and agents depend on context and memory, then the boundary between "compute" and "data" blurs. Storage stops being a peripheral and becomes part of the performance equation.
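To see why context turns into a storage problem, here is a generic sizing sketch using standard transformer KV-cache arithmetic; the model dimensions are hypothetical and unrelated to any of the products mentioned here.

```python
# Generic KV-cache sizing: the memory a transformer keeps per request in
# order to keep generating. Model dimensions below are hypothetical.
def kv_cache_bytes(layers: int, kv_heads: int, head_dim: int,
                   context_tokens: int, bytes_per_value: int = 2) -> int:
    # Two tensors (keys and values) per layer, each of shape
    # [context_tokens, kv_heads, head_dim].
    return 2 * layers * kv_heads * head_dim * context_tokens * bytes_per_value

# A hypothetical 70B-class model using grouped-query attention.
for ctx in (8_000, 128_000, 1_000_000):
    gib = kv_cache_bytes(layers=80, kv_heads=8, head_dim=128, context_tokens=ctx) / 2**30
    print(f"{ctx:>9,} tokens of context -> ~{gib:,.1f} GiB of KV cache per request")
```

At agentic context lengths, that per-request state quickly outgrows GPU memory, which is exactly why tiering it across DPUs and storage becomes a performance decision rather than a housekeeping one.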
AWS also points to the same pattern with Trainium3: integration to reduce cost per unit of useful work
Arm reinforces its thesis by citing AWS Trainium3: a system in which the accelerator, the CPU (Graviton), and the infrastructure components (Nitro) are conceived as a unified whole.
AWS states that Trainium3 offers:
- Up to 4.4× more compute power and up to 4× better energy efficiency than Trainium2.
- 128 GB of HBM3e per chip and nearly 4× the memory bandwidth.
- Configurations at "UltraServer" scale, with dozens of chips and massive aggregate HBM capacity (roughly sized in the sketch below).
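Taking the 128 GB-per-chip figure at face value, a quick sketch shows how "massive HBM aggregation" adds up; the chip counts are hypothetical examples, since the announcement only speaks of "dozens of chips".

```python
# Aggregate HBM capacity at "UltraServer" scale, using the 128 GB per chip
# figure cited above. Chip counts are hypothetical, not AWS specifications.
HBM_PER_CHIP_GB = 128

for chips in (16, 32, 64):
    total_gb = chips * HBM_PER_CHIP_GB
    print(f"{chips:>3} chips -> {total_gb:,} GB (~{total_gb / 1024:.1f} TB) of HBM3e")
```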
Again, the message is not "yet another accelerator" but self-contained platforms in which each layer is designed to reduce friction, latency, and wasted energy.
Quick comparative table: two paths toward “full system”
| Platform | Philosophy | Key Pieces | Scaling Approach | Main Promise |
|---|---|---|---|---|
| NVIDIA Rubin | Extreme co-design (6 chips as “one system”) | Vera CPU, Rubin GPU, NVLink 6, ConnectX-9, BlueField-4, Spectrum-6 | Rack-scale (NVL72) and superclusters | Lower cost per token and fewer GPUs for MoE |
| AWS Trainium3 | In-house silicon + vertical integration (compute + CPU + infrastructure) | Trainium3 + Graviton + Nitro | UltraServers and AWS deployment | More performance and energy efficiency per generation |
What this means for the market
- CPUs are no longer “secondary” in AI: orchestration, security, and data movement become the bottlenecks in dense racks.
- Infrastructure is “productized”: increasingly, buying AI at scale means buying complete platforms, not just individual pieces.
- Networking and storage enter the race: DPUs, NICs, and “context memory” are becoming real differentiators for agents and reasoning.
Frequently Asked Questions
What is a “converged AI data center”?
An approach where compute, networking, storage, and security are designed to operate as a single system, optimized for scalable AI with energy efficiency and operational control.
Why does Arm emphasize the CPU’s importance if the GPU does the heavy lifting?
Because at large scale, the challenge isn’t just computation but coordinating thousands of GPUs: feeding data, synchronizing tasks, isolating environments, monitoring failures, and keeping the system stable.
What role does a DPU like BlueField-4 play in AI?
It acts as an “infrastructure processor”: offloading network, security, and storage tasks from the host to free resources and improve isolation and efficiency in very large clusters.
What changes with Rubin compared to previous generations?
The bet on a six-chip co-designed platform aims to reduce inference costs and accelerate training, along with integrating new layers for agents and reasoning.
via: newsroom.arm

