Oracle Presents OCI Zettascale10: the Cloud AI “Supercomputer” with up to 16 zettaFLOPS and 800,000 GPUs per Cluster

Oracle has unveiled Oracle Cloud Infrastructure (OCI) Zettascale10, its new generation of AI supercomputers in the cloud. The company claims these clusters connect hundreds of thousands of NVIDIA GPUs across multiple data centers, delivering multi-gigawatt capacity and reaching peak theoretical performance of up to 16 zettaFLOPS. Zettascale10 is the computing fabric underneath the flagship supercluster developed alongside OpenAI in Abilene (Texas), as part of the Stargate program.

According to Oracle, the key lies in its Oracle Acceleron RoCE (RoCEv2) low-latency GPU-GPU network architecture, combined with NVIDIA’s AI infrastructure. The goal: massive scaling, cost/performance competitiveness, better cluster utilization, and high reliability for training and inference of large-scale models.


What is OCI Zettascale10 (and where does it fit)

  • Performance and scale. Up to 16 zettaFLOPS (peak) and multi-gigawatt deployments of IT power housed in macro-campus environments designed for extreme density within a 2-kilometer radius, reducing GPU-to-GPU latency for large-scale training.
  • Cluster fabric. Oracle Acceleron RoCE prioritizes uniformly low latency and GPU-GPU bandwidth at scale, with physical and logical network planes isolated to allow traffic rerouting in case of congestion or failures without job restarts.
  • Collaboration with OpenAI. The architecture was developed and first deployed in Abilene (Texas) for the Stargate supercluster jointly operated by Oracle and OpenAI.
  • Price/performance and sovereignty. Oracle positions Zettascale10 as the foundation for industrial-scale AI, with options to operate within its distributed cloud and controls for data/AI sovereignty.
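The headline figures imply a per-GPU number worth sanity-checking. A minimal back-of-envelope in Python, assuming the peak is simple multiplication across the announced GPU count (Oracle has not published the per-GPU breakdown; these are theoretical, presumably low-precision throughput figures):

```python
# Sanity check of Oracle's headline figures. Assumption: cluster peak
# is GPU count times per-GPU peak (theoretical, low-precision FLOPS).
ZETTA = 1e21
PETA = 1e15

cluster_peak_flops = 16 * ZETTA   # up to 16 zettaFLOPS per cluster
gpus_per_cluster = 800_000        # up to 800,000 GPUs per cluster

per_gpu_peak = cluster_peak_flops / gpus_per_cluster
print(f"{per_gpu_peak / PETA:.0f} petaFLOPS per GPU (peak)")  # → 20
```

Twenty petaFLOPS per GPU is plausible only for low-precision (FP8/FP4-class) throughput on current-generation accelerators, which supports reading the 16 zettaFLOPS as a theoretical low-precision peak rather than sustained FP16/FP32 capacity.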

“With OCI Zettascale10, we have combined our Oracle Acceleron RoCE network architecture with NVIDIA’s next-generation AI infrastructure to deliver multi-gigawatt capacity at an unprecedented scale,” said Mahesh Thiagarajan, EVP of Oracle Cloud Infrastructure.


How it aims to achieve this: Acceleron RoCE and a “wide, shallow, resilient” network design

Oracle outlines five technical pillars for Zettascale10:

  1. Wide, shallow, resilient fabric. Each GPU NIC acts as a mini-switch connected to multiple isolated physical and logical planes, reducing network layers, cost, and power consumption while increasing scalability.
  2. Enhanced reliability. Traffic migrates automatically to stable planes, avoiding reboots and checkpoint loss during long training jobs.
  3. Consistent performance. By eliminating a network layer compared to traditional three-tier designs, it aims for more uniform GPU-GPU latency and predictability.
  4. More efficient optics. Using Linear Pluggable Optics (LPO) and Linear Receiver Optics (LRO), it seeks to reduce network and cooling costs while maintaining 400G/800G throughput, freeing more power budget for compute.
  5. Operational flexibility. Plane-specific maintenance and an independent network operating system (NOS) per plane reduce downtime and speed up the rollout of upgrades.
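The failover behavior in pillars 1 and 2 can be sketched conceptually. The following is an illustrative model, not Oracle's implementation: each flow is hashed onto one of several isolated planes, and when a plane fails, its flows re-hash onto the surviving planes so the training job never restarts. All names (`MultiPlaneNic`, `pick_plane`, `fail_plane`) are hypothetical.

```python
from dataclasses import dataclass, field

# Conceptual sketch (not Oracle's implementation) of isolated network
# planes: traffic pinned to a failed plane migrates to a healthy one,
# avoiding job restarts and checkpoint loss.

@dataclass
class MultiPlaneNic:
    planes: list                       # e.g. ["plane-0", ..., "plane-3"]
    healthy: set = field(default_factory=set)

    def __post_init__(self):
        self.healthy = set(self.planes)

    def pick_plane(self, flow_id: int) -> str:
        # Hash each flow onto a currently healthy plane.
        live = sorted(self.healthy)
        if not live:
            raise RuntimeError("all planes down")
        return live[flow_id % len(live)]

    def fail_plane(self, plane: str) -> None:
        # Plane-level fault: remove it; flows re-hash onto survivors.
        self.healthy.discard(plane)

nic = MultiPlaneNic(planes=[f"plane-{i}" for i in range(4)])
before = nic.pick_plane(flow_id=7)
nic.fail_plane(before)
after = nic.pick_plane(flow_id=7)
print(before, "->", after)  # same flow, rerouted to a surviving plane
```

The design choice the sketch illustrates: because planes are isolated, a failure is contained to one plane and recovery is a routing decision rather than a job-level restart.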

Cluster size, availability, and target audience

  • Initial deployment target: up to 800,000 NVIDIA GPUs per cluster, with predictable performance and cost efficiency, according to Oracle.
  • Orders and timelines: Orders open today; availability expected in the second half of next year.
  • Use cases: training of large foundation models, model serving, and high-performance inference at scale, consolidating AI pipelines from research to production.

“OCI Zettascale10 provides the computing fabric needed to advance the state of the art in AI and transition from experimentation to industrialized AI,” said Ian Buck, VP of Hyperscale at NVIDIA.


Context: The race for “gigawatt-scale” AI

The industry is moving toward gigawatt-scale data centers with hundreds of thousands of GPUs per site for training and inference of next-generation multimodal models. In this scenario, the network fabric (its latency, effective bandwidth, and reliability) matters as much as the GPUs themselves for convergence speed, total cost, and actual cluster utilization.

Oracle’s approach combines:

  • Macro-campus densification to minimize physical hops and latency.
  • Multi-layer network design with isolated planes and linear optics for energy efficiency.
  • Distributed cloud for clients requiring controls over data and model sovereignty.
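The 2-kilometer-radius claim can be put into perspective with a rough propagation-delay estimate. This is an illustrative calculation, not an Oracle figure, assuming signal speed in fiber of roughly two-thirds the speed of light (about 5 µs per kilometer) and ignoring switch and queueing delay:

```python
# Rough propagation-delay estimate for a 2 km campus radius.
# Assumption: ~200,000 km/s in fiber (≈ 5 µs per km one-way);
# switch and queueing delays are not included.
C_FIBER_KM_PER_US = 0.2           # 0.2 km per microsecond

def one_way_us(distance_km: float) -> float:
    return distance_km / C_FIBER_KM_PER_US

# Worst case inside the campus: GPUs on opposite edges, ~4 km apart.
print(f"{one_way_us(4.0):.0f} µs one-way")  # → 20 µs
```

Tens of microseconds of worst-case propagation delay is small relative to typical collective-communication times in large training jobs, which is the rationale for densifying into a tight campus rather than spanning distant regions.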

What remains to be known

  • Exact GPU mix (generation/model) and sustained effective capacity (beyond theoretical peak).
  • Real-world scale metrics in production (average utilization, plane failures, job turnaround, token/image cost metrics that can be published).
  • Access policies (dedicated tenancy, shared bare-metal, queues) and SLA specifics per workload size.
  • Energy footprint and thermal efficiency measures per campus (PUE, heat management, reuse).

Oracle notes that statements about timelines, features, and pricing are indicative (disclaimers for “forward-looking statements” and “future products”).


Why it matters

If Oracle delivers as announced, Zettascale10 will add to the competitive landscape a zettascale AI fabric with very low GPU-GPU latency and operation that is more resilient by design. For clients aiming to industrialize AI, moving from pilots to large-scale services, the combination of capacity, sovereignty, cost/performance, and operational predictability could sway decisions in a market where GPU availability and network fabric are the bottlenecks.

Frequently Asked Questions

What exactly is OCI Zettascale10?
An Oracle cloud AI cluster architecture that aggregates hundreds of thousands of NVIDIA GPUs across multiple dense data centers in macro-campus environments, with peak performance of up to 16 zettaFLOPS and multi-gigawatt capacity.

What does Oracle Acceleron RoCE bring compared to traditional networks?
A wide and shallow fabric with isolated planes that redistributes traffic during incidents, reduces network layers (lower latency and cost), and aims for consistent performance during large-scale training.

When will it be available, and at what scale?
Oracle is accepting orders now and anticipates availability in the second half of next year, with initial clusters of up to 800,000 GPUs.

What is the connection with OpenAI and Stargate?
Zettascale10 is the core fabric of the Abilene supercluster in Texas operated by Oracle and OpenAI, and they plan to continue scaling.

What advantages does it promise in cost/performance and energy?
Oracle aims for competitive cost/performance and better cluster utilization, leveraging linear optics (LPO/LRO) and network designs that reduce consumption in interconnection, thereby allowing more power to be allocated to compute.
