The artificial intelligence industry has taken a step that may seem technical but actually targets the core of future AI superclusters for training and inference. AMD, Broadcom, Meta, Microsoft, NVIDIA, and OpenAI have announced the formation of the Optical Compute Interconnect (OCI) Multi-Source Agreement (MSA), a consortium aiming to define an open optical interconnection standard for AI system “scale-up.” Put simply: they want to lay the groundwork to replace part of the copper cabling that currently connects accelerators and switches inside large racks and compute domains with optical links specifically designed for that purpose.
The significance of this announcement lies not only in the names involved but also in the problem they seek to address. As AI clusters grow, traffic between GPUs, XPUs, and internal switches becomes a major system bottleneck. For example, Broadcom has long argued that the growth of these environments is pushing copper to its limits in power consumption, reach, and density—especially in increasingly compact and demanding AI architectures. NVIDIA also emphasizes that “scale-up” within racks is a critical layer to maintain performance when dozens or hundreds of accelerators need to communicate almost as a single system.
What OCI proposes is not a new proprietary closed protocol but a common physical optical layer. The consortium states that their goal is to create an interoperable foundation where different processor designs, switches, and interconnection technologies can coexist, with a multi-vendor focus tailored to the real needs of hyperscalers. In practice, this should allow the market to be less dependent on a single supply chain or a single approach to internal connectivity in large AI systems.
From Copper to Optical Inside the Rack
Until now, optical technology has been mostly associated with “scale-out,” meaning connections between servers, racks, or entire infrastructure blocks. In contrast, “scale-up” has largely relied on very short-range electrical links to connect GPUs and switches with minimal latency. However, the growth of models and compute domains is pushing this boundary. The OCI MSA itself states that the physical limits of copper are already constraining system architecture and that migrating to optical links within this internal layer will become necessary later this decade.
The consortium’s initial roadmap envisions a cautious start with clear ambitions. The OCI GEN1 specification defines 4 wavelengths at 50 Gb/s NRZ, for 200 Gb/s per direction. OCI GEN2 aims to double that to 400 Gb/s per direction, or up to 800 Gb/s per fiber bidirectionally. Beyond that, the consortium proposes a path that increases both the number of wavelengths and the signaling rates, reaching 3.2 Tb/s per fiber and beyond. This is not an immediate commercial speed but a multi-generation hardware roadmap.
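The headline numbers reduce to simple lane arithmetic: per-direction bandwidth is the number of wavelengths times the per-wavelength signaling rate. A minimal sketch follows; note that the GEN2 lane configuration shown (4 × 100 Gb/s) is an illustrative assumption, since the announcement gives only the aggregate figure and the MSA could instead double the wavelength count.

```python
def per_direction_gbps(wavelengths: int, rate_gbps: int) -> int:
    """Aggregate per-direction bandwidth over one fiber:
    wavelengths x per-wavelength signaling rate."""
    return wavelengths * rate_gbps

# OCI GEN1: 4 wavelengths at 50 Gb/s NRZ
gen1 = per_direction_gbps(4, 50)     # 200 Gb/s per direction

# OCI GEN2 (assumed split for illustration: 4 wavelengths at 100 Gb/s)
gen2 = per_direction_gbps(4, 100)    # 400 Gb/s per direction

# The "800 Gb/s per fiber" figure counts both directions of a bidirectional link
print(gen1, gen2, 2 * gen2)          # 200 400 800
```

The same formula explains the long-term target: any combination of wavelength count and lane rate whose product is 3,200 Gb/s (e.g. 16 × 200 Gb/s, purely hypothetical) would hit the 3.2 Tb/s per-fiber goal.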
An important aspect is the range of formats they aim to support. The consortium mentions support for pluggable modules, on-board optics, and co-packaged optics (CPO)—a technology designed to bring optics as close as possible to silicon computational or switching components, to reduce power and improve density. Broadcom, which has been promoting this approach for years, argues that the transition to CPO will be a key factor in scaling AI clusters without dramatically increasing energy and thermal costs.
A Consortium with a Clear Industrial Message
Beyond the technical specifications, OCI MSA represents a shift in industry tone. It’s not just a group driven by networking or semiconductor manufacturers, but by a significant mix of hardware designers and large-scale AI infrastructure operators. Meta, Microsoft, and OpenAI are not just observers; they are founding members alongside AMD, Broadcom, and NVIDIA. This detail matters because it suggests that the push to redefine internal AI system connectivity is driven not only by chip and switch vendors but also by organizations deploying increasingly large and costly clusters.
The corporate messages from participants reflect this trend. AMD talks about the growing need for optical “scale-up” links for large AI systems late this decade. Microsoft emphasizes that optical technologies, protocols, and switch architectures designed for “scale-up” will be essential for building high-performance compute domains spread across multiple racks. OpenAI connects this evolution directly to increases in petaflops, memory bandwidth, and network bandwidth necessary to continue scaling AI supercomputers. While these are official statements, they collectively paint a consistent picture: the next-gen bottleneck won’t just be the accelerator but also how it’s interconnected.
It’s also noteworthy that NVIDIA joins this initiative while still promoting its own “scale-up” ecosystem with NVLink. This does not mean abandoning proprietary advantages but suggests that the industry is beginning to accept that some level of interoperability at the optical physical layer can benefit even players with distinct technologies. Tom’s Hardware interprets this as an effort to develop a common optical foundation where different interconnection protocols used by various vendors can coexist.
Why This Could Change AI Cluster Designs
If successful, the impact could go far beyond cabling. A common, open optical layer could reduce integration risks, shorten deployment cycles, and broaden the pool of suppliers capable of building AI racks. For hyperscalers, this means more flexibility to combine compute, switching, and optics without being locked into a single closed architecture. For the supply chain, it opens the door to a larger ecosystem centered on short-range optics for AI. And for the broader market, it shifts the discussion about the future of AI connectivity beyond who makes the fastest GPUs to who designs the most effective internal interconnects.
However, it’s important not to overstate this. OCI MSA is just emerging; what’s been announced so far is a specification and a roadmap, not a finished product ready to transform data centers tomorrow. Its traction, the actual interoperability achieved across hardware generations, and how well it coexists with established AI ecosystems remain to be seen. But the direction is clear: optical solutions are transitioning from a mere rack-to-rack connection to an integral part of the compute domain itself. Given the continuous growth of AI clusters, this could become one of the decade’s most significant infrastructure decisions.
Frequently Asked Questions
What is OCI MSA and what is it for?
OCI MSA is a consortium formed by AMD, Broadcom, Meta, Microsoft, NVIDIA, and OpenAI to define an open optical interconnection standard aimed at “scale-up” of AI systems—meaning, internal connectivity between accelerators and switches within large compute domains.
What speeds does OCI’s roadmap aim for?
The roadmap begins with 200 Gb/s per direction in OCI GEN1, rises to 400 Gb/s per direction (up to 800 Gb/s per fiber) in OCI GEN2, and targets 3.2 Tb/s per fiber and beyond in future generations.
Why is the industry pushing to replace copper with optics in AI “scale-up”?
Because copper is reaching limits in range, power consumption, and density in increasingly large AI clusters. Optical links offer a way to continue scaling bandwidth and distance while maintaining aggressive power and performance targets.
Does this replace technologies like NVLink or UALink?
Not necessarily. The consortium aims to build a common, interoperable physical optical layer that can serve as a basis for different designs and interconnection fabrics, not to outright eliminate proprietary protocols used by specific vendors.
via: tomshardware