At the OCP Global Summit 2025, AMD publicly unveiled, in a static exhibit, its “Helios” platform: a rack-scale reference design for AI infrastructure built on the new Open Rack Wide (ORW) standard that Meta contributed to the Open Compute Project. Helios is not an isolated product announcement: it fits into AMD’s declared strategy of extending its open-hardware philosophy “from silicon to system and up to the rack,” with the aim of accelerating the adoption of open, interoperable, scalable architectures in the era of gigawatt data centers.
The proposal combines AMD Instinct™ GPUs, EPYC™ CPUs, and advanced AMD Pensando™ networking in a double-width ORW chassis, prepared for the power, cooling, and maintainability demands of next-generation AI systems. It also adopts leading industry standards: OCP DC-MHS (Data Center – Modular Hardware System), UALink (open accelerator interconnect), and architectures from the Ultra Ethernet Consortium (UEC), supporting open fabric standards both vertically (scale-up) and horizontally (scale-out). All of this comes with liquid cooling on quick-disconnect couplings, standardized Ethernet cabling, and a double-width layout that improves operational serviceability.
“Open collaboration is key to scaling AI efficiently,” emphasized Forrest Norrod, AMD’s Senior Vice President and General Manager of the Data Center Solutions Group. “With Helios, we’re turning open standards into real, deployable systems: combining AMD Instinct, EPYC, and open fabrics to give the industry a flexible, high-performance platform designed for the next generation of AI workloads.”
ORW: a “double-wide” design tailored for the coming wave of AI
The Open Rack Wide (ORW) standard, proposed by Meta and adopted by OCP, defines a double-width open rack optimized for the electrical and thermal needs of the new wave of AI servers. In practical terms, ORW extends the physical envelope to:
- Host dense acceleration systems with more robust power planes.
- Simplify liquid cooling (manifolds, return, quick couplings) and sustain thermal performance continuously.
- Enhance serviceability (front/rear access, blind-mate, module replacement) and reduce intervention times.
AMD embraces ORW as the structural foundation of Helios and pairs it with a mature OCP catalog (power shelves, busbars, sleds, and tray modules) to establish a shared baseline that OEMs, ODMs, and hyperscalers can adopt, extend, and customize without reinventing each component from scratch.
From chip to rack: open building blocks for scale-up and scale-out
In the realm of interconnect fabrics, Helios is designed to coexist with two major scaling paradigms:
- Scale-up (accelerators tightly coupled within a chassis, at node level), where UALink aims to standardize low-latency, high-bandwidth GPU-to-GPU interconnects over coherent topologies for large-scale training and inference.
- Scale-out (multiple nodes/racks networked together), where the UEC (Ultra Ethernet Consortium) promotes next-generation Ethernet (congestion control, path diversity, telemetry, NIC offloads) to turn the network into a high-performance, multi-path fabric that carries AI traffic efficiently end to end; a rough sizing sketch follows this list.
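To make the contrast concrete, here is a minimal Python sketch that compares the aggregate bandwidth inside a scale-up domain with the scale-out uplink capacity of the same node. Every figure (GPUs per node, links per GPU, per-link and per-NIC speeds) is an illustrative assumption, not a Helios or UALink specification:

```python
# Illustrative sketch: rough aggregate bandwidth of a scale-up domain
# versus the scale-out uplinks of the same node. All numbers below are
# assumptions for illustration, not AMD/Helios specifications.

def scale_up_bw_gbps(gpus: int, links_per_gpu: int, gbps_per_link: float) -> float:
    """Aggregate GPU-to-GPU bandwidth inside one scale-up (UALink-style) domain."""
    return gpus * links_per_gpu * gbps_per_link

def scale_out_bw_gbps(gpus: int, nics_per_gpu: int, gbps_per_nic: float) -> float:
    """Aggregate Ethernet (UEC-style) bandwidth leaving the node east-west."""
    return gpus * nics_per_gpu * gbps_per_nic

if __name__ == "__main__":
    GPUS = 8  # assumed accelerators per node
    up = scale_up_bw_gbps(GPUS, links_per_gpu=7, gbps_per_link=400)
    out = scale_out_bw_gbps(GPUS, nics_per_gpu=1, gbps_per_nic=400)
    print(f"scale-up domain : {up / 1000:.1f} Tbps aggregate")
    print(f"scale-out uplink: {out / 1000:.1f} Tbps aggregate")
    print(f"ratio           : {up / out:.0f}x more bandwidth inside the domain")
```

Under these assumed numbers, the scale-up domain offers several times more bandwidth than the node’s Ethernet uplinks, which is precisely why tightly coupled GPU-to-GPU traffic stays on the scale-up fabric while the scale-out network handles node-to-node traffic.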
Alignment with OCP DC-MHS, the modular hardware specification for data centers, adds another layer of interoperability: sleds and modules with common interfaces for CPUs, GPUs, memory, storage, and management that speed up time-to-build and reduce integration cycles.
In this context, Helios is not a closed product but a reference platform: a rack-scale “template” that shortens design, validation, and deployment times while maximizing compatibility with open ecosystems (OCP, UALink, UEC). For hyperscalers and cloud providers, this translates into lower risk of vendor lock-in, greater flexibility, and better reuse of components across generations.
Liquid cooling and serviceability: a practical, data-center-first approach
The AMD design emphasizes two key operational aspects for AI:
- Liquid cooling with quick disconnects
Current AI accelerators dissipate more than 1 kW per device. Liquid cooling is not cosmetic: it sustains clock frequencies and reliability over the long term. Tool-less, drip-less quick couplings simplify maintenance, repair, and operations (MRO) and reduce MTTR (mean time to repair); a rough flow-rate estimate follows this list.
- Double width for serviceability
The double width of the ORW chassis creates space for cabling routes, liquid manifolds, and accessible, removable modules. This is critical when fleet scale demands quick, safe interventions without sacrificing density or performance.
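As a back-of-the-envelope illustration of why liquid becomes the default at these densities, the sketch below applies the standard heat-transport relation Q = ṁ·cp·ΔT to estimate the coolant flow a device needs; the per-device power and the allowed temperature rise are assumptions for illustration, not Helios figures:

```python
# Back-of-the-envelope coolant flow estimate using Q = m_dot * cp * dT.
# All inputs are illustrative assumptions, not Helios specifications.

CP_WATER = 4186.0   # specific heat of water, J/(kg*K)
RHO_WATER = 1.0     # ~1 kg per litre

def flow_lpm(heat_w: float, delta_t_k: float) -> float:
    """Litres per minute of water needed to absorb heat_w at a delta_t_k rise."""
    kg_per_s = heat_w / (CP_WATER * delta_t_k)
    return kg_per_s / RHO_WATER * 60.0

if __name__ == "__main__":
    gpu_w = 1200.0  # assumed per-accelerator dissipation (>1 kW)
    dt = 10.0       # assumed coolant temperature rise, K
    print(f"per accelerator : {flow_lpm(gpu_w, dt):.2f} L/min")
    print(f"72-GPU rack     : {flow_lpm(72 * gpu_w, dt):.0f} L/min")
```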
The design also leans on standardized Ethernet with multipath resiliency, aligned with best practices in stateless operation and granular telemetry, which is key for monitoring hotspots, losses, and queue buildup within AI fabrics.
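As a rough illustration of what that telemetry enables, the following sketch post-processes two snapshots of per-port counters into a drop rate and a queue trend; the counter names and values are hypothetical, since real fabrics expose such data through SNMP, gNMI, or vendor-specific streaming telemetry:

```python
# Minimal sketch of fabric telemetry post-processing: turn two snapshots
# of per-port counters into a drop rate and a queue trend. Counter names
# and values here are hypothetical placeholders.

from dataclasses import dataclass

@dataclass
class PortSnapshot:
    ts_s: float        # timestamp, seconds
    tx_pkts: int       # cumulative packets transmitted
    drops: int         # cumulative packets dropped
    queue_bytes: int   # instantaneous egress queue depth

def drop_rate(a: PortSnapshot, b: PortSnapshot) -> float:
    """Fraction of packets dropped between two snapshots of the same port."""
    sent = b.tx_pkts - a.tx_pkts
    return (b.drops - a.drops) / sent if sent else 0.0

if __name__ == "__main__":
    t0 = PortSnapshot(ts_s=0.0, tx_pkts=1_000_000, drops=10, queue_bytes=4_096)
    t1 = PortSnapshot(ts_s=1.0, tx_pkts=1_950_000, drops=105, queue_bytes=262_144)
    print(f"drop rate   : {drop_rate(t0, t1):.6f}")
    print(f"queue trend : {t1.queue_bytes - t0.queue_bytes:+d} bytes (possible hotspot)")
```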
Why does it matter? Three insights for the ecosystem
1) A signal in favor of open standards in high-performance AI
In the race for AI infrastructure, AMD advocates for an open and standardized approach — not only in silicon but also in interconnection, chassis, and rack. For operators, this means reducing integration costs, avoiding lock-in, and speeding up deployments.
2) A “bridge” between disparate roadmaps
EPYC CPUs, Instinct GPUs, Pensando networking, UALink for GPU-to-GPU traffic, and UEC for east-west traffic create a meeting point among suppliers and generations, vital for rapid ramp-up in gigawatt-scale data centers.
3) Operational sustainability
Rack scale with liquid cooling and modularity is not only about performance: it also targets energy efficiency, improved PUE, and more sustainable lifecycles, three goals that European and North American hyperscalers demand in order to meet decarbonization commitments and electrical-capacity limits.
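For reference, PUE (power usage effectiveness) is simply total facility power divided by IT power. The minimal sketch below, with all power figures assumed purely for illustration, shows how cutting cooling overhead moves the metric:

```python
# PUE = total facility power / IT equipment power.
# The inputs below are assumed for illustration only.

def pue(it_kw: float, cooling_kw: float, other_kw: float) -> float:
    """Power usage effectiveness for a facility with the given loads."""
    return (it_kw + cooling_kw + other_kw) / it_kw

if __name__ == "__main__":
    it = 1000.0  # assumed IT load, kW
    print(f"air-cooled   : PUE = {pue(it, cooling_kw=450.0, other_kw=100.0):.2f}")
    print(f"liquid-cooled: PUE = {pue(it, cooling_kw=150.0, other_kw=100.0):.2f}")
```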
What Helios is (and isn’t) today
- It is a rack-scale reference design aligned with ORW (OCP) that incorporates EPYC CPUs, Instinct GPUs, and Pensando networking, demonstrating how to combine DC-MHS, UALink, and UEC in a double-width rack with liquid cooling and standardized Ethernet.
- It is not (yet) a commercial product with a specific SKU and sales channels; its mission is to accelerate the adoption and customization of open AI and HPC systems by OEMs, ODMs, and hyperscalers.
For manufacturers and operators, the value lies in building faster (less foundational engineering), with interoperability from day one, and maintaining the freedom to choose blocks and suppliers within the OCP ecosystem.
OCP, Meta, and the rack-first push for AI
The Open Compute Project has been promoting open hardware designs for over a decade, covering power supplies, busbars, chassis, motherboards, and management. Meta’s Open Rack Wide contribution responds to a fundamental reality: AI has stretched the thermal and power boundaries of the traditional rack. By publishing ORW and collaborating with AMD on Helios, the community reinforces the message that public standards can absorb the pressure of this new compute wave without imposing proprietary, incompatible architectures.
What’s next: large-scale deployments, telemetry, and fabric maturity
In the short to medium term, the success of proposals like Helios will depend on:
- The maturity of UALink (to connect GPUs with predictable latencies) and its adoption by providers.
- The adoption of UEC (Ultra Ethernet) in Clos topologies that ensure congestion control, path diversity, and packet-level observability for AI workloads (see the path-count sketch after this list).
- The quality of liquid cooling (couplings, manifolds, leak management) at fleet scale.
- The automation of provisioning and MRO (maintenance, repair, and operations), critical to avoid overburdening site-reliability teams.
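To illustrate the path-diversity point, in a two-tier leaf-spine (Clos) fabric every pair of hosts on different leaves has one equal-cost shortest path through each spine. The sketch below counts hosts and paths for assumed switch radices, which are illustrative only:

```python
# Path diversity in a 2-tier leaf-spine (Clos) fabric: each host pair on
# different leaves has exactly one shortest path through each spine.
# Switch radices below are assumptions for illustration.

def clos_paths(spines: int) -> int:
    """Equal-cost shortest paths between hosts on different leaves."""
    return spines  # leaf -> spine_i -> leaf, one path per spine

def hosts(leaves: int, ports_per_leaf: int, uplinks_per_leaf: int) -> int:
    """Hosts supported when each leaf reserves uplinks_per_leaf ports for spines."""
    return leaves * (ports_per_leaf - uplinks_per_leaf)

if __name__ == "__main__":
    SPINES, LEAVES, PORTS = 16, 32, 64
    print(f"hosts            : {hosts(LEAVES, PORTS, SPINES)}")
    print(f"equal-cost paths : {clos_paths(SPINES)} per host pair")
```

More spines mean more equal-cost paths, which is what gives next-generation Ethernet fabrics room to spread AI traffic and absorb hotspots.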
AMD views Helios as a cornerstone of its commitment to an open and scalable infrastructure “to meet the growing global demand for AI.”
Conclusion: open standards, common foundation, and room to differentiate
Helios isn’t just a silicon announcement; it’s the physical foundation for a comprehensive open AI stack: rack, chassis, interconnect, CPU, GPU, and network, ready for the ecosystem to build upon. In a landscape where new topologies and chips emerge monthly, a shared baseline (ORW, DC-MHS, UALink, UEC) reduces friction and lets operators differentiate where they add value: orchestration, models, data, services, and operational efficiency. When hardware stops being the bottleneck, software and operations regain prominence.
As Norrod summarized, the goal is to turn standards into deployment-ready systems with the performance and flexibility for the next generation of AI workloads. The industry, and the OCP community with it, now has a starting point.
FAQs
What is Open Rack Wide (ORW) and why is it important for AI?
ORW is a double-width rack standard contributed by Meta to the Open Compute Project. It expands the chassis for power delivery, liquid cooling, and serviceability suitable for high-density AI systems. It’s key because it standardizes the physical envelope to deploy large-scale accelerators without relying on proprietary racks.
How do UALink and the Ultra Ethernet Consortium (UEC) fit into Helios?
UALink aims to standardize GPU-to-GPU interconnects in scale-up topologies, while UEC develops Ethernet evolutions to create a high-performance, multi-path fabric for scale-out systems. Helios is designed to support both approaches (vertical and horizontal open fabrics).
What does OCP DC-MHS contribute to the rack-scale design?
DC-MHS (Data Center – Modular Hardware System) introduces modular interfaces and mechanical commonalities for boards, modules, and management, accelerating integration, improving interoperability, and reducing costs/time for deployment.
Why is quick disconnect liquid cooling relevant here?
Modern GPUs can dissipate >1 kW; liquid cooling ensures sustained performance and reliability. Quick couplings (drip-less) allow interventions without draining entire circuits, lowering MTTR and improving safety.
Is Helios an AMD product or a reference design?
Today, Helios is a rack-scale reference platform aligned with ORW and open standards. Its role is to enable OEMs, ODMs, and hyperscalers to adopt and customize open AI/HPC systems faster, with built-in interoperability and serviceability.
Source: AMD

