Broadcom Unveils Thor Ultra, the First 800G Ethernet NIC for AI: Open, UEC-Compatible, and Ready for Million-Token Scale Clusters

Broadcom has announced Thor Ultra, which it describes as the first 800G Ethernet NIC tailored for AI on the market. The launch targets a clear goal: interconnecting hundreds of thousands of XPUs (GPUs, CPUs, and similar accelerators) to train and run inference on models with trillions of parameters over open, standards-based Ethernet networks. The strategic key is full compliance with the Ultra Ethernet Consortium (UEC) specification, which modernizes RDMA for large-scale switching fabrics and avoids vendor lock-in to proprietary solutions.

What Thor Ultra solves: RDMA “reinvented” for massive AI

In large-scale AI networks, traditional RDMA faces limitations: lack of effective multipathing, strict in-order delivery, coarse retransmission granularity, and congestion control that scales poorly. In line with the UEC specification, Thor Ultra introduces a set of innovations to overcome these issues:

  • Packet-level multipathing to balance load across the network fabric.
  • Out-of-order delivery directly to XPU memory, which maximizes network utilization without stalling flows behind reorder queues.
  • Selective retransmission, which avoids resending correctly delivered packets and shortens Job Completion Time (JCT).
  • Programmable congestion control with algorithms on both receiver and sender to manage spikes and micro-congestion without impacting latency.
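As a rough illustration of how out-of-order delivery and selective retransmission interact, the sketch below models a receiver that places each packet directly at its memory offset as it arrives and reports only the gaps for retransmission. All names are illustrative; this is a conceptual model of the UEC ideas, not Broadcom's implementation.

```python
# Conceptual sketch: out-of-order placement plus selective retransmission.
# Illustrative only; not Broadcom firmware or a UEC reference implementation.

class Receiver:
    """Places packets directly at their memory offset, in any arrival order."""
    def __init__(self, total_packets: int, packet_size: int):
        self.buffer = bytearray(total_packets * packet_size)
        self.packet_size = packet_size
        self.received = set()
        self.total = total_packets

    def on_packet(self, seq: int, payload: bytes) -> None:
        # Out-of-order delivery: write straight to the target offset,
        # with no reorder queue blocking later packets.
        off = seq * self.packet_size
        self.buffer[off:off + len(payload)] = payload
        self.received.add(seq)

    def missing(self) -> list:
        # Selective-ack style: report only the gaps, rather than everything
        # after the first loss (contrast with go-back-N retransmission).
        return sorted(set(range(self.total)) - self.received)

# Five packets; packet 2 is lost and packets 4, 3 arrive out of order.
rx = Receiver(total_packets=5, packet_size=4)
for seq in [0, 1, 4, 3]:
    rx.on_packet(seq, bytes([seq] * 4))
assert rx.missing() == [2]          # only the gap needs retransmission
rx.on_packet(2, bytes([2] * 4))     # selective retransmit fills the hole
assert rx.missing() == []
```

The key contrast with classic RDMA is in `missing()`: a go-back-N scheme would have forced resending packets 3 and 4 as well, wasting fabric bandwidth precisely when the network is congested.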

The practical outcome is greater sustained performance in AI clusters and less reliance on proprietary stacks: being UEC-compliant, customers can combine Thor Ultra with any compatible XPU, optical, or switch, from top-of-rack layers to very high-density spine switches.

Hardware and offloads: 800G with security and low-latency under control

Thor Ultra is available in standard PCIe CEM and OCP 3.0 form factors, with a PCI Express Gen6 x16 host interface and 200G or 100G PAM4 SerDes with support for long-reach passive copper. Broadcom claims industry-leading bit error rate (BER) for its SerDes, which reduces link flaps and shortens JCT during training and inference.

On the security and efficiency front, the NIC features:

  • Line-rate encryption/decryption with PSP offload, relieving the XPU of intensive cryptographic operations.
  • Secure boot with signed firmware and device attestation.
  • Programmable congestion pipeline, packet trimming, and congestion signaling (CSIG) for real-time telemetry and correction.
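Packet trimming can be sketched conceptually: rather than silently dropping a packet under congestion, a switch forwards a truncated, marked header, so the endpoint learns of the loss almost immediately instead of waiting for a timeout. The minimal model below uses hypothetical field names and is an assumption about the general technique, not Broadcom's datapath.

```python
# Conceptual model of packet trimming under queue pressure.
# Field names ("seq", "flow", "trimmed") are hypothetical.

QUEUE_LIMIT = 3  # toy queue depth; real switches use byte-based thresholds

def enqueue(queue: list, packet: dict) -> dict:
    """Return what the switch forwards; trim the payload under congestion."""
    if len(queue) < QUEUE_LIMIT:
        queue.append(packet)
        return packet
    # Trim: keep the header, drop the payload, set a congestion mark so the
    # receiver can request a selective retransmit right away.
    header = {k: packet[k] for k in ("seq", "flow")}
    header["trimmed"] = True
    return header

q = []
out = [enqueue(q, {"seq": i, "flow": "A", "payload": b"x" * 1024})
       for i in range(5)]
assert "trimmed" not in out[0]                          # forwarded whole
assert out[4] == {"seq": 4, "flow": "A", "trimmed": True}  # payload trimmed
```

The benefit over a silent drop is latency: the trimmed header reaches the receiver in the same epoch as the congestion event, which pairs naturally with the selective retransmission described earlier.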

Open ecosystem: Tomahawk, Jericho, and Scale-Up Ethernet

Thor Ultra integrates into Broadcom’s Ethernet portfolio for AI, including: Tomahawk 6, Tomahawk 6-Davisson, Tomahawk Ultra, Jericho 4, and Scale-Up Ethernet (SUE). This combination enables 800G Ethernet fabrics with advanced telemetry, end-to-end visibility, and CSIG compatibility, across architectures ranging from Endpoint-Scheduled Ethernet to Fabric-Scheduled Ethernet as defined by UEC. For end-users, this offers design freedom (NICs, switches, optics) and a less vendor-dependent adoption curve.

Why now: scaling AI without changing the “network language”

The rise of RoCE/UEC over Ethernet networks stems from two concurrent pressures: supply constraints on closed, proprietary solutions and the need for operational standardization as clusters grow from dozens to hundreds of thousands of nodes. With Thor Ultra, Broadcom argues that Ethernet can support next-generation AI if RDMA adapts: fine-grained multipathing, intelligent reordering, and distributed congestion control. Furthermore, support for long-reach DACs, OCP form factors, and PCIe Gen6 simplifies integration into existing racks and facilitates gradual upgrades.

Use cases: from massive pre-fill to distributed decoding

Network bottlenecks evolve depending on the AI phase:

  • Training and pre-fill: the fabric demands sustained bandwidth and minimal reordering delay; packet-level multipathing prevents hotspots.
  • Inference and lengthy decoding: selective retransmission and CSIG reduce queues and stabilize p99 latencies, critical for SLA compliance and near real-time services.
  • Multi-tenant environments: line-rate security, verified boot, and attestation enable strong segmentation and auditing without sacrificing throughput.
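The congestion reaction behind stable tail latencies can be caricatured with a classic AIMD (additive-increase, multiplicative-decrease) window. The actual sender/receiver algorithms on Thor Ultra are programmable and vendor-tunable, so this is only a stand-in sketch of the general control loop, not the shipped policy.

```python
# AIMD stand-in for a programmable congestion-control loop.
# Constants and behavior are illustrative, not Thor Ultra's algorithm.

def step(cwnd: float, congested: bool,
         add: float = 1.0, mult: float = 0.5, floor: float = 1.0) -> float:
    """One control interval: grow additively, back off multiplicatively."""
    return max(floor, cwnd * mult) if congested else cwnd + add

cwnd = 8.0
cwnd = step(cwnd, congested=False)   # no signal: 8.0 -> 9.0
cwnd = step(cwnd, congested=True)    # CSIG/ECN-style mark: 9.0 -> 4.5
assert cwnd == 4.5
```

The point of making this pipeline programmable is that the reaction (here, halving) can be tuned per deployment: an inference cluster chasing p99 SLAs may back off more aggressively than a bulk training job chasing throughput.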

Availability and next steps

Thor Ultra is now in sampling for customers and partners. The company positions this NIC as the central component of its Ethernet-based AI factories, with a roadmap emphasizing UEC interoperability, granular telemetry, and JCT reduction at scale.


Technical summary

  • Speed: 800G Ethernet.
  • Standards: Full UEC compliance; RDMA with packet-level multipathing, out-of-order delivery, selective retransmission, and programmable congestion (sender/receiver).
  • Host interface: PCIe Gen6 x16.
  • Form factor: PCIe CEM and OCP 3.0.
  • SerDes: PAM4 200G / 100G with long-reach and low BER; passive DAC support.
  • Security: Line-rate encryption/decryption with PSP offload, secure boot, signed firmware, and attestation.
  • Telemetry & control: Programmable congestion pipeline, packet trimming, CSIG.
  • Ecosystem: Compatible with Tomahawk 5/6, Tomahawk Ultra, Jericho 4, SUE, and UEC-compliant switches.

Key considerations for network architects and platform teams

  • Fabric design: Plan for deep ECMP leveraging packet-level multipathing and out-of-order placement to avoid reordering queues in spines and super-spines.
  • Inference SLA: Combine selective retransmission, CSIG, and programmable congestion algorithms to stabilize p95/p99 latencies in hybrid workloads (pre-fill + decoding).
  • Default security: Enable verified boot, attestation, and line encryption to isolate tenants and regulated environments without significant throughput impact.
  • Interoperability strategy: Validate optics, switches, and XPUs from different vendors under the UEC umbrella, prioritizing observability and failover convergence.
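The fabric-design point above, packet-level spraying versus classic flow-level ECMP, can be illustrated with a toy path-selection model. The hash and rotation choices here are illustrative; real NICs and switches derive path entropy from hardware header fields.

```python
# Toy contrast: flow-level ECMP pins a flow to one path (hotspot risk),
# while packet-level spraying rotates packets across all equal-cost paths.
# Selection logic is illustrative, not a real NIC's hashing scheme.

import hashlib

def flow_ecmp(flow_id: str, n_paths: int) -> int:
    # Classic ECMP: every packet of a flow hashes to the same path.
    h = int(hashlib.sha256(flow_id.encode()).hexdigest(), 16)
    return h % n_paths

def packet_spray(seq: int, n_paths: int) -> int:
    # Packet-level multipathing: successive packets rotate across paths,
    # relying on out-of-order placement at the receiver to reassemble.
    return seq % n_paths

paths = [packet_spray(seq, 4) for seq in range(8)]
assert paths == [0, 1, 2, 3, 0, 1, 2, 3]        # even load on all 4 paths
assert len({flow_ecmp("flowA", 4) for _ in range(8)}) == 1  # one path per flow
```

This is why packet-level multipathing only works in tandem with out-of-order delivery to XPU memory: spraying guarantees that packets of one message traverse different paths and arrive interleaved.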

Final takeaway: with Thor Ultra, Broadcom aims to set the Ethernet NIC standard for AI at 800G and, most importantly, speed up the transition to open UEC fabrics. If Ethernet intends to be the native network for large-scale AI, it needs next-generation RDMA; this design strategy places the NIC at the heart of the fabric to reduce JCT, contain costs, and avoid lock-in.
