RDMA Comes to the Desktop: Apple Implements It in macOS 26.2… and Early Tests Suggest It’s Still “Green”

For years, RDMA (Remote Direct Memory Access) has been synonymous with supercomputing and data centers: networking that moves data directly between the memory of two machines, with minimal overhead and latencies that, in the HPC world, make the difference between an efficient cluster and one that “drowns” in the network stack. Now, Apple has made a bold move by enabling RDMA over Thunderbolt 5 in macOS 26.2, a feature the company itself links to use cases such as distributed AI inference.

The promise, on paper, is powerful: connect multiple Macs via Thunderbolt 5 and drastically reduce communication latency compared to traditional TCP/IP approaches. In practice, early user experiences experimenting with this feature lead to a clear conclusion: the potential exists, but there are still bumps in the road.

What is RDMA and why does it matter (when it really matters)

RDMA allows two machines to exchange data without the CPU having to constantly copy buffers and without passing through much of the operating system’s “machinery.” This reduces latency and frees up CPU cycles for useful work (such as computation or GPU processing).

In data centers, RDMA is usually associated with InfiniBand or Ethernet-based variants like RoCE. In Apple’s case, the twist is the physical medium: Thunderbolt 5, an interconnect designed for high transfer rates to peripherals, external storage, or docks… now being used as a low-latency link between hosts.

Additionally, Thunderbolt 5 doubles Thunderbolt 4’s base bandwidth, reaching 80 Gb/s bidirectional, and supports an asymmetric “Bandwidth Boost” mode of up to 120 Gb/s in one direction in certain scenarios.

The key detail: MLX and the “JACCL” backend

The technical detail that has drawn the most attention is that MLX, Apple’s machine learning framework, already includes a communication backend called JACCL, designed to take advantage of RDMA over Thunderbolt. According to Apple, this backend enables latencies an order of magnitude lower than alternatives such as ring-based backends.

Put simply: Apple isn’t activating RDMA “for fun.” They are incorporating a component to make local distributed ML — with multiple Macs — meaningful beyond just demos.
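
To make that concrete, here is a minimal sketch of what a distributed MLX program looks like from Python. The collective API (init, all_sum) is MLX’s documented distributed module; whether JACCL is selected through the backend argument, and under what name, is an assumption to check against the current MLX documentation.

  import mlx.core as mx

  # Join the distributed group. MLX has shipped "mpi" and "ring" backends;
  # selecting the JACCL/RDMA backend by name here is an assumption.
  group = mx.distributed.init(backend="any")  # "any" picks whatever is available

  print(f"node {group.rank()} of {group.size()}")

  # A latency-sensitive collective: every node contributes a tensor and all
  # nodes receive the element-wise sum, the basic pattern behind distributed inference.
  x = mx.ones((1024,))
  total = mx.distributed.all_sum(x, group=group)
  mx.eval(total)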

Activating it isn’t “click and go”: it requires recovery mode

This is where the first cultural hurdle appears for anyone thinking of “clusters” as something automatable: it cannot be enabled remotely, not even with sudo via SSH. According to the MLX documentation, the process requires entering macOS Recovery, opening Terminal, and executing:

rdma_ctl enable

Then, reboot.

To verify that it worked, the same documentation suggests running ibv_devices, which reveals another interesting layer: Apple is exposing interfaces compatible with the RDMA ecosystem’s verbs API (a staple of HPC).
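
For anyone scripting the check rather than eyeballing it, something like the following works as a quick probe. It assumes ibv_devices is on the PATH after enabling RDMA and rebooting; the parsing is deliberately loose, since only the presence of a device matters here.

  import subprocess

  def rdma_device_present() -> bool:
      """Return True if ibv_devices lists at least one RDMA-capable device."""
      try:
          out = subprocess.run(
              ["ibv_devices"], capture_output=True, text=True, check=True
          ).stdout
      except (FileNotFoundError, subprocess.CalledProcessError):
          return False
      # ibv_devices prints a short header followed by one line per device;
      # anything beyond the header means a device was found.
      lines = [line for line in out.splitlines() if line.strip()]
      return len(lines) > 2

  print("RDMA device visible:", rdma_device_present())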

Topology matters: JACCL requires a fully connected mesh

This introduces a second critical point, and the most limiting one for anyone assembling a hobbyist Mac “cluster”: JACCL only supports fully connected topologies. In practice, that means a Thunderbolt cable between every pair of nodes.

In a 4-node cluster this is feasible; beyond that, the cabling grows quickly. Moreover, there is no clear equivalent yet of a “Thunderbolt 5 switch” designed for these setups, so scaling up without turning the rack into a nest of cables (and failure points) gets complicated.

Quick table: how cabling grows in full mesh

Nodes   Direct connections needed   Practical takeaway
2       1                           trivial
3       3                           triangle begins
4       6                           still manageable
5       10                          requires organization and discipline
7       21                          cabling becomes a project
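
The numbers above are plain combinatorics: a full mesh of N nodes needs N·(N−1)/2 cables, and each node dedicates N−1 ports to the mesh. Two helper functions make the growth explicit:

  def mesh_links(nodes: int) -> int:
      """Point-to-point Thunderbolt cables needed for a full mesh of `nodes` machines."""
      return nodes * (nodes - 1) // 2

  def ports_per_node(nodes: int) -> int:
      """Thunderbolt 5 ports each machine must dedicate to the mesh."""
      return nodes - 1

  for n in (2, 3, 4, 5, 7):
      print(f"{n} nodes -> {mesh_links(n)} cables, {ports_per_node(n)} ports per node")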

Port limitations in real life: M4 Max vs M3 Ultra

The port limitation is not a minor one, because a full mesh requires N−1 links per node. Current Mac Studio configurations ship with either an M4 Max or an M3 Ultra, each with a different number of Thunderbolt 5 ports.

In practice, this translates into a simple rule observed in field tests (see the sketch after this list):

  • M4 Max (fewer Thunderbolt 5 ports): the “natural” upper limit for full mesh often hovers around 5 nodes.
  • M3 Ultra (more Thunderbolt 5 ports): supports up to 7 nodes in full mesh (each requiring 6 links).
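
Inverting that arithmetic gives the ceiling each chip imposes. The port counts below (4 for the M4 Max, 6 for the M3 Ultra) are the ones implied by the node limits quoted above; check them against the specific Mac Studio configuration.

  def max_mesh_nodes(tb5_ports: int) -> int:
      """Largest full mesh a machine can join: one link per peer, plus itself."""
      return tb5_ports + 1

  print("M4 Max, 4 TB5 ports:  ", max_mesh_nodes(4))  # 5 nodes
  print("M3 Ultra, 6 TB5 ports:", max_mesh_nodes(6))  # 6 links per node, 7 nodes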

Why “CPU jumps to 900%”: the Thunderbolt Bridge case

One of the most frequently reported behaviors in early testing is CPU and network overload when the system gets into odd bridging or retransmission states. The MLX documentation is surprisingly clear on this: even though Thunderbolt RDMA doesn’t use TCP/IP for communication, disabling Thunderbolt Bridge is still necessary, as is configuring isolated local networks over the individual links.

This fits a typical pattern in mesh topologies: if bridge interfaces remain active, loops, traffic storms, or unexpected retransmissions can occur. The result is CPU spikes, heavy traffic, and quite possibly the feeling that “the network has crashed,” which then requires logging in locally to fix.
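
On macOS, the bridge can be switched off from the command line with networksetup, wrapped in Python here so the same script can be pushed to every node. Treat it as a sketch: “Thunderbolt Bridge” is the default English service name and may differ on localized systems, and the command needs admin rights.

  import subprocess

  def disable_thunderbolt_bridge() -> None:
      """Disable the Thunderbolt Bridge network service (run with admin rights)."""
      # List the services first so a missing or renamed service is easy to spot.
      services = subprocess.run(
          ["networksetup", "-listallnetworkservices"],
          capture_output=True, text=True, check=True,
      ).stdout
      if "Thunderbolt Bridge" not in services:
          print("No 'Thunderbolt Bridge' service found; nothing to disable.")
          return
      subprocess.run(
          ["networksetup", "-setnetworkserviceenabled", "Thunderbolt Bridge", "off"],
          check=True,
      )

  disable_thunderbolt_bridge()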

“There’s no documentation” (or it wasn’t where you expected it)

Another common frustration for testers is the scattered documentation and contradictory answers from generalist guides. It’s understandable: this is a new feature, with peculiar requirements (Recovery mode, strict mesh, bridge off, isolated networks) and tools that aren’t yet part of most people’s “mental manual.”

Community projects like Exo have been used to experiment with RDMA clusters of Macs, and some published tests show strong latency improvements in distributed memory access compared with previous setups.

What technical teams can do today (without overpromising)

Beyond demos, here is a realistic checklist for developers and system admins wanting to evaluate RDMA over Thunderbolt 5 without wasting a weekend:

  1. Separate “lab” from “production”: assume this is early-stage.
  2. Plan the topology: if JACCL is desired, start with full mesh from day one.
  3. Ensure local access: activation requires Recovery mode, and if connectivity breaks you may need to be physically at the machine to fix it.
  4. Set up SSH and passwordless sudo to automate deployments (MLX assumes this in its mlx.launch workflow); a pre-flight sketch follows this list.
  5. Disable Thunderbolt Bridge and isolate links as a prerequisite, not just tuning.
  6. Monitor side effects: services like Universal Control/Screen Sharing can cause CPU and network noise; isolate variables to identify issues.
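
A small pre-flight script covers points 3 and 4 before any mlx.launch run. The host names are hypothetical placeholders; the checks rely only on ssh, sudo -n, and ibv_devices, which the workflow above already assumes.

  import subprocess

  # Hypothetical node names; replace with the addresses of your own Macs.
  HOSTS = ["mac-node-1.local", "mac-node-2.local", "mac-node-3.local"]

  def remote_ok(host: str, remote_cmd: str) -> bool:
      """Run a command on `host` over SSH (no password prompts) and report success."""
      result = subprocess.run(
          ["ssh", "-o", "BatchMode=yes", host, remote_cmd],
          capture_output=True, text=True,
      )
      return result.returncode == 0

  for host in HOSTS:
      ssh_ok = remote_ok(host, "true")           # key-based SSH works
      sudo_ok = remote_ok(host, "sudo -n true")  # passwordless sudo works
      rdma_ok = remote_ok(host, "ibv_devices")   # RDMA verbs devices are visible
      print(f"{host}: ssh={ssh_ok} sudo={sudo_ok} rdma={rdma_ok}")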

A strategic outlook: Apple is testing the “personal cluster” for AI

Apple is advancing toward a very specific scenario: pooling memory and compute across multiple Macs for tasks that previously required a data center or a massive GPU workstation. Seeing this feature tied to MLX and distributed inference is no accident.

The challenging part is that, for now, this leap demands an HPC mindset: strict topologies, careful configuration, and a tolerance for things breaking. But even with those caveats, the message is clear: RDMA is no longer just “a data center thing.” Apple is pushing it into the desktop realm… albeit with protective gear on.
