TensorWave deploys North America’s largest AMD GPU cluster for AI training with 8,192 MI325X accelerators and direct liquid cooling

The startup is betting on AMD hardware to challenge NVIDIA in large-scale AI, and the deployment doubles as a proof point for the viability of the ROCm ecosystem.

TensorWave, an AI infrastructure company, announced the deployment of North America's largest AMD GPU training cluster, featuring 8,192 AMD Instinct MI325X accelerators with direct-to-chip liquid cooling, a first at this scale for AMD hardware. The system represents both a technical milestone and a strong endorsement of AMD's ecosystem in a market nearly monopolized by NVIDIA.

The company shared images of the high-density racks, connected by bright orange cooling loops, showing the system fully operational and providing on-demand cloud training capacity for enterprise clients.

MI325X Architecture: Unmatched Power and Bandwidth

Introduced in late 2024, the AMD Instinct MI325X marked AMD's most ambitious entry yet into the AI accelerator market; it was succeeded by the MI350X and MI355X in June 2025. Each MI325X packs 256 GB of HBM3e memory with 6 TB/s of bandwidth and up to 2.6 PFLOPS of FP8 compute, thanks to a chiplet design with 19,456 stream processors clocked at up to 2.10 GHz.

While AMD's scale-up domain tops out at 8 GPUs per node, far short of the 72 GPUs NVIDIA links together in its GB200 NVL72 rack-scale systems, TensorWave opted for a different approach: thermal density and rack efficiency.

Across the full cluster, that works out to more than 2 petabytes of aggregate HBM3e memory and a theoretical peak of roughly 21 exaFLOPS of FP8 compute, with sustained performance depending on model parallelization and interconnect architecture.
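As a sanity check on those headline figures, the back-of-the-envelope arithmetic below (our own calculation from AMD's published MI325X specs, not TensorWave's numbers) reproduces them from the per-GPU values:

```python
# Cluster-level aggregates derived from per-GPU MI325X specs (illustrative).
GPUS = 8192
HBM_GB = 256        # HBM3e capacity per MI325X
BW_TBS = 6.0        # memory bandwidth per MI325X
FP8_PFLOPS = 2.6    # peak dense FP8 per MI325X

print(f"aggregate HBM3e:     {GPUS * HBM_GB / 1024:.0f} TB (~{GPUS * HBM_GB / 1024 / 1024:.0f} PB)")
print(f"aggregate bandwidth: {GPUS * BW_TBS / 1024:.0f} PB/s")
print(f"peak FP8 compute:    {GPUS * FP8_PFLOPS / 1000:.1f} exaFLOPS")
```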

Direct Liquid Cooling: Key to Scaling to 1,000W per GPU

Each MI325X consumes about 1,000 watts, a figure that makes air cooling impractical at scale. To address this, TensorWave implemented direct-to-chip liquid cooling, with cold plates mounted directly on each accelerator and custom coolant loops.

This setup maintains optimal temperatures without bulky 16-pin power connectors or massive fans, and it paves the way for future GPUs like the MI350X, whose move to the CDNA 4 architecture is expected to push thermal design power (TDP) as high as 1,400 watts.
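To put the cooling challenge in perspective, a rough power-envelope estimate (our own arithmetic, not a disclosed figure) shows what the GPUs alone draw, before counting CPUs, networking, storage, or cooling overhead:

```python
# Rough GPU power envelope for the cluster (illustrative only).
GPUS = 8192
MI325X_W = 1000    # ~1 kW per MI325X
MI350X_W = 1400    # projected TDP cited for MI350X-class parts

print(f"MI325X cluster GPU power: {GPUS * MI325X_W / 1e6:.1f} MW")
print(f"same count at MI350X TDP: {GPUS * MI350X_W / 1e6:.1f} MW")
```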

Strategic Support and Long-term Vision

This deployment comes just two months after TensorWave secured a $100 million Series A funding round led by AMD Ventures and Magnetar Capital. Unlike most cloud providers building on NVIDIA hardware, TensorWave has bet on AMD, not only on cost grounds but because it considers the ROCm (Radeon Open Compute) ecosystem mature enough for large-scale training.
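That maturity claim is straightforward to test in practice: PyTorch's ROCm builds expose AMD GPUs through the familiar torch.cuda API, so most CUDA-targeted training code runs unchanged. The minimal sketch below (a generic illustration, not TensorWave's actual stack) runs a single training step on whatever accelerator is available:

```python
# Minimal sketch: one training step with PyTorch. On a ROCm build, HIP
# devices are exposed through the torch.cuda namespace, so this same code
# runs on an AMD Instinct GPU without modification.
import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = nn.Linear(4096, 4096).to(device)
opt = torch.optim.AdamW(model.parameters(), lr=1e-4)

x = torch.randn(8, 4096, device=device)   # dummy batch
y = torch.randn(8, 4096, device=device)   # dummy targets

loss = nn.functional.mse_loss(model(x), y)
loss.backward()
opt.step()
opt.zero_grad()

name = torch.cuda.get_device_name(0) if torch.cuda.is_available() else "cpu"
print(f"trained one step on {name}, loss = {loss.item():.4f}")
```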

NVIDIA remains dominant with its CUDA ecosystem, entrenched at giants like AWS, CoreWeave, and Microsoft Azure. Still, TensorWave's early success signals a shift toward more diverse options for AI training at hyperscale.

Future Outlook: MI350X, FP4, and a Stronger AMD Foothold

TensorWave has made clear that this is just the first phase. In the second half of 2025, it plans to incorporate MI350X GPUs, which add support for new FP4 and FP6 precisions and higher memory bandwidth, and whose thermal demands can realistically be met only with liquid cooling.
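Those lower precisions matter mostly for memory footprint and throughput. The sketch below (our own illustration using a hypothetical 70-billion-parameter model, not a TensorWave workload) shows how the storage needed just to hold the weights shrinks as precision drops:

```python
# Back-of-the-envelope sketch: memory needed to hold the weights of a
# hypothetical 70B-parameter model at different precisions (illustrative
# only; activations, optimizer state, and KV caches come on top).
PARAMS = 70e9
BITS_PER_PARAM = {"FP16": 16, "FP8": 8, "FP6": 6, "FP4": 4}

for fmt, bits in BITS_PER_PARAM.items():
    gb = PARAMS * bits / 8 / 1e9   # gigabytes of weight storage
    print(f"{fmt:>4}: {gb:6.1f} GB of weights")
```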

With over 8,000 AMD GPUs actively running training workloads, the company positions itself as a key alternative to NVIDIA for clients seeking cost-efficient and thermally effective solutions. The project could also serve as a model for others pursuing sustainable, scalable AI infrastructure.

8,192 liquid-cooled MI325X GPUs. The largest AMD GPU training cluster in North America. Built by TensorWave. Ready for what’s next 🌊
July 12, 2025

via: tomshardware
