Huawei Strikes Back at NVIDIA with CloudMatrix 384: The AI Supercomputer that Challenges U.S. Hegemony

China confronts the AI race with an impressive system architecture based on its own Ascend 910C chips.

Amid a global GPU shortage, and with the U.S. leading AI model development thanks to NVIDIA’s systems, China has mounted a surprisingly strong counterattack. Huawei has introduced the CloudMatrix 384, an AI supercomputing architecture that, although built on less powerful individual chips, outperforms NVIDIA’s comparable rack-scale system on key system-level metrics by scaling out its infrastructure massively.

More GPUs, more performance… though at a higher energy cost

The CloudMatrix 384 connects 384 Ascend 910C chips in an all-to-all topology, so every accelerator can communicate directly and in parallel with every other. Although each chip offers only a fraction of the performance of NVIDIA’s GB200 (used in the NVL72 system), Huawei’s design beats its rival on overall system performance:

  • 300 PFLOPS of performance in BF16 precision (compared to 180 PFLOPS from the NVL72).
  • 3.6 times more available HBM memory.
  • 2.1 times more memory bandwidth.
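As a rough check on these figures, the following Python sketch derives average per-chip throughput from the quoted system totals. It assumes the NVL72 contains 72 GPUs (inferred from the product name, not stated in this article) and that throughput divides evenly across chips; the constant and function names are illustrative only.

```python
# BF16 system totals quoted in the article.
CLOUDMATRIX_PFLOPS = 300.0
NVL72_PFLOPS = 180.0

CLOUDMATRIX_CHIPS = 384
NVL72_CHIPS = 72  # assumption: inferred from the "NVL72" product name


def per_chip_pflops(system_pflops: float, chips: int) -> float:
    """Average BF16 throughput per chip, assuming an even split."""
    return system_pflops / chips


huawei_chip = per_chip_pflops(CLOUDMATRIX_PFLOPS, CLOUDMATRIX_CHIPS)
nvidia_chip = per_chip_pflops(NVL72_PFLOPS, NVL72_CHIPS)

system_ratio = CLOUDMATRIX_PFLOPS / NVL72_PFLOPS   # system-level lead
chip_ratio = huawei_chip / nvidia_chip             # per-chip deficit
count_ratio = CLOUDMATRIX_CHIPS / NVL72_CHIPS      # how many more chips

print(f"System BF16 ratio (Huawei/NVIDIA): {system_ratio:.2f}x")
print(f"Per-chip BF16: {huawei_chip:.2f} vs {nvidia_chip:.2f} PFLOPS")
print(f"Per-chip ratio (Huawei/NVIDIA): {chip_ratio:.2f}x")
print(f"Chip count ratio (Huawei/NVIDIA): {count_ratio:.2f}x")
```

Under these assumptions, each Ascend 910C delivers roughly a third of the per-chip throughput, and the system-level lead of about 1.7x comes entirely from using more than five times as many chips.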

The system spans 16 racks: 12 hold the GPUs, and 4 intermediate racks house the scale-up network switches, all linked by more than 6,900 optical transceivers running at 400G. This massive scale is reminiscent of NVIDIA’s NVL256 “Ranger” platform, which was ultimately abandoned because of its high cost and power consumption.

Huawei wins on scale, but loses on efficiency

In terms of energy efficiency, NVIDIA maintains the advantage:

  • Huawei’s system draws 2.3 times more power per FLOP than NVIDIA’s.
  • NVIDIA’s system also consumes less power per TB/s of memory bandwidth and per TB of HBM.
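Combining the per-FLOP figure with the BF16 throughput numbers quoted earlier gives a back-of-the-envelope estimate of total system power. The sketch below takes the 2.3x figure as Huawei’s power per FLOP relative to NVIDIA’s and assumes it holds at the quoted peak throughput of both systems; the names are illustrative, and the result is an estimate, not a measured value.

```python
# Figures quoted in the article.
POWER_PER_FLOP_RATIO = 2.3    # Huawei / NVIDIA
CLOUDMATRIX_PFLOPS = 300.0    # BF16, system level
NVL72_PFLOPS = 180.0          # BF16, system level


def total_power_ratio(per_flop_ratio: float,
                      flops_a: float,
                      flops_b: float) -> float:
    """Estimate total power of system A relative to system B.

    total power = (power per FLOP) * FLOPs, so the ratio of totals is
    the per-FLOP ratio multiplied by the throughput ratio.
    """
    return per_flop_ratio * (flops_a / flops_b)


estimated = total_power_ratio(
    POWER_PER_FLOP_RATIO, CLOUDMATRIX_PFLOPS, NVL72_PFLOPS
)
print(f"Estimated total power (Huawei/NVIDIA): {estimated:.1f}x")
```

On these assumptions the CloudMatrix 384 would draw a little under four times the power of an NVL72.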

However, this matters less in China, where AI data centers do not face significant electricity constraints, unlike in the U.S. or Europe. In fact, China has added power capacity equivalent to the entire U.S. grid over the past decade, partly through massive investment in sources such as nuclear and coal.

Technological dependency and production limits

Although the Ascend 910C is designed entirely in China, its manufacturing still relies on foreign suppliers. Most units so far have been fabricated on TSMC’s 7nm process, and Huawei has circumvented some sanctions through third parties such as Sophgo. Huawei has also kept access to Samsung HBM, which has continued to flow into China packaged with low-performance logic chips, even after export controls took effect.

Nonetheless, China is building up its domestic manufacturing capability, with manufacturers such as SMIC and CXMT expanding their operations. If local production can be scaled, Huawei could multiply the number of chips available for future CloudMatrix systems.

AI without limits: more scale and fewer barriers

Huawei’s approach is clear: if you can’t compete chip by chip, compete system by system. The CloudMatrix 384 is a strategic response showing that the AI race is won not only with the most advanced chip architecture, but also with the ability to integrate, scale, and sustain large-scale operations free of energy or regulatory constraints.

In summary, Huawei has found a way to compete with the American AI giants, not through per-chip efficiency but through collective brute force. Although the energy cost is high, the system works and can fuel China’s ambitious AI plans, reaffirming its determination not to fall behind in this new global technology race.

Source: SemiAnalysis
