The Chinese company is preparing to launch a new processor for training artificial intelligence models that promises to rival NVIDIA’s solutions in massive supercomputing environments.
Huawei has taken a new step in its ambition to establish itself as a key player in the artificial intelligence sector. According to information published by DigiTimes, the tech giant is finalizing the details of the Ascend 920C, a new AI accelerator in its Ascend 920 family that will deliver raw performance of over 900 TeraFLOPS at BF16 precision, aimed at training large models.
The Ascend 920C will be manufactured using SMIC’s 6-nanometer process, representing a substantial improvement over the current Ascend 910C, which reaches a peak of 780 TeraFLOPS. In terms of memory, the new model will feature HBM3 modules with a bandwidth of 4,000 GB/s, improving on the 3,200 GB/s provided by the eight stacks of HBM2E in the 910C.
An evolution focused on performance and scalability
The Ascend 920C will maintain the chiplet approach but will incorporate enhancements in its acceleration engines for Transformer models and Mixture-of-Experts. Huawei’s internal projections estimate a 30% to 40% improvement in training efficiency compared to its predecessor. It is also expected to reduce the gap in performance per watt relative to its direct competitors, such as NVIDIA.
Among the highlighted features is support for PCIe 5.0 and next-generation interconnection protocols, designed to reduce latency and improve synchronization between nodes in large AI deployments. The company has not yet provided an official release date, but sources suggest that the Ascend 920C could enter mass production in the second half of 2025.
CloudMatrix 384: a supernode that already surpasses NVIDIA in scale
This announcement comes just days after Huawei revealed the results of its CloudMatrix 384, a training supernode built from 384 Ascend 910C accelerators that achieves higher total performance than NVIDIA's GB200 NVL72 system. While the per-chip performance is lower than NVIDIA's (780 versus 2,500 TeraFLOPS in BF16), Huawei compensates with scale, delivering roughly 1.7 times the overall performance and 3.6 times the total HBM memory capacity.
However, this capacity comes at an energy cost: Huawei's system consumes approximately 560 kW, compared to 145 kW for NVIDIA's. Huawei's strategy appears to focus on maximizing performance in large-scale deployments, while NVIDIA continues to lead in per-chip efficiency.
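The scale and efficiency claims above can be sanity-checked with simple arithmetic. The sketch below assumes the 910C's 780 TFLOPS per-chip figure quoted earlier in the article and the 72 GPUs implied by the NVL72 name; all inputs are approximate vendor claims, not measurements.

```python
# Back-of-the-envelope check of the CloudMatrix 384 vs. GB200 NVL72 comparison.
# Per-chip BF16 throughput and system power come from the article; the 72-GPU
# count for the NVL72 is implied by its name.

CM_CHIPS, CM_TFLOPS, CM_POWER_KW = 384, 780, 560   # Huawei CloudMatrix 384 (Ascend 910C)
NV_CHIPS, NV_TFLOPS, NV_POWER_KW = 72, 2500, 145   # NVIDIA GB200 NVL72

cm_total = CM_CHIPS * CM_TFLOPS   # 299,520 TFLOPS aggregate
nv_total = NV_CHIPS * NV_TFLOPS   # 180,000 TFLOPS aggregate

print(f"CloudMatrix total: {cm_total / 1000:.0f} PFLOPS")
print(f"NVL72 total:       {nv_total / 1000:.0f} PFLOPS")
print(f"Throughput ratio:  {cm_total / nv_total:.2f}x")  # ~1.66x, matching the ~1.7x claim
print(f"Efficiency: {cm_total / CM_POWER_KW:.0f} vs "
      f"{nv_total / NV_POWER_KW:.0f} TFLOPS/kW")         # NVIDIA leads per watt
```

The arithmetic confirms both halves of the article's framing: Huawei wins on aggregate throughput (~1.66×), while NVIDIA's system extracts more than twice the compute per kilowatt.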
A future model based on capacity and autonomy
The key to Huawei’s success lies in its all-to-all interconnection architecture and its commitment to building complete AI solutions without relying on foreign suppliers, in line with its strategic goal of technological autonomy.
The Ascend 920C represents an important step in this direction. If SMIC progresses to more advanced nodes, such as 5 nm or 3 nm, Huawei could close the gap with industry leaders in energy efficiency and computing density.
In a context where training foundational models like GPT or Gemini increasingly demands more computational resources, solutions like the Ascend 920C and its integration into systems like CloudMatrix could be decisive for competing in the global large-scale artificial intelligence market.
The battle among giants is on, and Huawei does not seem willing to fall behind.
Source: Techpowerup