Huawei Moves Ascend Closer to CUDA and Gains Ground Against NVIDIA in China

Huawei is trying to break through one of the biggest barriers protecting NVIDIA in the AI business: not just the performance of its accelerators, but the dominance of CUDA as the de facto development environment. That is the underlying message behind the growing interest from major Chinese tech companies in the new Ascend 950PR. According to Reuters, the chip has started to win over clients such as ByteDance and Alibaba, not so much because of superior raw compute but because of a clear improvement in compatibility with the programming ecosystem many AI developers already know.

Huawei had been making progress for some time with its Ascend stack and CANN, its AI computing software platform. But the real challenge was not only building a competitive chip; it was also lowering the cost of migrating away from NVIDIA. That is where the narrative begins to shift. At HUAWEI CONNECT 2025, the company announced that the next-generation Ascend would support both the SIMD and SIMT programming paradigms, a move aimed squarely at territory CUDA has dominated for years. Reuters now reports that this closer alignment with NVIDIA’s development model is one of the reasons the Ascend 950PR has drawn more interest from Chinese hyperscalers than previous generations.
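To make the SIMD/SIMT distinction concrete, here is a minimal sketch in plain Python. It is not Ascend, CANN, or CUDA code: the function names (`saxpy_simd`, `saxpy_simt_kernel`, `launch`) and the lane width are hypothetical, and the loops only mimic what hardware would run in parallel. In the SIMD style the programmer writes operations over fixed-width vectors; in the SIMT style the programmer writes scalar code for a single thread and a launcher runs it for every thread index, which is the mental model CUDA developers are used to.

```python
# Illustrative only: plain Python emulation, not Ascend/CANN or CUDA code.
LANES = 4  # hypothetical vector width


def saxpy_simd(a, x, y):
    """SIMD style: the program steps through the data one vector at a time."""
    out = []
    for start in range(0, len(x), LANES):
        xv = x[start:start + LANES]  # load one vector of x
        yv = y[start:start + LANES]  # load one vector of y
        out.extend(a * xi + yi for xi, yi in zip(xv, yv))  # one vector operation
    return out


def saxpy_simt_kernel(tid, a, x, y, out):
    """SIMT style: scalar code written for a single thread."""
    if tid < len(x):  # bounds guard, as in a typical GPU kernel
        out[tid] = a * x[tid] + y[tid]


def launch(kernel, n_threads, *args):
    """Hypothetical launcher: runs the kernel once per thread index."""
    for tid in range(n_threads):  # real hardware would run these in parallel
        kernel(tid, *args)


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [10.0, 20.0, 30.0, 40.0, 50.0]

simd_result = saxpy_simd(2.0, x, y)

simt_result = [0.0] * len(x)
launch(saxpy_simt_kernel, 8, 2.0, x, y, simt_result)  # 8 threads for 5 elements

print(simd_result == simt_result)
```

Both versions compute the same result; what differs is who handles the vector width. In the SIMD version it is visible in the program, while in the SIMT version the programmer only guards against out-of-range thread indices.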

This does not mean Huawei has literally “copied CUDA,” nor that the Chinese market is ready to abruptly abandon NVIDIA hardware. What appears to be happening instead is more pragmatic: Huawei is bringing its development environment closer to the mental model of programmers already working with NVIDIA GPUs, reducing friction, adaptation time, and entry barriers. In a market where US export restrictions continue to complicate access to advanced Western chips, this evolution in software could be as important as the silicon itself.

The Ascend 950PR targets inference, recommendation, and large-scale deployments

From a technical perspective, Huawei had already outlined the Ascend 950PR months before its commercial release. In an official keynote in September 2025, the company explained that the chip was aimed at prefill inference and recommendation systems, two areas where parallelism, low latency, and low deployment cost are highly valued. According to Huawei, the 950PR delivers 1 PFLOPS in FP8 and 2 PFLOPS in FP4, along with 2 TB/s of interconnect bandwidth, figures the company positions above the Ascend 910C generation in interconnect bandwidth and support for low-precision formats.
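As a rough illustration of what the quoted peak figures imply, the sketch below estimates a lower bound on prefill time. It assumes the common approximation of about 2 FLOPs per model parameter per token for a forward pass, a hypothetical 7-billion-parameter model, a hypothetical 4,096-token prompt, and 100% utilization of Huawei's stated peak. None of those assumptions comes from Huawei or Reuters, and real throughput would be lower.

```python
# Back-of-the-envelope estimate; model size and prompt length are hypothetical.
PEAK_FLOPS = {"FP8": 1e15, "FP4": 2e15}  # Huawei's stated peaks: 1 and 2 PFLOPS


def prefill_lower_bound_s(params, prompt_tokens, peak_flops):
    """Ideal prefill time, assuming ~2 FLOPs per parameter per token."""
    total_flops = 2 * params * prompt_tokens
    return total_flops / peak_flops


params = 7e9          # assumed model size (not from the article)
prompt_tokens = 4096  # assumed prompt length (not from the article)

estimates = {
    fmt: prefill_lower_bound_s(params, prompt_tokens, peak)
    for fmt, peak in PEAK_FLOPS.items()
}

for fmt, seconds in estimates.items():
    print(f"{fmt}: at least {seconds * 1000:.1f} ms per prompt")
```

Under these assumptions the floor is roughly 57 ms in FP8 and 29 ms in FP4. The point is the order of magnitude and the halving between formats, not a performance prediction.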

Huawei also officially revealed that the 950PR would use HiBL 1.0, a proprietary HBM design intended to balance cost and performance for inference and recommendation scenarios. The company argues this approach better matches hardware investment to customers whose workloads do not require the heavy training-oriented profile that other deployments demand. On the same roadmap, Huawei distinguished the Ascend 950PR, oriented toward prefill and recommendation, from the future Ascend 950DT, intended for decoding and even training.
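One common way to see why prefill and decoding might call for different memory choices is to compare how much arithmetic each does per byte of model weights read. The sketch below is a simplified model, not Huawei's analysis: it assumes about 2 FLOPs per parameter per token, counts only weight traffic (ignoring activations and the KV cache), and uses a hypothetical 4,096-token prompt.

```python
# Simplified model: weights are read once per forward pass; KV cache is ignored.
def flops_per_weight_byte(tokens_per_pass, bytes_per_param):
    """Arithmetic intensity relative to weight traffic (~2 FLOPs/param/token)."""
    return 2 * tokens_per_pass / bytes_per_param


BYTES_FP8 = 1  # one byte per parameter in FP8

prefill_intensity = flops_per_weight_byte(4096, BYTES_FP8)  # whole prompt in one pass
decode_intensity = flops_per_weight_byte(1, BYTES_FP8)      # one new token per pass

print(f"prefill: {prefill_intensity:.0f} FLOPs per weight byte")
print(f"decode:  {decode_intensity:.0f} FLOPs per weight byte")
```

Under these assumptions prefill performs thousands of times more arithmetic per byte fetched than single-token decoding, so it leans on compute rather than memory bandwidth. That is consistent with pairing a prefill-oriented chip with more cost-conscious memory, though the actual design rationale is Huawei's to state.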

The most visible product to come out of this strategy is the Atlas 350, a card Huawei announced that is built specifically on the Ascend 950PR. In official materials, Huawei stated that the accelerator doubles vector computing capacity compared with previous generations and improves recommendation service performance by 2.5 times. Huawei has also emphasized that the Atlas 350 can be used as a standalone card or grouped in shared-resource configurations, an indication that the company wants to support both one-off deployments and scaling within larger clusters.

Software may be more valuable than a few extra teraflops

The real driver behind the Chinese market’s interest in this chip is not just the PFLOPS figure. It’s the software. Reuters reported on March 27 that ByteDance and Alibaba plan to place orders for the Ascend 950PR after confirming that, this time, the chip is more compatible with NVIDIA’s CUDA system and offers better response times. The agency also noted that Huawei aims to ship around 750,000 units this year, with mass production about to begin and full deliveries expected in the second half of 2026.

This detail is key because it reflects a shift in customer attitude, not just product improvement. For years, NVIDIA’s main asset in AI has not only been hardware performance but the combination of CUDA, libraries, frameworks, tooling, documentation, and a critical mass of developers familiar with that environment. China has tried to develop its own alternatives, but many of its major tech companies still preferred NVIDIA even when access to Western hardware was more restricted. If the Ascend 950PR begins to reduce that dependency on the CUDA ecosystem, Huawei doesn’t need to match NVIDIA everywhere to become significantly more relevant in its domestic market.

Nevertheless, it’s important not to overstate this. Reuters makes it clear that the new chip doesn’t necessarily surpass NVIDIA in raw computation for all use cases, and Huawei’s official positioning suggests a more targeted strategy for specific scenarios rather than an immediate universal substitute. Additionally, the biggest bottleneck remains capacity—production scale, large-scale customer adoption, and Huawei’s ability to turn better compatibility into sustained mass deployments.

The real threat to NVIDIA is in China, not globally

The immediate impact of this development will be felt primarily within China. There, a combination of sanctions, export restrictions, political pressure to build domestic infrastructure, and difficulties in acquiring large volumes of Western GPUs has created an ideal environment for a domestic alternative to gain traction—even if it’s not superior in every aspect. Huawei seems to have understood that the best way to make progress isn’t just by brute-force hardware manufacturing but by speaking a language more familiar to AI developers.

This is precisely what threatens NVIDIA’s competitive moat in China. Not because Huawei has suddenly erased the technological advantage CUDA gives NVIDIA, but because it is starting to make that advantage less exclusive. If major Chinese clients perceive that migration costs are low enough, the market could shift significantly compared with recent years. The Ascend 950PR, on its own, doesn’t change the entire landscape, but it could be the first Chinese chip to seriously challenge NVIDIA’s ecosystem dominance, not just its hardware.

Frequently Asked Questions

What is the Huawei Ascend 950PR?
It’s Huawei’s next-generation AI accelerator, mainly aimed at inference in prefill scenarios and recommendation systems. Huawei officially introduced it as part of its Ascend 950 roadmap.

Why is the Ascend 950PR said to be closer to CUDA?
Because Huawei has enhanced CANN and the next-gen Ascend with support for paradigms like SIMD and SIMT, and Reuters reports that several clients consider increased compatibility with NVIDIA’s development ecosystem a positive feature.

Have Alibaba and ByteDance already purchased the chips?
Reuters states that both companies plan to place orders after testing the product, but the report is based on market sources, and no contracts have been officially confirmed by all parties.

Can Huawei already replace NVIDIA in China?
Not entirely, at least not yet. But the Ascend 950PR shows Huawei is no longer competing only in hardware but also in software and developer experience, precisely where NVIDIA maintains its biggest advantage.

via: wccftech
