China Accelerates AI Accelerator Production in Local Factories: The True Bottleneck Is HBM (and Advanced Node Capacity)

The race for AI computing sovereignty has entered a new phase in China. After years of sanctions, export restrictions, and geopolitical tensions, national champions are ramping up the production of AI accelerators in domestic fabs, led by Huawei and Cambricon. Internal figures and analyst estimates suggest production could reach more than one million units annually by 2026. However, the momentum faces two major hurdles: HBM memory and advanced node manufacturing capacity.

The emerging narrative is clear: logic is no longer the sole obstacle. Even if SMIC (China’s leading foundry) and future Huawei-operated fabs improve yields and manufacturing cycle times on 7 nm-class processes, the shortage of HBM—the high-bandwidth memory critical for training and serving modern large language models at scale—stands out as the main limiting factor on actual accelerator supply in the short term.


China’s Turnaround: From Relying on TSMC to Forcing Domestic Manufacturing

The US Department of Commerce’s entity lists and successive export-control packages have compelled Chinese companies to redraw their supply maps. In 2024, according to industry sources, Huawei reportedly drew from a “die bank” previously fabricated at TSMC to bridge the transition while SMIC matured its 7 nm-class (N+2) nodes under significant pressure.

This “buffer” allowed for the sustained launch and scaling of the Ascend 910 series (B and C) while SMIC refined processes, automation, and capacity. Simultaneously, Huawei has pushed its vertical integration: from design to manufacturing, including process equipment (creating SiCarrier to develop and replicate key tools) and advanced packaging.

What Changes in 2025–2026

  • SMIC has shifted from mainly producing mobile SoCs (such as the Kirin 9000S) to ramping output of large AI-focused dies (Ascend).
  • Cycle times at 7 nm remain longer at SMIC than at TSMC, given intensive DUV multi-patterning, but volume and yields show clear signs of improvement.
  • Huawei aims to operate its own fabs to free up capacity at SMIC and speed up iterations; if successful, this could raise both volume and process-control standards.

Physical Limit Today: HBM, Not Logic

For performance and energy efficiency, modern AI accelerators require HBM (HBM2E/3/3E…). China amassed a significant inventory of foreign stacks before new restrictions took effect at the end of 2024, but that stockpile will deplete over the coming quarters without fresh supply. The strategic outlook for the sector is clear: without additional HBM, the number of AI ASICs assembled drops, even if logic dies are available.

Why HBM Dominates

  • Massive bandwidth per watt and per socket, essential for RLHF, SFT, Mixture-of-Experts, and multimodal inference.
  • No direct substitute: GDDR/LPDDR can cover certain models or light inference but will not scale for SOTA training and large-scale deployment.
  • 2.5D/3D packaging (interposers, TSVs, stacking) adds complexity and creates bottlenecks at OSATs (assembly and test companies)—another arena where China is accelerating but has yet to match the global leaders.
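The bandwidth gap behind the “no direct substitute” point can be sketched with back-of-the-envelope arithmetic. The per-pin rates and bus widths below are approximate, publicly cited interface figures, not vendor guarantees:

```python
# Rough peak-bandwidth comparison: why GDDR cannot easily replace HBM.
# Interface figures are approximate public specs, used here for illustration.

def bandwidth_gbs(bus_bits: int, pin_gbps: float) -> float:
    """Peak bandwidth in GB/s for a memory interface: bits * Gb/s per pin / 8."""
    return bus_bits * pin_gbps / 8

hbm3_stack   = bandwidth_gbs(1024, 6.4)   # one HBM3 stack: ~819 GB/s
hbm3e_stack  = bandwidth_gbs(1024, 9.6)   # one HBM3E stack: ~1229 GB/s
gddr6_device = bandwidth_gbs(32, 20.0)    # one GDDR6 device: ~80 GB/s

# GDDR6 devices needed to match a single HBM3 stack,
# each requiring its own PHY, power, and board routing:
print(round(hbm3_stack / gddr6_device))
```

Matching even one stack would take on the order of ten discrete GDDR devices, with far worse energy per bit and board area, which is why HBM is the binding constraint.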

CXMT, the “Secondary Engine” That Still Has Not Fully Taken Off

In the DRAM/HBM realm, CXMT is the main contender. The company has closed the gap on DDR5 and is pushing its roadmap towards HBM3/3E, with investments from the Big Fund and packaging alliances with JCET and Tongfu, among others. Still, scale jumps in HBM demand specialized tools, TSV processes, and a learning curve that cannot be short-circuited by decree.

Most conservative projections place CXMT’s HBM production at around 2–3 million stacks in 2026, enough for roughly 250,000–400,000 Ascend 910C packages (depending on configuration), far below Huawei’s ambitions even if logic dies are plentiful. In short: HBM remains the bottleneck through 2026–2027.
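The stack-to-package arithmetic above can be sketched as follows; the eight-stacks-per-package figure is an assumption for a high-end 910C-class configuration, not a confirmed spec:

```python
# Back-of-the-envelope: accelerator packages supported by a given HBM supply.
# stacks_per_package=8 is an assumed high-end configuration, not confirmed.

def packages_from_stacks(stacks: int, stacks_per_package: int = 8) -> int:
    """Whole packages that can be assembled from a stack supply."""
    return stacks // stacks_per_package

low  = packages_from_stacks(2_000_000)   # lower bound of the CXMT projection
high = packages_from_stacks(3_000_000)   # upper bound
print(f"{low:,} - {high:,} packages")
```

With fewer stacks per package (a cheaper configuration), the package count rises, which is why the projected range widens toward 400,000.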


SMIC: Growing Capacity, Improving Yields, Still Long Cycles

Analysts differ on figures but agree on the trend: SMIC is increasing its capacity for advanced nodes (7 nm and DUV derivatives), aiming for 45k wafers/month by late 2025, 60k in 2026, and 80k in 2027. Even with yields below TSMC’s, an annual output of several million Ascend-class dies is feasible if capacity allocation is prioritized.
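The wafer-to-die arithmetic behind the “several million dies” claim can be sketched with a standard die-per-wafer approximation. Die area, yield, and the share of capacity allocated to AI are illustrative assumptions, not SMIC data:

```python
import math

# Sketch of wafers -> good dies per year. All parameters below except the
# 45k wafers/month capacity figure are illustrative assumptions.

def gross_dies_per_wafer(die_area_mm2: float, wafer_diameter_mm: float = 300) -> int:
    """Common die-per-wafer approximation, discounting edge losses."""
    d = wafer_diameter_mm
    return int(math.pi * (d / 2) ** 2 / die_area_mm2
               - math.pi * d / math.sqrt(2 * die_area_mm2))

die_area = 665             # mm^2, assumed reticle-class AI die
yield_rate = 0.30          # assumed; well below TSMC-class yields
wafers_per_month = 45_000  # late-2025 capacity estimate (all products)
ai_share = 0.25            # assumed fraction of capacity allocated to AI dies

dies_per_year = (gross_dies_per_wafer(die_area) * yield_rate
                 * wafers_per_month * ai_share * 12)
print(f"{dies_per_year:,.0f} good dies/year")
```

Even under these pessimistic yield assumptions, a modest slice of capacity yields an annual output in the low millions of dies, consistent with the analyst trend described above.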

The main challenge is lead time. From wafer start to finished module with advanced packaging, total cycle times at SMIC are estimated at roughly double TSMC’s for 7 nm DUV nodes, owing to intensive multi-patterning, fewer cutting-edge tools per line, and less automation. This gap matters greatly when targeting monthly output in the hundreds of thousands or millions.


“Compute Diplomacy”: Foreign Chips “Restricted”, Offshore Leasing, and Regulatory Pressure

As China accelerates its domestic base, the international chessboard shifts:

  • Nvidia H20/H20E (versions tailored to controls) have been approved for entry into China; reports mention a B30A (Blackwell family) with more FLOPs and significant memory, fitting within regulatory thresholds — or with specific licenses.
  • Compute leasing outside China (e.g., Malaysia) enables training with state-of-the-art GPUs without physically shipping chips to the country, using disconnection as a control lever if conditions are violated.
  • The consequence: players like DeepSeek, Qwen (Alibaba), and Kimi (Moonshot) can keep improving their training and serving, even though full sovereignty remains distant without local HBM.

Replacing GPUs with Native ASICs? The Broader Silicon Debate

From Beijing and industry voices comes the push to diverge from the “CUDA path” and focus on ASICs and non-Western stacks. A well-optimized ASIC offers real efficiency advantages for specific tasks; the challenge lies in the ecosystem: libraries, compilers, frameworks, tools, and talent built over a decade around the Nvidia/CUDA and TPU/XLA stacks.

Huawei’s Ascend has improved in software and operators, but industry sources indicate performance and maturity gaps compared to Hopper/Blackwell or TPU v5 in SOTA training. Without volume HBM, the architecture debate is secondary: scaling is impossible.


Looking Ahead to 2026: More Dies, Sufficient HBM?

If SMIC sustains its growth trajectory and Huawei adds its own capacity, the logic bottleneck could ease during 2026, with the potential number of Ascend dies surpassing several million annually. But without HBM (domestic or imported), the number of assembled modules will stay well below ambitions.

Possible scenarios include:

  1. Domestic HBM takes off (CXMT + OSAT): China could balance logic and memory, with about 0.3–0.5 million packages in 2026, accelerating in 2027 if yields and tools cooperate.
  2. Smuggling/re-exporting of HBM persists: this would temporarily raise the production ceiling but is not a stable foundation.
  3. More licenses for foreign chips with extensive memory (H20E/B30A): enabling better serving and inference, relieving pressure on Ascend in certain tiers but not substituting for local HBM if self-sufficiency is the goal.

Strategic Implications

  • For China: the self-sufficiency plan requires two synchronized engines—logic (SMIC/Huawei/Cambricon) and HBM (CXMT + OSATs)—plus a competitive software stack. The memory bottleneck is immediate.
  • For the US and allies: controls over HBM and critical tools have more influence on the pace of China’s ramp-up than restrictions on logic. Coordinated control measures (US, Japan, Netherlands, Korea) are decisive to prevent “supply windows”.
  • For the market: Despite these limitations, 2026–2027 will see much more compute power in China via Ascend, licensed GPUs, and leased compute. The user experience in local AI apps will improve as service capacity increases (more memory per chip, more chips).

Conclusion

China is unquestionably ramping up production of AI accelerators in domestic fabs and can produce millions of dies if it prioritizes capacity and improves yields. In the near and medium term, however, the key to its scaling is not lithography but HBM. Without a robust domestic HBM ramp—or reliable access to foreign HBM—the number of assembled modules will fall short of ambitions. Full-stack sovereignty (from silicon to tokens) will be a marathon, and HBM sets the tempo.


Frequently Asked Questions (FAQ)

Why is HBM a bigger bottleneck than logic?
Because SOTA models require massive bandwidth and near-memory capacity. HBM provides both; without it, large-scale training and inference lose performance and efficiency.

Can China replace HBM with GDDR or LPDDR?
For specific use cases or light inference, it’s possible temporarily. For cutting-edge LLMs, RLHF, and multimodal workloads at scale, no: the bandwidth and topology of HBM are difficult to emulate.

Can SMIC produce enough AI chips?
Capacity and yields are improving. With 20–50k wafers/month allocated to AI and reasonable yields, producing millions of dies annually is feasible. The main challenges are cycle time (longer than TSMC’s) and matching those dies with HBM.

What role do Nvidia H20/H20E or a hypothetical B30A play?
They can enhance serving capacity and inference in China under licensing, and ease pressure on Huawei’s Ascend. But they don’t solve the self-sufficiency goal unless local HBM ramps up.


via: semianalysis