TSMC accelerates the AI memory race: customized HBM4E, 3 nm logic, and double energy efficiency

TSMC took advantage of its Open Innovation Platform (OIP) forum, held recently in Amsterdam, to send a clear message to the market: the next big battle in artificial intelligence won’t be fought solely on GPUs but in high-bandwidth memory. The Taiwanese company detailed its strategy for the upcoming HBM4/HBM4E generation, featuring a new custom “C-HBM4E” that combines a base die built on the N3P process (an enhanced 3 nm node) with much tighter integration between logic and DRAM.

According to data compiled by TrendForce and presented by TSMC, the ambitious goal is to double energy efficiency compared to current DRAM processes, while reducing operating voltage to as low as 0.75 V in the most advanced configurations.


Standard HBM4: N12 base die and 50% more efficiency

Until now, memory manufacturers (Micron, Samsung, SK hynix) designed and manufactured the base die for HBM themselves, using traditional DRAM processes. With HBM4, TSMC is changing the rules: it will offer standard base dies fabricated on its N12 logic node, a significantly more advanced process than that used for HBM3E.

This process leap allows voltage to drop from 1.1 V to 0.8 V, which, according to the data presented, results in an improvement in efficiency close to 1.5× over the previous generation. In practice, this means less heat per transferred bit and more margin to increase frequencies and bandwidth without significantly raising power consumption.
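As a rough sanity check of these figures (an illustrative back-of-envelope calculation, not TSMC data), dynamic CMOS switching energy scales roughly with the square of the supply voltage, so the voltage drop alone already accounts for a large share of the quoted gains:

```python
# Back-of-envelope check (illustrative only, not TSMC data): dynamic CMOS
# switching energy scales roughly as C * V^2, so the voltage drop alone
# bounds how much of the efficiency gain it can explain.

def v2_scaling_gain(v_old: float, v_new: float) -> float:
    """Upper-bound efficiency gain if all energy scaled with V^2."""
    return (v_old / v_new) ** 2

# HBM3E-era base die vs. standard HBM4 base die on N12 (voltages from the article)
print(f"1.10 V -> 0.80 V: up to {v2_scaling_gain(1.10, 0.80):.2f}x from voltage alone")
# ~1.9x upper bound; the quoted ~1.5x is plausible once static power, DRAM-core
# energy and I/O termination (which do not all scale with V^2) are factored in.

# HBM3E-era base die vs. C-HBM4E base die on N3P
print(f"1.10 V -> 0.75 V: up to {v2_scaling_gain(1.10, 0.75):.2f}x from voltage alone")
# ~2.15x, in the same ballpark as the ~2x figure quoted for C-HBM4E.
```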

For memory manufacturers, the model is clear: they can focus on stacking DRAM layers and leave TSMC to handle the base logic and standard PHY for HBM4, reducing complexity and leveraging cutting-edge logic nodes.


C-HBM4E: N3P logic, 0.75 V, and integrated controller within the stack

The truly disruptive step arrives with C-HBM4E (Custom HBM4E), TSMC’s customized variant for the second wave of products, starting around 2027. In this case, the base die jumps to the N3P (high-performance 3 nm) node and voltage drops even further, from 0.8 V to 0.75 V.

TSMC states that, by combining this node with a new logic design, the C-HBM4E solution can deliver up to 2× improved energy efficiency compared to the DRAM processes powering current HBM3E. This is particularly relevant as many AI data centers hit physical power limits in their facilities.

Furthermore, in C-HBM4E, the base die does not merely handle signals: it directly integrates memory controllers, which today reside in the connected SoC (GPU, TPU, or dedicated accelerator). This turns the base die into a much more complex logical block and transforms the PHY into a fully customized solution tailored to each client’s needs.

For large chip designers, this opens the door to configurations where part of the memory management intelligence shifts into the HBM stack itself, freeing die area on the main chip and reducing signal path lengths, leading to benefits in latency and power consumption.


Micron and SK hynix join in: TSMC as the sole provider of the HBM4E base die

TSMC’s new strategy isn’t just a PowerPoint exercise: it already has names and dates. During its September earnings report, Micron confirmed it will rely on TSMC to manufacture the logic base die for its HBM4E memories, in both standard and customized variants, with volume production expected in 2027.

TrendForce and other Asian media report that SK hynix is also preparing its first customized HBM4E products with TSMC as its foundry partner. For general-purpose server lines, SK hynix plans to use a 12 nm-class process, while the “premium” versions, aimed at NVIDIA’s top-tier GPUs and Google’s TPUs, will transition to 3 nm nodes.

The pattern is clear: the big three HBM players (Samsung, SK hynix, and Micron) are sharing the DRAM-stack business, while TSMC is establishing itself as the almost unavoidable supplier of base logic and advanced packaging. It’s a sort of “infinite AI compute loop,” in which the Taiwanese foundry gains prominence even in a market historically dominated by memory companies.


CoWoS-L: up to 12 stacks of HBM3E/HBM4 for accelerators in 2026–2027

Another key aspect of TSMC’s announcement lies in packaging. The company revisited the evolution of its CoWoS (Chip on Wafer on Substrate) family:

  • CoWoS-S launched in 2016 at 1.5× the reticle limit with 4 HBM stacks on an N16 node; today it has extended to 3.3× the reticle limit with up to 8 HBM stacks on N5/N4 nodes.
  • CoWoS-R introduces faster interconnects and support for N3 chips.
  • The new CoWoS-L generation targets a 5.5× reticle limit, about 4,500 mm² effective area, supporting up to 12 HBM3E/HBM4 stacks in a single package, aimed at 2026 AI accelerators like AMD Instinct MI450X or NVIDIA’s Vera Rubin platform.

By 2027, TSMC discusses a CoWoS-L version based on the A16 node with a 9.5× effective reticle limit and more than 12 stacks of HBM, intended for the next generation of accelerators with HBM4E and even more demanding memory and bandwidth configurations.
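To put those multiples into perspective, a standard lithography reticle field measures roughly 26 mm × 33 mm (≈858 mm²), so the quoted reticle limits translate into interposer areas roughly as follows (a quick illustrative conversion, not official TSMC figures):

```python
# Rough conversion of reticle-limit multiples into effective interposer area.
# A standard lithography reticle field is 26 mm x 33 mm = 858 mm^2.
RETICLE_MM2 = 26 * 33  # 858 mm^2

for label, multiple in [("CoWoS-S (2016)",        1.5),
                        ("CoWoS today",           3.3),
                        ("CoWoS-L (2026)",        5.5),
                        ("CoWoS-L on A16 (2027)", 9.5)]:
    print(f"{label:22s} ~{multiple} x reticle ~= {multiple * RETICLE_MM2:,.0f} mm^2")

# 5.5 x 858 mm^2 ~= 4,719 mm^2, in line with the ~4,500 mm^2 effective area
# quoted for the 2026 CoWoS-L generation (the usable area is somewhat smaller
# than the raw reticle multiple suggests).
```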

Meanwhile, technologies like InFO (Integrated Fan-Out) and SoW (System on Wafer) are reserved for more specific cases, such as Cerebras’ wafer-scale chips, while SoIC (TSMC’s 3D stacking technology) allows SRAM or logic chiplets to be stacked with bump pitches as small as 5–6 μm and hundreds of millions of microbumps per package.


3Dblox and the challenge of designing chips with 100 million microbumps

This kind of 2.5D and 3D packaging directly impacts the complexity of physical design. TSMC mentions packages that already surpass 100 million microbumps: CoWoS-S is around 15 million, CoWoS-L approaches 50 million, and solutions like SoW reach 400 million. The bump pitch is shrinking from about 9 μm down to around 5 μm in advanced chiplets.
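Those counts follow almost directly from pitch and area. As a quick illustrative estimate (assuming a simple square bump grid, which is a simplification of real layouts):

```python
# Illustrative microbump-count estimate: density at a given pitch (assuming a
# simple square grid) multiplied by the connected area.

def bumps_per_mm2(pitch_um: float) -> float:
    """Approximate microbump density for a square grid at the given pitch."""
    pitch_mm = pitch_um / 1000.0
    return 1.0 / (pitch_mm ** 2)

for pitch in (9.0, 5.0):
    print(f"{pitch:.0f} um pitch -> {bumps_per_mm2(pitch):,.0f} bumps/mm^2")
# 9 um -> ~12,300 bumps/mm^2; 5 um -> 40,000 bumps/mm^2

# Hypothetical example: 2,500 mm^2 of bonded area at a 5 um pitch already
# lands at ~100 million connections, the order of magnitude TSMC cites for
# its largest packages.
area_mm2 = 2_500
print(f"~{bumps_per_mm2(5.0) * area_mm2 / 1e6:.0f} million bumps over {area_mm2} mm^2")
```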

To manage this complexity, the foundry has developed 3Dblox, a description language that allows chiplets, interposers, and substrates to be defined hierarchically, with interfaces (including millions of microbumps) verified once and the verified blocks reused across multiple designs. This way, changes to the floorplan or the chiplet topology do not require re-verifying everything from scratch, which is crucial to keep already lengthy design cycles from stretching even further.
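The core idea is hierarchical reuse of already verified interfaces. The toy sketch below is not 3Dblox syntax (3Dblox is TSMC’s own description language, and its grammar is not reproduced here); it only illustrates the verify-once, reuse-everywhere principle, and every name in it is hypothetical:

```python
# Conceptual illustration of the "verify once, reuse everywhere" idea behind a
# hierarchical packaging description language. This is NOT 3Dblox syntax; all
# class and function names here are hypothetical.

from dataclasses import dataclass

@dataclass(frozen=True)
class Interface:
    name: str        # e.g. a standard HBM4 PHY boundary
    bump_count: int  # microbumps crossing this boundary

@dataclass
class Chiplet:
    name: str
    interfaces: tuple[Interface, ...]

_verified: set[Interface] = set()  # interfaces already signed off

def verify_interface(iface: Interface) -> None:
    """Expensive check, run once per interface definition, then cached."""
    if iface in _verified:
        return  # reuse the previous sign-off
    print(f"verifying {iface.name} ({iface.bump_count:,} bumps)...")
    _verified.add(iface)

def assemble(chiplets: list[Chiplet]) -> None:
    """Re-floorplanning only re-verifies interfaces not seen before."""
    for chiplet in chiplets:
        for iface in chiplet.interfaces:
            verify_interface(iface)

hbm_phy = Interface("HBM4_PHY", 10_000)
d2d     = Interface("DIE_TO_DIE", 50_000)
gpu     = Chiplet("compute_die", (hbm_phy, d2d))
stack   = Chiplet("hbm_stack",   (hbm_phy,))

assemble([gpu, stack])       # each interface verified exactly once
assemble([stack, gpu, gpu])  # changed topology: nothing is re-verified
```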


Implications for AI data centers and energy consumption

The core message from TSMC is clear: energy efficiency is becoming the bottleneck for large-scale AI. It’s not enough to launch more powerful chips; significant reductions in watts per terabyte/second of memory bandwidth are essential.

If standard N12-based HBM4 base dies improve efficiency by 50%, and C-HBM4E variants on N3P approach a 2× gain over HBM3E, large data center operators could save several megawatts per training cluster in the next wave of systems, just by optimizing the memory architecture.
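To give a sense of the arithmetic behind that claim, here is a rough, hedged estimate. The energy-per-bit figure, the assumption of fully sustained bandwidth, and the cluster size are illustrative assumptions, not numbers from TSMC or TrendForce:

```python
# Rough illustration of how memory-interface efficiency maps to cluster power.
# The pJ/bit figure, sustained-bandwidth assumption and cluster size are
# illustrative assumptions, not vendor data.

PJ_PER_BIT_HBM3E = 4.0      # assumed HBM3E-class interface energy (pJ/bit)
BANDWIDTH_TBPS   = 20.0     # per-accelerator memory bandwidth (TB/s), as cited above
ACCELERATORS     = 10_000   # hypothetical training-cluster size

def interface_power_w(pj_per_bit: float, tbps: float) -> float:
    """Memory-interface power (W) at fully sustained bandwidth."""
    bits_per_s = tbps * 1e12 * 8
    return pj_per_bit * 1e-12 * bits_per_s

baseline = interface_power_w(PJ_PER_BIT_HBM3E, BANDWIDTH_TBPS)        # HBM3E-class
hbm4     = interface_power_w(PJ_PER_BIT_HBM3E / 1.5, BANDWIDTH_TBPS)  # ~1.5x gain
chbm4e   = interface_power_w(PJ_PER_BIT_HBM3E / 2.0, BANDWIDTH_TBPS)  # ~2x gain

print(f"per accelerator: {baseline:.0f} W -> {hbm4:.0f} W (HBM4) -> {chbm4e:.0f} W (C-HBM4E)")
print(f"cluster-level saving vs HBM3E-class: "
      f"{(baseline - chbm4e) * ACCELERATORS / 1e6:.1f} MW with C-HBM4E")
```

Under these assumptions the memory interface drops from roughly 640 W to 320 W per accelerator, or about 3 MW across a 10,000-accelerator cluster; the exact figures depend entirely on the assumed pJ/bit and utilization.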

Simultaneously, integrating controllers and specialized logic into the base die grants GPU and accelerator designers (AMD, NVIDIA, Google, etc.) more room to dedicate die area to computation rather than memory management, while maintaining or even reducing overall thermal envelopes.

Looking ahead to 2026–2027, systems like the AMD Instinct MI400, with 432 GB of HBM4 and nearly 20 TB/s of bandwidth, or NVIDIA’s Vera Rubin platform will rely heavily on the combination of HBM4/HBM4E, CoWoS-L packaging, and TSMC’s N12/N3P base dies as the foundation for the next generation of AI superclusters.


FAQs about C-HBM4E, HBM4, and TSMC’s role

What exactly is TSMC’s C-HBM4E?
C-HBM4E (Custom HBM4E) is the customized HBM4E variant TSMC is proposing: a memory stack whose base die is fabricated on the N3P node, integrates the memory controller, and uses a fully customized PHY. Compared with standard HBM4, it runs at a lower voltage (0.75 V) and offers approximately double the energy efficiency of the DRAM processes used in HBM3E.

How does standard HBM4 differ from customized HBM4E (C-HBM4E)?
Standard HBM4 will use TSMC’s logic base dies built on N12, with an operating voltage of around 0.8 V and a standardized PHY, making it easier for multiple memory vendors to adopt. C-HBM4E moves to 3 nm (N3P), integrates the memory controller into the stack itself, and allows tailored designs per customer, with the voltage dropping further to 0.75 V and more optimization headroom for demanding AI workloads.

What benefits do Micron and SK hynix gain by outsourcing the base die to TSMC?
By outsourcing the base-die logic to TSMC, Micron and SK hynix can focus on what they do best: developing dense, reliable DRAM. They gain access to much more advanced logic nodes (N12, N3P) without the corresponding in-house investment. In addition, they can offer their customers standard and customized HBM4E options with tighter integration between memory and AI accelerators.

Why is CoWoS-L so crucial for 2026–2027 AI accelerators?
CoWoS-L enables massive packages (up to 5.5× the reticle limit in 2026 and 9.5× in 2027) with up to 12 HBM3E/HBM4 stacks, tens of millions of microbumps, and several chiplets. This combination makes possible memory capacities like 432 GB of HBM4 and nearly 20 TB/s bandwidth per GPU in systems such as AMD’s MI400 or NVIDIA’s Vera Rubin. Without this advanced packaging, deploying such large memory configurations close to compute elements while maintaining efficiency would be nearly impossible.


Sources: TrendForce, HardwareLUXX, Tom’s Hardware, Korea Economic Daily, Korea Financial Times, Wccftech, Geeknetic, Profesional Review, Phoronix.
