NVIDIA Cools Rubin CPX and Reorders Its Inference Strategy

NVIDIA seems to be rethinking the role of Rubin CPX, the inference accelerator announced in September 2025 as a key component of the upcoming Vera Rubin platform. According to industry sources cited by The Elec, the company has not placed orders or started development work for the GDDR7 memory or the substrates this product requires, even though it was initially expected to launch in the second half of 2026.

This information does not constitute an official cancellation. NVIDIA has not publicly announced that Rubin CPX is being discontinued. However, the lack of orders for memory and substrates, along with its disappearance from the roadmaps shown at GTC 2026, fuels the interpretation that the product has been canceled, postponed, or significantly redesigned. In a company that plans its supply chain well in advance, the absence of any movement at this stage is a hard signal to ignore.

Rubin CPX had a clear purpose: to target the long-context inference market with an architecture distinct from traditional training GPUs. Instead of HBM, the high-bandwidth memory found in the most expensive AI accelerators, NVIDIA announced a configuration with 128 GB of GDDR7. The choice was deliberate: inference does not always need the extreme bandwidth of training, but it does demand capacity, controlled cost, and lower power consumption per operation.

From GDDR7 to SRAM: shifting priorities

When NVIDIA introduced Rubin CPX, it described it as a new class of GPU optimized for massive-context inference. The company cited up to 30 petaflops of NVFP4 compute, 128 GB of GDDR7 memory, and up to three times faster attention than systems such as GB300 NVL72. The message was clear: AI agents, multi-million-token contexts, and long-form applications would need a dedicated accelerator to handle the initial context-processing phase.

The plan made sense on paper. In a disaggregated inference architecture, one part of the system processes the input context (the phase commonly called prefill) while another focuses on generating output tokens (decode). Rubin CPX was expected to handle the first phase, which is compute-intensive and, for long contexts, also needs substantial memory capacity, using GDDR7 as a more affordable and scalable alternative to HBM.
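The split between context processing and token generation can be pictured with a minimal Python sketch. It assumes two hypothetical worker pools: one that ingests the whole prompt and produces a context state (a stand-in for what real systems keep in a KV cache), and one that generates tokens from that state. All names here (`ContextState`, `PrefillWorker`, `DecodeWorker`, `DisaggregatedScheduler`) are illustrative and do not come from any NVIDIA software; the toy "next token" rule simply replaces a real model.

```python
from dataclasses import dataclass
from itertools import cycle


@dataclass
class ContextState:
    """Hypothetical stand-in for the state (e.g. a KV cache) handed from prefill to decode."""
    request_id: int
    prompt_tokens: list


@dataclass
class PrefillWorker:
    """Hypothetical context-processing worker (the role Rubin CPX was meant to play)."""
    name: str

    def prefill(self, request_id: int, prompt_tokens: list) -> ContextState:
        # Ingests the entire prompt in one pass; in real systems this step is compute-heavy.
        return ContextState(request_id, list(prompt_tokens))


@dataclass
class DecodeWorker:
    """Hypothetical token-generation worker."""
    name: str

    def decode(self, state: ContextState, max_new_tokens: int) -> list:
        # Emits tokens one at a time; a toy "increment the last token" rule replaces a model.
        out = []
        last = state.prompt_tokens[-1] if state.prompt_tokens else 0
        for _ in range(max_new_tokens):
            last = (last + 1) % 50_000
            out.append(last)
        return out


class DisaggregatedScheduler:
    """Routes each request to one prefill worker, then hands its state to one decode worker."""

    def __init__(self, prefill_pool, decode_pool):
        self._prefill = cycle(prefill_pool)  # simple round-robin over each pool
        self._decode = cycle(decode_pool)
        self.log = []  # (request_id, prefill worker, decode worker) for each request

    def serve(self, request_id: int, prompt_tokens: list, max_new_tokens: int) -> list:
        p = next(self._prefill)
        d = next(self._decode)
        state = p.prefill(request_id, prompt_tokens)  # phase 1: context processing
        tokens = d.decode(state, max_new_tokens)      # phase 2: token generation
        self.log.append((request_id, p.name, d.name))
        return tokens


if __name__ == "__main__":
    sched = DisaggregatedScheduler(
        [PrefillWorker("ctx-0"), PrefillWorker("ctx-1")], [DecodeWorker("gen-0")]
    )
    print(sched.serve(1, [10, 11, 12], 3))  # [13, 14, 15]
```

The point of the design is that each pool can be sized and built from different hardware independently: a deployment with long prompts and short answers adds prefill workers, while one serving many chatty agents adds decode workers.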

The problem is that the market has moved swiftly. At GTC 2026, NVIDIA highlighted Groq 3 LPX, a low-latency solution based on LPUs with integrated SRAM, as a key inference technology. NVIDIA’s official website now features Groq 3 LPX as the inference accelerator for Vera Rubin, designed for low latency, large context windows, and high token throughput in multi-agent systems.

| Product | Main memory | Focus | Public status |
|---|---|---|---|
| Rubin CPX | 128 GB GDDR7 | Long-context inference and initial context processing | Announced in 2025, absent from GTC 2026 |
| Rubin GPU | HBM4 | Main compute for Vera Rubin | Core component of the platform |
| Groq 3 LPU / LPX | SRAM (plus DDR5 at rack level) | Ultra-low-latency inference and high throughput | Promoted by NVIDIA for Vera Rubin |
| GB300 NVL72 | HBM3E (Blackwell generation) | Large-scale training and inference | Previous reference platform |

The technical difference is significant. GDDR7 is cheaper and easier to source than HBM, but it is still external memory with higher latency than on-chip SRAM. The Groq 3 LPU relies on a much smaller memory pool, 500 MB of SRAM per unit, with extremely high bandwidth of 150 TB/s. According to NVIDIA, an LPX rack contains 256 LPUs, for a total of 128 GB of SRAM, plus 12 TB of DDR5 and 40 PB/s of aggregate SRAM bandwidth.
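The rack-level Groq 3 LPX figures can be checked against the per-unit ones with a few lines of arithmetic. This Python sketch takes the reported values as inputs and assumes decimal units (1 GB = 1,000 MB, 1 PB/s = 1,000 TB/s) and that per-LPU bandwidth simply adds up across a rack; both are assumptions for illustration, not NVIDIA's stated methodology.

```python
# Reported figures for Groq 3 LPX and Rubin CPX, as quoted in the article.
# Assumption: decimal units, i.e. 1 GB = 1,000 MB and 1 PB/s = 1,000 TB/s.
LPUS_PER_RACK = 256
SRAM_PER_LPU_MB = 500
SRAM_BW_PER_LPU_TBPS = 150
REPORTED_RACK_BW_PBPS = 40
RUBIN_CPX_GDDR7_GB = 128


def rack_sram_capacity_gb(lpus: int = LPUS_PER_RACK, sram_mb: int = SRAM_PER_LPU_MB) -> float:
    """Total on-chip SRAM in one rack, in GB."""
    return lpus * sram_mb / 1000


def rack_sram_bandwidth_pbps(lpus: int = LPUS_PER_RACK, bw_tbps: int = SRAM_BW_PER_LPU_TBPS) -> float:
    """Aggregate SRAM bandwidth in one rack, in PB/s, assuming per-LPU bandwidth adds linearly."""
    return lpus * bw_tbps / 1000


if __name__ == "__main__":
    cap = rack_sram_capacity_gb()
    bw = rack_sram_bandwidth_pbps()
    print(f"SRAM per rack: {cap:.0f} GB ({cap / RUBIN_CPX_GDDR7_GB:.1f}x one Rubin CPX's GDDR7)")
    print(f"SRAM bandwidth per rack: {bw:.1f} PB/s (reported: ~{REPORTED_RACK_BW_PBPS} PB/s)")
```

Multiplying 256 units by 500 MB gives exactly 128 GB, the same capacity a single Rubin CPX was to carry in GDDR7, but spread across 256 chips. Multiplying 256 by 150 TB/s gives 38.4 PB/s, slightly below the roughly 40 PB/s NVIDIA quotes; the gap presumably reflects rounding in the published per-unit figure, though that is an assumption.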

In agentic inference, where many agents exchange messages and each must produce tokens with low latency, this architecture may be more attractive than one based on GDDR7. Not necessarily for every application, but for real-time inference that demands low latency, high throughput, and efficiency, NVIDIA's strategy appears to favor SRAM-based solutions such as Groq 3 LPX as the next frontier of its "AI factories".

Supply chain signals

The most revealing part of The Elec's report concerns the supply chain. According to its sources, there have been no orders or development requests for GDDR7 memory linked to Rubin CPX, and no substrate-related activity either. One memory supplier said NVIDIA had initially indicated that Rubin CPX would use GDDR7, but that there are now no active conversations about the project.

For memory and substrate manufacturers, Rubin CPX could have opened a new market. Today, GDDR7 is used mainly in high-end graphics cards such as the GeForce RTX 5090 and 5080. A data center inference accelerator would have significantly broadened its use, bringing high-volume orders and a role beyond gaming and workstations.

If Rubin CPX is sidelined, that opportunity is delayed. The memory industry had hoped that GDDR7 would find a broader role in AI, serving as an intermediate option between conventional DRAM and HBM. The product’s disappearance from NVIDIA’s visible roadmap diminishes that expectation, at least in the short term.

| Implication | Who it affects |
|---|---|
| Reduced GDDR7 demand in AI | Memory manufacturers |
| Lower volume for associated substrates | Advanced packaging and PCB suppliers |
| Increased focus on SRAM and LPUs for inference | NVIDIA and the Groq ecosystem |
| Less dependence on a second type of high-volume memory | Vera Rubin platform planning |
| Potential future redesign of CPX | Customers expecting a GDDR7 path |

The industry logic is straightforward: when no memory or substrate orders appear near a planned launch window, the product’s future is uncertain. Internal versions, redesigns, customer-driven changes, or parts outside the public roadmap are possible, but the original plan is now in doubt.

Inference: the new battleground

The possible withdrawal or revision of Rubin CPX reflects a broader shift in the AI market. During the initial boom, the focus was on training gigantic models, dominated by HBM-based GPUs. Now, attention is shifting to inference: running models for millions of users, agents, assistants, search engines, voice, video, and enterprise automation.

Inference has a different economic model. It’s not enough to have the most powerful chip; it must deliver cheap, fast, and low-latency tokens. A brilliant model that responds sluggishly, consumes too much energy, or doesn’t scale economically becomes a business problem.

Therefore, NVIDIA is expanding its architecture. Vera Rubin is no longer just a more powerful GPU. It’s a multi-chip platform with the Vera CPU, Rubin GPU, NVLink, BlueField, ConnectX, Spectrum-X, and now Groq 3 LPX as a specialized inference accelerator. The company aims to maintain control over the entire system, even as the bottleneck shifts from training to real-time token generation.

This move also responds to competitive pressure. Groq (before its technology was integrated into NVIDIA's platform), Cerebras, makers of specialized ASICs for hyperscale AI, and developers of other new architectures have argued that traditional GPUs are not always the best solution for low-latency inference. NVIDIA appears to have taken some of these critiques on board, not by abandoning GPUs but by surrounding them with complementary accelerators.

Implications for clients and competitors

For large cloud clients, the uncertainty around Rubin CPX could have practical effects. Some inference architectures had begun to plan for separate chips optimized for context processing and for generation. If CPX is delayed or disappears, those plans might shift toward LPX, standard Rubin GPUs, or the clients' own in-house accelerators.

For competitors, the message is mixed. On one hand, the potential cancellation of Rubin CPX shows how even NVIDIA adjusts its roadmap when market conditions change or alternative technologies fit better. On the other, integrating Groq 3 LPX into Vera Rubin reinforces NVIDIA’s ability to absorb or incorporate ideas that could threaten its dominance.

For memory suppliers, the news is less encouraging. AI has driven up demand for HBM, but GDDR7 needed a clear entry point into data centers to justify broader adoption. Rubin CPX seemed to provide that opportunity. If it’s sidelined, GDDR7 will continue to grow mainly in high-end graphics, but its role in large-scale AI inference will take longer to materialize.

NVIDIA might be making a pragmatic choice. Instead of supporting two parallel inference paths, one based on GDDR7 and another on SRAM-based LPUs, it appears to prioritize the approach that best supports low latency, multi-agent systems, and performance per watt. If this bet pays off, Rubin CPX will be seen as an aborted transition. If not, NVIDIA may revisit the concept later with different designs, memory types, or generations.

The inference market has just entered a more challenging phase. It’s no longer just about running models; it’s about doing so cost-effectively, with low latency and high efficiency, enabling agents to operate at scale. Rubin CPX was supposed to be a solution, but Groq 3 LPX now seems to occupy that space.

Frequently Asked Questions

Has NVIDIA officially canceled Rubin CPX?
No. NVIDIA has not announced an official cancellation. The uncertainty arises from its absence in the GTC 2026 roadmap and the lack of memory or substrate orders cited by industry sources.

What was Rubin CPX?
It is an inference GPU that NVIDIA announced for long-context workloads, featuring 128 GB of GDDR7 memory and up to 30 petaflops of NVFP4 compute.

Why is the shift toward Groq 3 LPX significant?
Because LPX uses LPUs with ultra-low latency SRAM, offering an architecture more focused on agent inference, high token volumes, and real-time multi-agent systems.

What impact does this have on GDDR7?
If Rubin CPX doesn’t reach the market as planned, GDDR7 misses an important opportunity to expand beyond high-end graphics into AI data center applications.
