NVIDIA used GTC 2026 to introduce BlueField-4 STX, a new modular reference architecture aimed at one of the less visible bottlenecks in modern AI: storage and context access. The proposal targets enterprises, cloud providers, and AI infrastructure operators that need to power systems capable of handling long contexts, multiple tools, and persistent sessions, features that are increasingly common in what is known as agentic AI.
The core idea is straightforward: adding more GPUs is no longer enough on its own. According to NVIDIA, traditional data centers provide capacity but lack the responsiveness AI agents need to access data and contextual memory in real time without hurting inference performance. BlueField-4 STX seeks to close this gap with an accelerated storage stack that keeps data “close” to the GPUs, reducing friction between storage, networking, and compute.
The first rack-scale deployment of this architecture is called NVIDIA CMX, a context memory storage platform that adds a high-performance contextual layer for scalable inference and agentic systems. NVIDIA claims this approach can deliver up to five times more tokens per second than traditional storage, along with up to four times better energy efficiency and twice the data ingestion speed for enterprise AI workloads.
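To make the “context memory” idea concrete, the sketch below shows in plain Python why reusing a stored per-session context (for example, a key-value cache) beats recomputing it on every agent turn. It is a minimal illustration under stated assumptions: the ContextStore class, the prefill cost model, and the session handling are invented for this example and are not NVIDIA’s CMX or DOCA interfaces.

```python
# Hypothetical sketch of why a fast "context memory" tier speeds up agentic inference.
# None of these names come from NVIDIA's CMX or DOCA APIs; they only illustrate reusing
# a stored session context (e.g., a KV cache) instead of re-prefilling it each turn.

import time
from dataclasses import dataclass, field


@dataclass
class ContextStore:
    """Stands in for a storage tier that keeps per-session context near the GPUs."""
    _cache: dict = field(default_factory=dict)

    def load(self, session_id: str):
        # Fast path: context is already materialized for this session.
        return self._cache.get(session_id)

    def save(self, session_id: str, context) -> None:
        # Persist the context so the next turn can skip the prefill.
        self._cache[session_id] = context


def prefill(history_tokens: list[int]) -> dict:
    """Simulate recomputing attention state over the whole history (the slow path)."""
    time.sleep(len(history_tokens) / 100_000)  # cost grows with context length
    return {"kv_cache_for": len(history_tokens)}


def answer_turn(store: ContextStore, session_id: str, history_tokens: list[int]) -> dict:
    context = store.load(session_id)
    if context is None:  # cold session: pay the full prefill cost once
        context = prefill(history_tokens)
        store.save(session_id, context)
    # A decode step would run here, reusing `context` instead of rebuilding it.
    return context


if __name__ == "__main__":
    store = ContextStore()
    history = list(range(50_000))  # a long agent session
    for turn in range(3):
        t0 = time.perf_counter()
        answer_turn(store, "agent-session-1", history)
        # Only the first turn pays the prefill cost; later turns hit the stored context.
        print(f"turn {turn}: {time.perf_counter() - t0:.3f}s")
```

In a CMX-style deployment, the idea is that this fast path would be served by a storage tier sitting close to the GPUs rather than by an in-process dictionary, which is where the claimed gains in tokens per second and energy efficiency would come from.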
Technically, STX relies on a new storage-optimized BlueField-4 processor, combining the Vera CPU with the ConnectX-9 SuperNIC, alongside Spectrum-X Ethernet, DOCA, and NVIDIA AI Enterprise. The architecture is also integrated into the Vera Rubin platform, likewise introduced at GTC 2026 and positioned as the building block for NVIDIA’s future “AI factories.” Within this ecosystem, BlueField-4 STX isn’t positioned as a standalone product but as part of the specialized racks that complete NVIDIA’s vision for training, agentic inference, networking, and storage.
This development tracks a clear market evolution. Whereas the focus in recent years has been mainly on GPUs, HBM, and high-speed networking, NVIDIA now argues that context memory and inference storage are just as strategic, especially as models move from answering simple questions to executing multi-step tasks, maintaining state, and reusing information at scale. That reading follows naturally from the company’s own messaging and from Vera Rubin’s design, which distributes AI infrastructure across racks dedicated to GPU, CPU, networking, and storage.
NVIDIA also highlighted that this is not just a roadmap but an active ecosystem. Among the early adopters of STX for contextual memory storage are names like CoreWeave, Crusoe, IREN, Lambda, Mistral AI, Nebius, Oracle Cloud Infrastructure, and Vultr. Simultaneously, storage providers such as Cloudian, DDN, Dell Technologies, Hitachi Vantara, HPE, IBM, MinIO, NetApp, Nutanix, VAST Data, and WEKA are working on this architecture, while manufacturers like AIC, Supermicro, and QCT are preparing systems based on STX.
The presence of both AI hyperscalers and specialized storage companies in this list is no coincidence. STX aims to function as a reference architecture, not a closed product. In other words, NVIDIA isn’t just selling a rack but providing a design blueprint for partners to build modular platforms suited for the inference and analysis demands of enterprise agents. This approach resembles other parts of NVIDIA’s stack: setting the technical direction, defining interconnections, and letting the industry ecosystem build around it. This interpretation is supported by the description of STX as a reference architecture and NVIDIA’s reliance on its MGX partners and the global supply chain within Vera Rubin.
For the market, this announcement also carries an important message: the AI battle is no longer solely about models but also about the entire data infrastructure surrounding them. If agents are to operate with long histories, documents, tools, operational memory, and continuous inference cycles, storage performance ceases to be a secondary concern. NVIDIA aims to position itself firmly here, expanding its reach from computational acceleration to the full architecture of AI data centers.
The first systems based on BlueField-4 STX will be available through partners in the second half of 2026, so real adoption will only become measurable in the months that follow. The strategic message, however, is already clear: for NVIDIA, the future of agentic AI depends not just on more powerful GPUs but on rethinking the relationship between compute, networking, memory, and storage from top to bottom.

