Kioxia Wants SSDs to Be a Central Piece in AI Factories

The race for artificial intelligence is not fought on GPUs alone. It is also decided at a quieter but increasingly critical layer: storage. As models grow larger, AI agents gain autonomy, and companies build infrastructures designed to continuously produce and consume tokens, data centers need to move, query, and store vast amounts of information without driving up energy costs or leaving accelerators idle.

Kioxia is positioning itself precisely at this junction. The Japanese company, heir to Toshiba’s memory business, has recently introduced a portfolio of technologies combining ultra-high-capacity SSDs, low-latency flash memory, software for vector searches, and new approaches to connect storage with GPUs. The overarching idea is clear: NAND should no longer be seen solely as a cheap data storage layer but as an active component of AI architecture.

This approach arrives at a delicate time for data center design. HBM memory is essential for maximizing GPU performance but is expensive, energy-intensive, and physically limited in capacity. System DRAM offers more space but still cannot scale at the pace demanded by generative AI. In this gap, enterprise SSDs are beginning to gain value as extended memory, near-compute caches, and support for vector databases, RAG, inference, and training.

245 TB SSDs for Data Lakes and Vector Databases

One of Kioxia’s most striking products is the LC9 series, an enterprise NVMe SSD that reaches 245.76 TB. Kioxia presents it as the first NVMe SSD of this capacity, available in 2.5-inch and EDSFF E3.L form factors, with a PCIe 5.0 interface and 32-die stacked BiCS FLASH QLC 3D memory.

Such a unit isn’t meant for consumer PCs or even conventional servers. It’s intended for data lakes, training repositories, content libraries, search systems, and vector databases that need to store massive quantities of data close to compute infrastructure. In generative AI environments, capacity alone isn’t enough: data must be accessible with sufficient performance so that GPUs don’t waste time waiting.

The leap from traditional HDDs isn’t just about speed. It also impacts rack density, per-terabyte power consumption, and operational complexity. Fewer physical units mean fewer trays, fewer failure points, and easier management—though the initial cost per terabyte remains a key consideration in many architectures. Therefore, high-capacity SSDs don’t immediately replace HDDs, but they shift the conversation when latency and performance per watt are critical factors.
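The density argument can be made concrete with simple arithmetic. A minimal sketch, assuming a 10 PB target pool and a 24 TB nearline HDD as the comparison point (both figures are illustrative assumptions; only the 245.76 TB LC9 capacity comes from the article):

```python
import math

# Illustrative sizing: 245.76 TB matches the LC9 spec cited above;
# 24 TB is an assumed nearline HDD, and 10 PB is an arbitrary target.
TARGET_TB = 10_000          # 10 PB of raw capacity
SSD_TB = 245.76             # Kioxia LC9 per-drive capacity
HDD_TB = 24                 # assumed high-capacity nearline HDD

ssd_drives = math.ceil(TARGET_TB / SSD_TB)
hdd_drives = math.ceil(TARGET_TB / HDD_TB)

# Fewer drives means fewer slots, cables, and failure domains,
# though raw cost per terabyte still favors HDDs today.
print(f"{ssd_drives} SSDs vs {hdd_drives} HDDs for {TARGET_TB} TB")  # → 41 SSDs vs 417 HDDs
```

An order-of-magnitude gap in drive count is what drives the rack-density and management argument, even before latency enters the picture.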

Kioxia views the LC9 within a broader trend: transforming flash storage into a component closer to data processing. Instead of constantly moving data from large remote repositories to compute systems, some data can stay in nearby SSDs, reducing network access, improving data availability, and enabling more balanced infrastructures.

XL-FLASH and Ultra-IOPS SSDs to Power GPUs

The second focus is less about capacity and more about latency. Kioxia has showcased solutions based on XL-FLASH, a storage class memory intended to bridge the gap between DRAM and conventional NAND. It doesn’t compete on density, but it offers faster access and far more I/O operations per second.

At GTC 2026, Kioxia revealed developments geared toward what it calls AI-ready storage, including an SSD emulator capable of surpassing 100 million IOPS and solutions designed to operate close to GPUs. The company has also collaborated with NVIDIA on designs to reduce the bottleneck between storage and accelerators, a concern increasingly evident in large-scale inference and in systems querying massive models or databases.

The most interesting technical aspect is the type of access. Many AI workloads don’t just require large sequential transfers; they also perform small, frequent, random reads—parameters, vectors, context fragments, or data retrieved for RAG systems. In such scenarios, measuring only gigabytes per second isn’t sufficient. IOPS, latency, and the ability to serve concurrent requests become equally important.
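The relationship between the two metrics is simple: for small random reads, delivered bandwidth is just IOPS times block size. A minimal sketch with assumed, illustrative figures (not a specific Kioxia spec):

```python
def effective_bandwidth_gbs(iops: int, block_bytes: int) -> float:
    """Throughput delivered by small random reads, in GB/s."""
    return iops * block_bytes / 1e9

# Assumed figures for a PCIe 5.0 enterprise SSD: ~14 GB/s sequential,
# ~1M random 4 KiB IOPS. These are illustrative, not a datasheet.
SEQ_GBS = 14.0
random_gbs = effective_bandwidth_gbs(1_000_000, 4096)

print(f"sequential: {SEQ_GBS} GB/s, 4 KiB random: {random_gbs:.2f} GB/s")
```

Under these assumptions a drive that advertises 14 GB/s sequentially delivers only about 4 GB/s on 4 KiB random reads, which is why IOPS and latency, not headline bandwidth, govern RAG-style access patterns.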

Kioxia isn’t alone in exploring these avenues. The entire sector is investigating ways to bring storage closer to compute: direct access from GPUs, DPUs, CXL, persistent memories, specialized caches, and more advanced packaging. The Japanese company advocates combining low-latency flash, new controllers, and architectures that enable GPUs to access data with less CPU intervention.

AiSAQ: Searching in SSDs Without Loading All Data into DRAM

Software advancements are gaining relevance as well. Kioxia has launched AiSAQ, an open-source technology for approximate nearest neighbor searches optimized for SSDs. Its goal is to reduce the load on DRAM in RAG systems, where vector databases often require large indices loaded into memory for quick responses.

AiSAQ aims to let parts of these indices be searched directly on the SSD, avoiding the need to load everything into DRAM. This can enable scaling to larger vector databases at lower memory cost, though final performance depends on the SSD used, index size, required accuracy, and application design.
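The core idea, fetching only the vectors a query actually touches from storage rather than holding them all in RAM, can be sketched in a few lines. This is a toy illustration of that pattern, not AiSAQ’s actual algorithm or API; the file layout, dimensionality, and candidate list are all assumptions:

```python
import mmap, os, struct, tempfile

DIM = 4  # toy dimensionality; real indices use hundreds of dims

def write_vectors(path, vectors):
    # Store vectors as contiguous float32 records so any vector can be
    # fetched by offset: id * DIM * 4 bytes.
    with open(path, "wb") as f:
        for v in vectors:
            f.write(struct.pack(f"{DIM}f", *v))

def fetch(mm, vec_id):
    # Read one vector from storage by its byte offset.
    return struct.unpack_from(f"{DIM}f", mm, vec_id * DIM * 4)

def nearest(mm, query, candidate_ids):
    # Touch only the candidate vectors on disk, never the whole file.
    best_id, best_d = None, float("inf")
    for vid in candidate_ids:
        v = fetch(mm, vid)
        d = sum((a - b) ** 2 for a, b in zip(query, v))
        if d < best_d:
            best_id, best_d = vid, d
    return best_id

vectors = [(0.0, 0.0, 0.0, 0.0),
           (1.0, 0.0, 0.0, 0.0),
           (0.9, 0.1, 0.0, 0.0)]
path = os.path.join(tempfile.mkdtemp(), "vecs.bin")
write_vectors(path, vectors)
with open(path, "rb") as f, \
     mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
    print(nearest(mm, (1.0, 0.05, 0.0, 0.0), [0, 1, 2]))  # → 1
```

Real SSD-resident ANN systems pair this per-vector fetch with a graph or clustering index that narrows the candidate list first; the sketch only shows why DRAM no longer has to hold every vector.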

This approach addresses a common challenge for many companies: deploying RAG in production isn’t just about connecting a language model to corporate documents. It involves storing, updating, versioning, and searching billions or trillions of vectors. Relying solely on filling servers with DRAM can quickly become prohibitively expensive. Using SSDs as an active part of searches offers an intermediary solution balancing capacity and speed.
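The “prohibitively expensive” claim is easy to verify with back-of-the-envelope math. A minimal sketch, assuming a billion 768-dimensional float32 embeddings (an assumed workload, not a figure from the article):

```python
def raw_vector_bytes(n_vectors: int, dims: int, bytes_per_dim: int = 4) -> int:
    """Memory for the raw embeddings alone; graph/index overhead is extra."""
    return n_vectors * dims * bytes_per_dim

# Assumed workload: one billion 768-dim float32 embeddings.
tb = raw_vector_bytes(1_000_000_000, 768) / 1e12
print(f"{tb:.2f} TB of DRAM just for the raw vectors")  # ≈ 3.07 TB
```

Several terabytes of DRAM per billion vectors, before index overhead, is what makes an SSD-backed middle tier attractive.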

Kioxia is also advancing the memory foundation supporting these products. Its BiCS FLASH strategy has two paths: a ninth generation focused on manufacturing efficiency and performance via CBA (CMOS directly Bonded to Array) technology, and a tenth that increases the layer count to 332 for better density, performance, and energy efficiency. Alongside SanDisk, the company has detailed a 3D flash technology with a 4.8 Gb/s interface, improved bit density, and reduced power consumption during I/O operations.

The landscape of industrial storage is broadening. Data centers were historically designed around CPUs, memory, and storage with a clear hierarchy. AI is blurring these lines. HBM remains critical alongside GPUs but cannot alone absorb all data growth. DRAM continues to be vital but is costly and limited. NAND, thanks to its 3D structure and increasing density, has room for a more sophisticated role.

Kioxia aims to leverage this space with a portfolio addressing three key needs: massive capacity with LC9, extreme performance with XL-FLASH, and reduced dependence on DRAM via AiSAQ. While not all these pieces are equally mature or suitable for every workload, collectively they outline a clear direction: storage is transitioning from a passive component to an active part of AI system design.

For data center operators, the question will no longer be just how many petabytes they can install, but where that data is placed and how it connects to GPUs: which data live in HBM, which in DRAM, which in low-latency SSDs, and which remain in large-scale storage. In AI “factories,” this architecture could make the difference between keeping accelerators saturated and leaving expensive hardware idle, waiting for data.

Frequently Asked Questions

What does Kioxia propose for AI storage?
Kioxia combines ultra-high-capacity SSDs, low-latency XL-FLASH memory, vector search software, and technologies aimed at bringing storage closer to GPUs.

What capacity is the Kioxia LC9 SSD?
The LC9 series reaches 245.76 TB, available in 2.5-inch and EDSFF E3.L form factors, with a PCIe 5.0 interface and BiCS FLASH QLC 3D memory.

What is AiSAQ and what does it do?
AiSAQ is Kioxia’s open-source software for vector searches in SSDs, designed to reduce the need to load the entire index into DRAM in RAG systems.

Why are SSDs important for AI?
Because AI workloads require storing, retrieving, and processing large amounts of data with low latency. Without storage systems that keep up, GPUs can be underutilized.

via: en.eeworld
