Marvell Wants to Squeeze CXL Memory with Silicon Compression

Memory has become one of the most expensive and strained components of artificial intelligence infrastructure. It’s not just about missing GPUs. Gigabytes near the processor are also scarce, including server DDR5 modules, hot data memory for databases, capacity for large model inference, and systems capable of supporting vector searches without skyrocketing node costs. In this context, Marvell presents a simple-to-understand but challenging-to-execute idea: compress memory directly within the CXL controller.

The proposal relies on Structera X and Structera A, its family of CXL devices for memory expansion and near-data acceleration. The company argues that the current bottleneck isn’t solved merely by adding more DRAM, because DRAM is expensive, limited, and contested by AI data center demand. The differentiator is a dedicated hardware block, called the Compression-Decompression Block (CDB), which compresses data when it is written to memory and decompresses it when it is read, without processor intervention or visible changes to the operating system.

The economic promise is straightforward: if a workload achieves a 2:1 compression ratio, each physical gigabyte can behave as two useful gigabytes. If the ratio is higher, potential savings increase. At a time when server memory prices influence machine purchasing decisions, this difference can be as significant as the choice of CPU or GPU.
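The arithmetic behind that promise can be sketched in a few lines. The function name and sample values below are illustrative assumptions, not Marvell figures, and the model is idealized: it assumes every page compresses at the same ratio and ignores any metadata overhead.

```python
def effective_capacity_gb(physical_gb: float, ratio: float) -> float:
    """Usable capacity if all data compressed uniformly at `ratio` (an idealization)."""
    if ratio < 1.0:
        raise ValueError("a ratio below 1.0 would mean the data expanded")
    return physical_gb * ratio


# Hypothetical 512 GB device at a few example ratios.
for ratio in (1.0, 1.5, 2.0, 3.0):
    usable = effective_capacity_gb(512, ratio)
    print(f"{ratio:.1f}:1 -> {usable:.0f} GB usable from 512 GB physical")
```

In practice the ratio varies page by page, so a real estimate would weight each data type by its share of the working set.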

CXL not only extends memory but can also alter its actual cost

Compute Express Link, better known as CXL, enables coherent, memory-semantic connections between processors, memory devices, and accelerators over the PCIe physical layer. Practically, it opens the door to expanding beyond traditional server DDR channels, creating memory tiers, and, in more advanced versions, moving toward shared or disaggregated memory models.

Until now, much of the CXL conversation has focused on adding capacity to servers that can no longer grow within traditional CPU memory channels. Marvell aims to move the debate further: increasing physical capacity isn’t enough; the effective usable capacity of memory must also increase, especially when much of the data in memory is compressible.

| Element | What it provides |
| --- | --- |
| CXL | Expands memory outside traditional DDR channels |
| Structera X | Controllers to increase server memory capacity |
| Structera A | Near-data accelerators for intensive workloads |
| CDB | Silicon compression and decompression |
| LZ4 | Lossless, fast, low-latency algorithm |
| One-to-many mapping | Provides the host with more virtual memory than physical DRAM |

The difference compared to software compression lies in where the cost is paid. When a database, analytical engine, or application compresses data using the CPU, it gains capacity but consumes compute cycles, adds complexity, and requires software support. Marvell shifts this work into dedicated silicon within the CXL device, aiming to keep it aligned with memory bandwidth and out of the CPU’s way.

The CDB block, the key piece of the announcement

The Compression-Decompression Block isn’t a library or a firmware option added later. Marvell presents it as a dedicated hardware block embedded in its Structera CXL devices. It operates transparently to the host: when the host writes data, the controller compresses it before storing it in DRAM; when the host reads, the controller decompresses it and delivers the data as if it came from ordinary memory.
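The write and read path described above can be modeled in software to make the idea concrete. This is a toy sketch, not Marvell’s design: the class name is hypothetical, and Python’s zlib stands in for the proprietary LZ4 variant, so sizes and speeds say nothing about the real hardware.

```python
import zlib

PAGE_SIZE = 4096  # 4 KB, one of the page sizes Marvell reports


class CompressedPageStore:
    """Toy model of compress-on-write, decompress-on-read.

    zlib is a stand-in here; the CDB uses a proprietary LZ4 variant.
    """

    def __init__(self) -> None:
        self._pages = {}  # page number -> compressed bytes

    def write(self, page_no: int, data: bytes) -> None:
        if len(data) != PAGE_SIZE:
            raise ValueError("expected exactly one page of data")
        self._pages[page_no] = zlib.compress(data, 1)  # level 1 favors speed

    def read(self, page_no: int) -> bytes:
        # The caller gets the original bytes back and never sees the compressed form.
        return zlib.decompress(self._pages[page_no])

    def stored_bytes(self) -> int:
        return sum(len(p) for p in self._pages.values())


store = CompressedPageStore()
page = (b"key=value;" * 410)[:PAGE_SIZE]  # repetitive, hence compressible
store.write(0, page)
assert store.read(0) == page  # lossless round trip
print(f"{PAGE_SIZE} logical bytes held in {store.stored_bytes()} stored bytes")
```

The caller only ever handles full, uncompressed pages, which is the sense in which the article calls the mechanism transparent.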

The company uses a proprietary variant of LZ4, a well-known lossless algorithm praised for its speed. LZ4 is used in databases, storage systems, analytical engines, and software where decompression latency matters. The choice doesn’t aim for maximum compression at any cost but balances ratio, latency, and bandwidth.

| Reported metric | Value |
| --- | --- |
| Algorithm | Derived from LZ4 |
| Page sizes | 4 KB and 1 KB |
| Maximum ratio | 64:1 in all-zero pages |
| Effort levels | 0 to 3, configurable |
| Compression type | Lossless |
| Host visibility | Transparent to CPU and OS |

The maximum ratio of 64:1 sounds impressive, but it should be interpreted carefully: it applies mainly to extreme cases like all-zero pages. In real workloads, the effective ratio depends on the type of data. Structured text, source code, web content, compiled binaries, databases, and natural language won’t all compress equally.

How much useful memory can be gained

Marvell reports ratios for various data types, comparing them to LZ4 running on the host. Their measurements show Structera CDB either matching or closely approaching the compression quality of software LZ4, without consuming CPU cycles.

| Data type | Structera CDB ratio | LZ4 on host ratio |
| --- | --- | --- |
| XML | 2.75x | 2.64x |
| Database (nci) | 3.64x | 3.65x |
| Source code (samba) | 2.00x | 2.07x |
| Web content (webster) | 1.67x | 1.65x |
| Natural language (dickens) | 1.32x | 1.32x |
| Compiled binary (mozilla) | 1.68x | 1.73x |

The most interesting result isn’t the highest ratio but the consistency. For XML and database data, the savings can be very high; for natural language, the gains are more modest; for binaries, the result depends on the content. For real infrastructure, this means measuring before purchasing: not every workload will turn 12 TB of physical memory into 24 TB, let alone 48 TB, of useful capacity.
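One way to take such a measurement is to sample pages from the real workload and compress them one page at a time. The sketch below is a rough, hypothetical estimator. It uses Python’s zlib as a stand-in because Marvell’s LZ4 variant is proprietary, so absolute ratios will differ from the CDB’s. The function name, the synthetic samples, and the rule that a page which does not shrink is stored raw are all assumptions.

```python
import os
import zlib


def sample_ratio(data: bytes, page_size: int = 4096) -> float:
    """Compress each page independently; return logical bytes / stored bytes."""
    logical = stored = 0
    for off in range(0, len(data), page_size):
        page = data[off:off + page_size]
        # Assumption: a page that does not shrink is kept uncompressed.
        stored += min(len(zlib.compress(page, 1)), len(page))
        logical += len(page)
    return logical / stored if stored else 1.0


# Synthetic stand-ins; a real test would use dumps of the actual workload.
samples = {
    "xml-like text": b"<row><id>42</id><name>example</name></row>\n" * 2000,
    "all zeros": bytes(64 * 4096),
    "random bytes": os.urandom(64 * 4096),
}
for name, blob in samples.items():
    print(f"{name:>14}: {sample_ratio(blob):.2f}x")
```

Compressing per page rather than over the whole buffer matters: a page-granular compressor cannot exploit repetition across pages, so whole-file ratios tend to overstate the gain.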

Even ratios of 1.5x or 2x can change budgets. In large memory pools, improving usable capacity without adding modules reduces costs, power consumption, space, and supply chain pressure. For in-memory databases, recommenders, LLM inference, caches, vector search engines, and analytics, memory isn’t just an accessory; it’s a core cost per query or per user.

Why it matters now: DDR5 has become a critical factor

The announcement arrives when server memory is no longer a cheap commodity. Marvell cites spot prices of $27 to $37 per GB for DDR5 RDIMMs, which would put a 12 TB pool at roughly $330,000 to $455,000 in DRAM alone, approaching half a million dollars at the top of the range. It also reports price increases of 300% to 400% since mid-2025.
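The pool cost follows directly from the quoted per-GB prices, and the same arithmetic shows what compression does to the cost of each effective gigabyte. In this sketch the function names and the choice of binary terabytes (1 TB = 1,024 GB) are assumptions. The 2:1 ratio is only an example, and the cost of the CXL device itself is left out.

```python
GB_PER_TB = 1024  # assumption: binary terabytes; use 1000 for decimal


def pool_cost_usd(capacity_tb: float, usd_per_gb: float) -> float:
    """DRAM cost of a pool at a flat per-GB price."""
    return capacity_tb * GB_PER_TB * usd_per_gb


def cost_per_effective_gb(usd_per_gb: float, ratio: float) -> float:
    """Price of one usable GB when data compresses at `ratio`."""
    return usd_per_gb / ratio


for price in (27, 37):  # spot prices cited by Marvell, in USD per GB
    total = pool_cost_usd(12, price)
    effective = cost_per_effective_gb(price, 2.0)
    print(f"${price}/GB: 12 TB pool = ${total:,.0f}; "
          f"at 2:1, ${effective:.2f} per effective GB")
```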

While these figures are market references provided by the company, they align with broader trends. Memory manufacturers are prioritizing HBM, server DRAM, contracts with hyperscalers, and AI-related products. Generic server memory now faces much stronger demand than it did a few years ago.

| Issue | Impact on infrastructure |
| --- | --- |
| Expensive DDR5 | Rising server and CXL pool costs |
| AI demand | Absorbing manufacturing capacity |
| Prioritization of HBM | Shifting investments toward higher-margin products |
| More inference models | Growing memory needs per node |
| In-memory databases | Increasing pressure on effective capacity |
| Vector search | Requires large volumes near compute |

Hardware compression doesn’t create new DRAM but makes existing memory work harder. That distinction is crucial. It doesn’t solve all supply issues but can delay purchases, limit maximum configurations, and enable designs that would be too costly with uncompressed memory.

Not all workloads are equal

The main risk with this technology is promising universal gains. There are none: compression depends on the data. Workloads with highly repetitive pages, regular structures, or many zeros can benefit greatly. Data that is already compressed, encrypted, or otherwise high in entropy will see little to no capacity gain. Additionally, CXL adds latency compared to local DDR memory, and compression is one more factor to account for.
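A quick way to spot data in the high-entropy category is to estimate the Shannon entropy of its byte histogram. The sketch below is a rough screen under stated assumptions, not a predictor of CDB ratios: the function name is hypothetical, and byte entropy ignores the repeated sequences that LZ-style compressors exploit, so a low value suggests compressibility while a high value does not rule it out.

```python
import math
import os
from collections import Counter


def byte_entropy(data: bytes) -> float:
    """Shannon entropy of the byte histogram, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    n = len(data)
    # H = sum over byte values of p * log2(1/p), with p = count / n
    return sum((c / n) * math.log2(n / c) for c in Counter(data).values())


print(f"zero page:   {byte_entropy(bytes(4096)):.2f} bits/byte")
print(f"random page: {byte_entropy(os.urandom(4096)):.2f} bits/byte")
```

Pages close to 8 bits per byte, which is typical of encrypted or already compressed content, are the ones least likely to gain anything from inline compression.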

This doesn’t invalidate the idea; it only means it must be applied to the right workloads. Compressed CXL memory can be highly attractive for cold or warm data, large data sets with tolerant access patterns, secondary caches, databases with partial capacity requirements, or workloads where cost per GB outweighs a few nanoseconds of latency.

| Suitable candidates | Challenging workloads |
| --- | --- |
| Compressible database data | Already compressed data |
| Large caches | Ultra-light workloads |
| Vector search with expensive memory | Encrypted data in memory |
| Recommenders with large tables | Very sensitive random accesses |
| In-memory analytics | Peaks where CXL becomes a bottleneck |
| Tiered memory | Applications intolerant to variation |

Real adoption will depend on independent testing, OS integration, observability tools, memory allocation policies, and CXL maturity on each CPU platform. Hardware can make compression transparent, but architects will still need to know which parts of their memory reside in local DDR and which are in a compressed CXL layer.
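On Linux, CXL memory that has been onlined as system RAM typically shows up as a NUMA node with memory but no CPUs, which gives a first, rough answer to where data can land. The sketch below lists such nodes by reading sysfs. It is a heuristic under stated assumptions: the function name is hypothetical, a CPU-less node is not necessarily CXL, and nothing here detects whether compression is active.

```python
from pathlib import Path


def cpuless_numa_nodes(root: str = "/sys/devices/system/node") -> list:
    """Return names of NUMA nodes whose cpulist is empty (memory-only nodes)."""
    nodes = []
    for node_dir in sorted(Path(root).glob("node[0-9]*")):
        cpulist = node_dir / "cpulist"
        # An empty cpulist means no CPUs are attached to this node.
        if cpulist.is_file() and cpulist.read_text().strip() == "":
            nodes.append(node_dir.name)
    return nodes


# Prints an empty list on machines without sysfs or without memory-only nodes.
print("memory-only NUMA nodes:", cpuless_numa_nodes())
```

Tools such as `numactl --hardware` report the same topology, along with per-node sizes and distances.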

A glimpse into the future of memory in AI

The broader takeaway is that the industry is starting to treat memory as an active layer, not just a passive byte reservoir. For years, DRAM was added to servers on the assumption that software would use it effectively. With AI, vector databases, and large inference engines, that assumption no longer holds so easily. Now, decisions must be made about which data deserves HBM, which should stay in local DDR, which can move to CXL, and which can be compressed without significant penalty.

Marvell aims to be at this frontier. Structera does not compete with CPUs or GPUs but with the cost of filling each machine with DIMMs. In big data centers, saving modules can be as valuable as boosting raw performance. Less physical DRAM can also mean lower power consumption and reduced supply chain stress.

This approach won’t be exclusive to Marvell forever. As memory prices remain high, other CXL controllers, accelerators, and server architectures will seek similar mechanisms. Compression, deduplication, automatic tiering, and shared memory will become common tools for maximizing capacity.

The useful gigabyte as the new metric

For a long time, memory was bought by physical capacity: 512 GB, 1 TB, 3 TB, 12 TB. In the new era, that figure won’t be enough. What will matter is useful capacity per dollar, per watt, per slot, and per workload. This is where silicon-based compression can change the conversation.

Marvell isn’t claiming that all data will compress fourfold or that compressed CXL will replace local DRAM. Its message is more concrete: in a market where each gigabyte costs significantly more, it makes little sense to store compressible data as if memory were still cheap.

If CXL becomes the natural extension of server memory, inline compression could shift from a differentiator to a necessity. In AI, where scale is measured both by compute and available memory, every gigabyte counts. The new twist is that Marvell wants each gigabyte to count more than once.

Frequently Asked Questions

What is Marvell Structera CXL?
A family of CXL devices designed to expand memory capacity and bring acceleration closer to data in data center servers.

What does the CDB compression do?
Compresses data in hardware when written to DRAM and decompresses on read, transparently for the CPU, OS, and applications.

Does this mean always doubling memory?
Not necessarily. The ratio depends on data. Marvell reports ratios from about 1.32x to 3.64x across different data types, with much higher maxima only in extreme cases like all-zero pages.

Which workloads benefit most?
In-memory databases, recommenders, LLM inference, vector searches, large caches, and workloads prioritizing capacity over minimal latency.

via: marvell
