For years, SSD spec sheets have suggested that drives with the same capacity and technology are practically equivalent. However, a new study from the Technical University of Munich and the University of Copenhagen shows that this perception is misleading: two SSDs that look almost identical on paper can behave very differently under real workloads.
Published in Proceedings of the VLDB Endowment (PVLDB) under the title “SSD-iq: Uncovering the Hidden Side of SSD Performance”, the paper introduces a new benchmark, SSD-iq, designed to reveal the “dark data” that traditional tests do not show.
The big question: Does it really matter which SSD you choose?
The research starts with a seemingly simple question: does it matter which SSD model is used in a database system? So far, architects and administrators have relied on standard metrics: sequential read and write throughput, IOPS (I/O operations per second, usually quoted for random access), and sometimes nominal latency. According to the authors, these metrics fail to capture key internal phenomena, such as write amplification (expressed as the write amplification factor, WAF) or latency under load.
Experiments on nine SSDs from leading manufacturers (Samsung, SK Hynix, Intel, Micron, Western Digital, Kioxia, plus AWS cloud units) revealed WAF differences of up to 2.5 times and latencies that climbed from microseconds to milliseconds under load, even though all models appeared nearly identical on paper.
The Achilles’ heel: write amplification (WAF)
One of the most significant findings was the impact of WAF. Unlike traditional HDDs, which can overwrite data in place, SSDs can only erase entire blocks at a time and must first relocate any pages in the block that are still valid, which causes additional writes that are invisible to the host.
- Alibaba previously reported WAFs reaching 8 in cloud workloads, and NetApp reported over 10 in enterprise settings.
- In the experiments by Haas and his co-authors, measured WAF ranged from 1.9 to over 6 depending on the model.
- This means a single SSD can internally perform six times as many writes as the operating system issues, which shortens its lifespan and raises replacement costs.
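The arithmetic behind these figures can be sketched in a few lines. The snippet below is an illustration, not code from the paper: the function names, the counter values, and the 7,000 TB flash write budget are all hypothetical, chosen only to show how WAF scales the wear caused by a given host workload.

```python
def write_amplification(host_bytes: int, nand_bytes: int) -> float:
    """WAF = bytes physically written to flash / bytes written by the host."""
    if host_bytes <= 0:
        raise ValueError("host_bytes must be positive")
    return nand_bytes / host_bytes


def lifetime_years(flash_budget_tb: float, host_tb_per_day: float, waf: float) -> float:
    """Years until an assumed flash write budget is used up at a given WAF."""
    return flash_budget_tb / (host_tb_per_day * waf * 365)


# Hypothetical counter readings: 10 TB written by the host, 60 TB hitting flash.
waf = write_amplification(host_bytes=10 * 10**12, nand_bytes=60 * 10**12)
print(f"WAF = {waf:.1f}")

# Same assumed drive and workload at both ends of the WAF range from the study.
for w in (1.9, 6.0):
    years = lifetime_years(flash_budget_tb=7000, host_tb_per_day=1.0, waf=w)
    print(f"WAF {w}: about {years:.1f} years")
```

Under these assumed numbers, moving from the lowest to the highest measured WAF cuts the estimated lifetime to roughly a third.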
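Unexpectedly, skewed access patterns (Zipf or Two-Zone), which are common in databases, did not improve WAF. In principle skew should help, because a drive that separates frequently rewritten data from rarely rewritten data has fewer valid pages to relocate. Instead, WAF got worse on many models, which suggests that the garbage collection algorithms in use are quite basic and do not adapt to non-uniform access. Only some Intel and Western Digital models showed signs of smarter algorithms.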
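For readers unfamiliar with the two skewed patterns, here is a minimal sketch of how such workloads might be generated. It is not SSD-iq's implementation: the function names and the parameters (a Zipf exponent of 1.0, and 80% of accesses going to 20% of the pages) are assumptions made for illustration.

```python
import random
from collections import Counter


def zipf_pages(num_pages, count, theta=1.0, seed=0):
    """Draw page numbers where the page of rank r has weight 1 / r**theta."""
    rng = random.Random(seed)
    weights = [1.0 / (rank ** theta) for rank in range(1, num_pages + 1)]
    return rng.choices(range(num_pages), weights=weights, k=count)


def two_zone_pages(num_pages, count, hot_fraction=0.2, hot_probability=0.8, seed=0):
    """Send hot_probability of the accesses to the first hot_fraction of pages."""
    rng = random.Random(seed)
    hot_pages = int(num_pages * hot_fraction)
    if not 0 < hot_pages < num_pages:
        raise ValueError("hot zone must be non-empty and smaller than the device")
    pages = []
    for _ in range(count):
        if rng.random() < hot_probability:
            pages.append(rng.randrange(hot_pages))              # hot zone
        else:
            pages.append(rng.randrange(hot_pages, num_pages))   # cold zone
    return pages


zipf = zipf_pages(num_pages=1000, count=10_000)
two_zone = two_zone_pages(num_pages=1000, count=10_000)
print("most frequent Zipf pages:", Counter(zipf).most_common(3))
print("share of two-zone accesses in hot zone:", sum(p < 200 for p in two_zone) / len(two_zone))
```

The hot pages are contiguous here only to keep the sketch short; a benchmark would normally scatter them across the address space.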
Latency under load: the hidden side of milliseconds
Another critical point is latency. Specifications say reads take about 75-80 μs, and writes about 15 μs. Yet under sustained load:
- Some models maintained stable latencies around 20 μs.
- Others jumped to over 10 ms at high percentiles (p99.9).
For OLTP applications, which must flush the write-ahead log (WAL) before acknowledging a transaction, this jump is disastrous: an operation that should be nearly immediate can become the system bottleneck.
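High percentiles matter because averages hide rare stalls. The sketch below uses synthetic latencies, not measurements from the paper, and a nearest-rank percentile helper written for this example; it shows a sample whose mean and median look healthy while p99.9 sits in the millisecond range.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the value at rank ceil(pct/100 * n) in sorted order."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


# Synthetic write latencies in microseconds: 9,980 fast writes and 20 stalls.
latencies_us = [20] * 9_980 + [10_000] * 20

mean_us = sum(latencies_us) / len(latencies_us)
print(f"mean  = {mean_us:.2f} us")
print(f"p50   = {percentile(latencies_us, 50)} us")
print(f"p99.9 = {percentile(latencies_us, 99.9)} us")
```

Only 0.2% of the writes stall in this made-up sample, yet every transaction that waits on one of them sees a delay of 10 ms.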
SSD-iq benchmark: measuring what matters
Faced with this opacity, the researchers developed SSD-iq, an open-source set of tests available on GitHub (https://github.com/gabriel-haas/ssdiq). Unlike consumer-focused benchmarks, SSD-iq adds metrics that matter in database and data center environments:
- Real WAF: measured via the OCP NVMe interface or estimated from performance.
- Latency under load: including high percentiles (p99.9).
- Performance with skewed access patterns: Zipf and Two-Zone models that mimic real workloads.
- Over-provisioning (OP): hidden space reserved by the manufacturer for garbage collection management.
These parameters provide a more accurate view of a real SSD’s behavior and facilitate meaningful comparisons between models in mission-critical setups.
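To see why over-provisioning and WAF belong in the same list, a toy simulation helps. The sketch below is not part of SSD-iq and does not model any real drive: it assumes a page-mapped device with a greedy garbage collector and uniform random writes, and every name and size in it is made up for illustration.

```python
import random


def simulate_waf(op, num_blocks=64, pages_per_block=32, host_writes=30_000, seed=1):
    """WAF of a toy page-mapped SSD with greedy GC under uniform random writes."""
    rng = random.Random(seed)
    logical_pages = int(num_blocks * pages_per_block / (1 + op))
    if logical_pages >= (num_blocks - 2) * pages_per_block:
        raise ValueError("over-provisioning too small for this toy model")
    location = {}                               # logical page -> block holding it
    valid = [set() for _ in range(num_blocks)]  # valid logical pages per block
    free = list(range(1, num_blocks))           # erased blocks ready for use
    closed = set()                              # fully written blocks
    state = {"open": 0, "used": 0, "nand": 0}

    def program(lpn):
        # Append one page to the open block; open a fresh block when it is full.
        if state["used"] == pages_per_block:
            closed.add(state["open"])
            state["open"], state["used"] = free.pop(), 0
        old = location.get(lpn)
        if old is not None:
            valid[old].discard(lpn)             # the previous copy becomes stale
        valid[state["open"]].add(lpn)
        location[lpn] = state["open"]
        state["used"] += 1
        state["nand"] += 1

    def collect():
        # Greedy victim selection: the closed block with the fewest valid pages.
        victim = min(closed, key=lambda b: (len(valid[b]), b))
        closed.remove(victim)
        for lpn in sorted(valid[victim]):
            program(lpn)                        # relocation = extra flash writes
        free.append(victim)                     # erase the block and reuse it

    for _ in range(host_writes):
        while len(free) < 2:                    # keep a spare block for relocation
            collect()
        program(rng.randrange(logical_pages))
    return state["nand"] / host_writes


for op in (0.07, 0.15, 0.28):
    print(f"OP {op:.0%}: WAF {simulate_waf(op):.2f}")
```

In this model, more spare space means the collector finds emptier victim blocks, so it relocates fewer pages and WAF falls. Real firmware is far more elaborate, which is why measuring the drive is still necessary.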
Practical cases: Samsung vs Micron
To illustrate why this matters, the team compared two models that are nearly identical in specs and price: the Samsung PM9A3 and the Micron 7450 PRO, both with 960 GB.
- Samsung showed more stable latency under load, reaching 20,000 TPS in TPC-C.
- Micron had a lower WAF (and therefore a longer expected lifespan), but worse OLTP performance, dropping to 15,000 TPS in steady state.
The choice depends on whether immediate performance or hardware longevity is prioritized.
Implications: sustainability and future
The authors also highlight the sustainability impact. Replacing SSDs prematurely generates electronic waste and energy costs. Reducing WAF with smarter algorithms extends SSD lifespan and lowers the environmental footprint of data centers.
Future technologies like ZNS (Zoned Namespaces) and FDP (Flexible Data Placement) will let the host control where data is placed, opening pathways to mitigate many of these issues. SSD-iq could become the reference benchmark for evaluating future drives.
Conclusion
The study by Haas and his co-authors debunks the myth that SSDs are interchangeable. Technical specs alone are insufficient, and traditional benchmarks obscure critical differences. With SSD-iq, for the first time, we can measure what really matters: latency under load, write amplification, and sustainability.
If industry players and reviewers adopt this benchmark, it could revolutionize how we evaluate and select storage solutions in the age of real-time transactions and AI.
FAQs
What is write amplification (WAF) in an SSD?
WAF measures how many internal writes an SSD performs per logical write. For example, a WAF of 4 means that for each GB written by the system, the SSD performs 4 GB of internal writes, which increases wear and reduces lifespan.
Why don’t traditional benchmarks show these differences?
Because they focus on simple tests (sequential or random reads and writes) that do not reflect real-world workloads. They do not measure internal phenomena like WAF or extreme latency under load.
What advantages does SSD-iq offer over other tests?
It introduces realistic metrics such as WAF, OP, high-percentile latency, and skewed access patterns, providing a better understanding of SSD performance in enterprise and cloud environments.
Where can I download SSD-iq?
SSD-iq is open source on GitHub: https://github.com/gabriel-haas/ssdiq, including scripts and data to reproduce tests.
More detailed information is available in the SSD-iq report: https://www.vldb.org/pvldb/vol18/p4295-haas.pdf.