For years, the idea that “SQL doesn’t scale” has served as a mental shortcut: as products grow, it’s assumed that you’ll need to “break out” of a relational database and adopt alternatives designed to distribute data and traffic from day one. The uncomfortable nuance is that this narrative falls apart once you look at how some of the world’s most demanding platforms operate.
The reality is less epic but much more useful: scaling with SQL is usually a matter of architecture, operations, and engineering discipline, not of “intrinsic limitations” in the relational model. In other words: if SQL holds up for platforms that face extreme traffic spikes, it can very likely support the majority of products that will never hit such conditions.
Shopify, e-commerce, and the MySQL backend
In the e-commerce world, there’s a time of the year that separates theory from practice: high-traffic campaigns, incidents, controlled degradations, and rapid decision-making. Shopify has publicly shared some of its technical approach around MySQL and infrastructure components that help absorb peaks, such as ProxySQL, which manages connections and distributes traffic between primary and read replica nodes. They even talk about operating thousands of ProxySQL instances and mechanisms to apply routing or mitigation rules safely during system pressure.
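To make that routing concrete: ProxySQL is configured through a SQL admin interface, where backend servers are grouped into hostgroups and query rules decide which hostgroup serves each statement. A minimal sketch follows; the hostnames and hostgroup IDs are illustrative, not Shopify’s actual setup:

```sql
-- ProxySQL admin interface (hostnames and hostgroup IDs are hypothetical).
-- Hostgroup 10 = primary (writes), hostgroup 20 = read replicas.
INSERT INTO mysql_servers (hostgroup_id, hostname, port) VALUES
  (10, 'primary.db.internal', 3306),
  (20, 'replica-1.db.internal', 3306),
  (20, 'replica-2.db.internal', 3306);

-- Keep locking reads on the primary; send plain SELECTs to replicas.
-- Anything unmatched falls through to the default (writer) hostgroup.
INSERT INTO mysql_query_rules (rule_id, active, match_digest, destination_hostgroup, apply) VALUES
  (1, 1, '^SELECT .* FOR UPDATE', 10, 1),
  (2, 1, '^SELECT', 20, 1);

LOAD MYSQL SERVERS TO RUNTIME;     SAVE MYSQL SERVERS TO DISK;
LOAD MYSQL QUERY RULES TO RUNTIME; SAVE MYSQL QUERY RULES TO DISK;
```

The same mechanism is what makes mitigation rules possible: under pressure, an operator can insert a rule that throttles or blocks an offending query pattern without touching application code.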
There’s no need to frame this as “SQL wins”; what matters is the operational lesson. When traffic is intense, the strategy is usually to move reads to replicas, control access, protect the primary, and automate changes, rather than to abandon the database altogether.
Meta and MyRocks: when I/O and cost are the real issues, not “SQL vs NoSQL”
Another classic example is Meta (Facebook), which developed MyRocks, a storage engine for MySQL optimized to save space and writes by leveraging LSM (Log-Structured Merge) tree techniques via RocksDB. The project is open source and publicly documented, with technical references and related repositories.
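At the SQL surface nothing changes; the swap happens one level down, at the storage engine. A minimal sketch, assuming a MySQL build with the RocksDB engine enabled (table and column names are hypothetical):

```sql
-- Opt a write-heavy table into MyRocks. Queries and schema stay
-- plain SQL; only the storage engine changes (names are hypothetical).
CREATE TABLE message_log (
    id         BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
    user_id    BIGINT UNSIGNED NOT NULL,
    body       VARBINARY(4096),
    created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (id),
    KEY idx_user_created (user_id, created_at)
) ENGINE=ROCKSDB;

-- Confirm the engine is actually available in this build.
SHOW ENGINES;
```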
The lesson here is precise: beyond a certain scale, the debate shifts from “relational or not” to cost per terabyte, write amplification, p99 latency, compaction, and storage efficiency. SQL doesn’t disappear; the system underneath it is what gets adjusted.
YouTube and Vitess: sharding is a technique, not a religion
To scale horizontally, the most commonly repeated concept is sharding: splitting data based on some key (user, tenant, region, etc.) to distribute load. In the MySQL ecosystem, the most well-known case is Vitess, a project born at YouTube and now widely adopted, precisely to operate MySQL at large scale with sharding and orchestration layers.
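Sharding proper splits rows across independent servers, and that is what Vitess orchestrates; within a single MySQL server, hash partitioning expresses the same key-based split and is a useful way to see the idea. A minimal sketch with hypothetical names (in Vitess, the analogous decision is the vindex chosen for the sharding key):

```sql
-- Single-server illustration of key-based splitting: rows are spread
-- across 16 partitions by a hash of the tenant key (names hypothetical).
-- Real sharding applies the same idea across separate MySQL instances.
CREATE TABLE orders (
    order_id    BIGINT UNSIGNED NOT NULL,
    tenant_id   BIGINT UNSIGNED NOT NULL,
    total_cents INT NOT NULL,
    -- MySQL requires the partition key in every unique key.
    PRIMARY KEY (order_id, tenant_id)
)
PARTITION BY HASH (tenant_id)
PARTITIONS 16;
```

The hard part is the same in both worlds: picking a key that spreads load evenly and keeps the queries you actually run on a single shard.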
Again, the point isn’t “use Vitess”. It’s understanding that SQL can scale horizontally if the topology (and the application) are designed for it.
PostgreSQL also plays in the big leagues: Instagram and “scale without changing the database”
The narrative of “SQL doesn’t scale” also falls apart when looking at PostgreSQL. Instagram explained years ago how Postgres remained their “solid foundation” as they grew, implementing practices like horizontal partitioning and optimization techniques (partial indexes, functional indexes, and other operational strategies).
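Those techniques are ordinary PostgreSQL features, not exotic extensions. A minimal sketch in the spirit of what Instagram described, with hypothetical table and column names:

```sql
-- Partial index: only indexes the rows the hot path reads, which keeps
-- the index small and cheap to maintain (names are hypothetical).
CREATE INDEX idx_media_active_by_user
    ON media (user_id, created_at DESC)
    WHERE deleted_at IS NULL;

-- Functional (expression) index: case-insensitive lookups without
-- storing a duplicate lowercased column.
CREATE INDEX idx_users_username_lower
    ON users (lower(username));
```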
This matters because it debunks another myth: that “Postgres is for medium-sized workloads” and “MySQL is for web”. In practice, both can handle huge loads if designed and operated properly.
More practical examples: Pinterest and Airbnb in the MySQL world
In real engineering, the debate often isn’t “which database do I choose” but “how do I reduce risk today?” Pinterest has detailed its use of MySQL sharding and the operational practices around it. Airbnb, for its part, has shared cases where the behavior of MySQL replicas and replication lag matter as much as the schema itself, showing that scalability is also about observability and fine-grained control of the system.
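On that observability point: replica lag is a first-class metric. The manual starting point in MySQL looks like this (8.0.22+ syntax; earlier versions use SHOW SLAVE STATUS with the older field names):

```sql
-- Run on a replica. Key fields: Seconds_Behind_Source (lag estimate),
-- Replica_IO_Running and Replica_SQL_Running (thread health).
SHOW REPLICA STATUS\G
```

In practice these numbers feed dashboards and alerts, because a stale replica silently serving reads is exactly the kind of failure that only shows up under load.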
And MariaDB: the “fork” that became an operational alternative
When MariaDB appears in the conversation, it’s often as “MySQL compatible”. In practice, for many organizations it’s a strategic operational and governance choice. The most frequently cited example is the Wikimedia ecosystem, which runs MariaDB as a large-scale deployment.
Comparison table: common SQL options for scaling
| Option | How it typically scales | Strengths | Typical trade-offs | Public examples (indicative) |
|---|---|---|---|---|
| MySQL (classic) | Primary + replicas, caching, application-level partitioning | Maturity, tooling, proven web performance | Writes focused on primary if unsharded | Shopify (MySQL + ProxySQL), Pinterest |
| MySQL + ProxySQL | Connection management, routing to replicas, incident mitigation | Reduces pressure on the database, speeds ops response | Adds a critical layer (proxy) and complexity | Shopify |
| MySQL + Vitess | Managed sharding + orchestration | More systematic horizontal scaling | Changes the operational model and sometimes the application | Originated at YouTube; widely adopted |
| MySQL + MyRocks | Optimizes storage/writes while maintaining SQL compatibility | Cost and I/O efficiency | Trade-offs tied to LSM trees (compaction, tuning) | Meta (Facebook) |
| PostgreSQL | Replicas, partitioning, advanced query optimization | SQL power, extensions, robustness | Horizontal scaling requires design (as with any SQL) | Instagram |
| MariaDB | Similar to MySQL (replicas/clusters based on design) | Compatibility and deployment flexibility | Version/feature-dependent variations | Wikimedia (noted reference) |
| Distributed SQL (Spanner/Cockroach/Yugabyte) | Native distribution with strong consistency (per system) | Less “glue” needed to distribute data | Complexity and cost; highly case-dependent | (Varies by platform and provider) |
The clear takeaway from this comparison is: SQL scales, but not “magically”. It scales when it’s designed to do so: separating reads and writes, choosing realistic partition keys, accepting the cost of distributed operations, and building automation and observability into the system.
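“Choosing realistic partition keys” is a decision you can express directly in the schema. A minimal PostgreSQL sketch using declarative partitioning (available since version 10; names are hypothetical):

```sql
-- Time-range partitioning: queries filtering on created_at prune to the
-- relevant partitions, and old partitions can be detached or dropped
-- cheaply instead of mass-deleted (names are hypothetical).
CREATE TABLE events (
    tenant_id  BIGINT NOT NULL,
    created_at TIMESTAMPTZ NOT NULL,
    payload    JSONB
) PARTITION BY RANGE (created_at);

CREATE TABLE events_2024_q1 PARTITION OF events
    FOR VALUES FROM ('2024-01-01') TO ('2024-04-01');
CREATE TABLE events_2024_q2 PARTITION OF events
    FOR VALUES FROM ('2024-04-01') TO ('2024-07-01');
```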
Conclusion: the useful debate isn’t “SQL yes/no,” but “what architecture do I need?”
For most teams, the most cost-effective decision is pragmatic: maximize the capabilities of a well-designed SQL deployment (MySQL/PostgreSQL/MariaDB) with replicas, caches, constraints, and observability, and only migrate to more complex architectures when the data shows it’s truly necessary.
The lessons from Shopify, Meta, YouTube, and Instagram aren’t about a “winner” database. They show that big systems succeed through engineering: topology, proxies, sharding, optimization, and solid operational practices.
Frequently Asked Questions
When does “primary + replicas” stop being enough in MySQL or PostgreSQL?
When the real bottleneck is writes or contention (locks, hot rows, or index maintenance that no longer keeps up), and partitioning the domain becomes unavoidable.
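Before that point, some hot-row patterns have cheaper fixes. For queue-like workloads, both PostgreSQL (9.5+) and MySQL (8.0+) can skip locked rows, turning lock contention into parallel throughput; a sketch with a hypothetical jobs table:

```sql
-- Each worker claims a different pending job instead of all of them
-- queuing on the same rows; locked rows are skipped, not waited on.
BEGIN;
SELECT id, payload
  FROM jobs
 WHERE status = 'pending'
 ORDER BY created_at
 LIMIT 1
 FOR UPDATE SKIP LOCKED;
-- ...process the claimed job, mark it done, then:
COMMIT;
```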
Is Vitess only for “hyper-scale” setups?
Not necessarily, but it’s a step change: it’s worth it when sharding ceases to be exceptional and becomes normal operational practice.
What product types benefit most from MyRocks?
Workloads with many writes and storage pressure (cost per terabyte), where engine efficiency and I/O performance determine economic viability.
Is MariaDB just a “MySQL replacement”?
It can be in simple cases, but for large deployments, the decision often involves compatibility, stack governance, support, and platform strategy.

