A quiet adjustment in the Amazon Web Services (AWS) catalog has once again highlighted an uncomfortable reality for many companies: AI infrastructure is not only scarce but also increasingly expensive to plan for. Several specialized media outlets have reported that AWS has raised the price of its EC2 Capacity Blocks for ML (reserved capacity blocks for machine learning workloads) by roughly 15% on cutting-edge NVIDIA H200-based instances.
According to these reports, the p5e.48xlarge, configured with 8 NVIDIA H200 GPUs, has gone from $34.61 to $39.80 per hour in most regions, while the p5en.48xlarge has gone from $36.18 to $41.61 per hour. If the change holds, the increase is significant: in intensive training or inference projects, a 15% hike can dent a full quarter’s budget, especially for teams already working with narrow margins or capacity commitments tied to milestones.
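As a rough illustration of what that step-up means over a quarter, the sketch below computes the percentage change and the extra spend for a single instance. The hourly rates are the ones cited in the reports; the usage profile (one instance reserved around the clock for three months) is an assumption chosen purely for illustration.

```python
# Back-of-the-envelope impact of the reported Capacity Blocks price change.
# Hourly rates come from the reports cited above; the usage profile
# (one instance, continuous use for a quarter) is an assumption.
old_rate = 34.61   # USD/hour, p5e.48xlarge before the change
new_rate = 39.80   # USD/hour, p5e.48xlarge after the change

increase_pct = (new_rate - old_rate) / old_rate * 100
hours_per_quarter = 24 * 30 * 3                 # ~2,160 hours of continuous use (assumed)
extra_cost = (new_rate - old_rate) * hours_per_quarter

print(f"Increase: {increase_pct:.1f}%")                 # ~15.0%
print(f"Extra spend per quarter: ${extra_cost:,.0f}")   # ~$11,210 for a single instance
```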
What exactly is getting more expensive: reserving GPUs, not just “using GPUs”
The key lies in the affected product: Capacity Blocks for ML are not the classic on-demand offerings but a mechanism to reserve GPU capacity in advance with a planning window, designed to prevent “out of stock” scenarios during critical moments (long training runs, inference peaks, product launches, or large-scale testing). AWS presents these blocks as a way to pre-book GPU instances with durations ranging from short-term to longer commitments, providing certainty in volatile environments where demand fluctuates and capacity shortages are a real risk.
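For readers less familiar with the mechanics, here is a minimal sketch of how a block is typically found and reserved with boto3. The instance type, dates, region, and block duration are illustrative assumptions, and the exact parameters and response fields should be checked against the current EC2 API reference before relying on them.

```python
# Minimal sketch: query Capacity Block offerings and (optionally) reserve one.
# Instance type, region, dates, and duration are illustrative assumptions;
# verify parameter names and response fields in the current EC2 API docs.
from datetime import datetime, timedelta, timezone

import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")  # region is an assumption

start = datetime.now(timezone.utc) + timedelta(days=7)
offerings = ec2.describe_capacity_block_offerings(
    InstanceType="p5e.48xlarge",       # 8x H200 instance discussed in this article
    InstanceCount=1,
    StartDateRange=start,
    EndDateRange=start + timedelta(days=14),
    CapacityDurationHours=24 * 7,      # one-week block (assumed duration)
)

for o in offerings["CapacityBlockOfferings"]:
    # The upfront fee is what you lock in at reservation time; this is the
    # number that moves when AWS adjusts Capacity Blocks pricing.
    print(o["CapacityBlockOfferingId"], o["StartDate"], o["UpfrontFee"], o["CurrencyCode"])

# Purchasing commits you to the quoted fee for the selected block:
# ec2.purchase_capacity_block(
#     CapacityBlockOfferingId=offerings["CapacityBlockOfferings"][0]["CapacityBlockOfferingId"],
#     InstancePlatform="Linux/UNIX",
# )
```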
At the same time, AWS clarifies that prices for these blocks can be updated: their public pricing documentation states that rates are adjustable (shown as a combination of reservation fee and operating system fee), an important nuance because it normalizes the possibility of changes without a formal “press release.”
Why this hurts more in 2026: AI’s “premium tier” is no longer optional
More than the percentage increase, what matters is the type of instance. The P5e and P5en families have become a benchmark for high-end AI workloads. AWS positions them as infrastructure for training and deploying large language models (LLMs) and generative models, with configurations supporting up to 8 H200 GPUs per instance and a focus on performance, networking, and scalability within clusters (UltraClusters). This is no longer “sandbox experimentation”: it now forms the backbone of many commercial products.
Furthermore, AWS distinguishes between P5e and P5en in critical aspects of distributed performance. According to their product description, P5en is associated with platform enhancements (CPU, connectivity, latency) aimed at optimizing distributed training and communication scenarios. In other words: you don’t pay only for the GPU, but for the entire ecosystem that prevents bottlenecks when training large models in parallel.
The uncomfortable part: prices can shift when you need them most
The most irritating detail for the community (and the reason behind the "hope they didn't notice" tone) is not that pricing power exists, but how the changes are perceived: quiet weekend adjustments, variations detected by third parties rather than announced, and operational opacity. In highly consolidated markets, clients fear scenarios where providers hike prices just when projects are locked in: trained models, established pipelines, dependencies on managed services, data ecosystems, and looming deadlines.
This reveals a growing pattern in the industry: AI is creating a "capacity toll", charged not just for consumption but for guaranteed availability. And when competitors are vying for the same energy, data-center space, and supply chain, hyperscalers tend to transfer cost pressures to the products where demand is most inelastic.
What companies should monitor from now on
- Separate cost per hour from cost per result: in AI, the crucial metric is the cost per completed training run, per million tokens inferred, or per meaningful experiment (a minimal sketch follows this list). A 15% increase may be manageable if it reduces the risk of capacity shortages, but it can be devastating if the project is already over-provisioned.
- Review the “reserve vs. elasticity” strategy: Capacity Blocks provide peace of mind but also introduce dependency on the provider’s pricing at reservation time. In high-uncertainty scenarios, it might make sense to combine minimal reservations with elastic options or multi-provider strategies.
- Continuously audit pricing: if costs can change, governance must evolve too. FinOps becomes more than just a dashboard—it’s a process: alerts, dynamic budgets, limits, and alternative scenarios.
- Compare with other options: from equivalent instances across different hyperscalers to bare metal solutions or capacity agreements with regional providers. It may not always be cheaper, but it can be more predictable.
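To make the first point concrete, the small sketch below converts an hourly rate into a cost per million inferred tokens. The throughput figure is a hypothetical placeholder, not a benchmark of the H200; plug in what your own workload actually sustains.

```python
# Sketch: translate an hourly instance rate into a cost-per-result metric.
# The throughput value is a hypothetical placeholder, not a measured figure.
def cost_per_million_tokens(hourly_rate_usd: float, tokens_per_second: float) -> float:
    """Cost in USD to infer one million tokens at the given sustained throughput."""
    tokens_per_hour = tokens_per_second * 3600
    return hourly_rate_usd / tokens_per_hour * 1_000_000

# Comparing the old and new Capacity Blocks rates at the same assumed throughput:
for label, rate in [("before", 34.61), ("after", 39.80)]:
    print(label, round(cost_per_million_tokens(rate, tokens_per_second=20_000), 3))
# A 15% higher hourly rate flows straight into a 15% higher cost per token
# unless throughput (batching, quantization, pipeline efficiency) improves.
```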
Frequently Asked Questions
What are EC2 Capacity Blocks for ML, and how do they differ from on-demand?
They are reservations that let you book GPU instance capacity in advance for machine learning workloads. Unlike on-demand, they aim to reduce the risk of capacity being unavailable at the moment you need it.
What GPUs are in the p5e.48xlarge and p5en.48xlarge instances?
AWS states that P5e and P5en are based on the NVIDIA H200, and that these 48xlarge configurations include up to 8 H200 GPUs per instance.
Does the increase in Capacity Blocks prices mean that prices of other GPU instances will also go up?
Not necessarily. Capacity Blocks are a specific reservation product. However, in a tight market, price adjustments might eventually be reflected across other layers depending on demand trends.
How can I mitigate the impact of price hikes on AI projects?
Through FinOps practices (alerts, budgets, result-based metrics), efficiency optimizations (batching, quantization, improved pipelines), and resilience strategies (multi-region, multi-provider, or alternative contracts).

