Akamai introduced Inference Cloud, a platform designed to bring AI inference from centralized data centers to the edge of the Internet, aiming to deliver low latency, security, and global scalability for agent-based AI and Physical AI (robots, vehicles, smart cities) applications. The initiative leverages NVIDIA Blackwell infrastructure and Akamai’s global distributed network to bring computing closer to users and devices.
The strategic thesis is clear: the next wave of applications (agents that act, personalized experiences, and real-time decision systems) needs to run inference close to the user. With Inference Cloud, Akamai moves AI decision-making to thousands of locations and intelligently orchestrates where each task should run to keep responses virtually instant.
What’s under the hood: Blackwell at the edge and a global network
Inference Cloud combines NVIDIA RTX PRO servers (featuring RTX PRO 6000 Blackwell Server Edition), NVIDIA BlueField-3 DPUs, and NVIDIA AI Enterprise software on top of distributed cloud infrastructure and Akamai’s edge network, with more than 4,200 locations worldwide. The roadmap includes BlueField-4 to further accelerate and secure data access and inference workloads from the core to the edge. Deployment begins at 20 initial locations, with ongoing phased expansion.
Use cases: from real-time agents to “Physical AI”
- Agent-based AI and personalized experiences: extending “AI factories” to the edge for smart commerce and assistants capable of negotiating, purchasing, and optimizing in real time based on location, behavior, and user intent.
- Streaming inference for financial decisions: the multiple sequential inferences typical of agents, resolved with millisecond latencies, useful for fraud detection, secure payments, and automation (see the latency sketch after this list).
- Physical AI: supporting industrial robots, drones, urban infrastructure, and autonomous vehicles, where timing accuracy and security require sensor processing and decision-making at the speed of the physical world.
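The case for edge placement in the second bullet comes down to simple arithmetic: an agent's sequential calls multiply whatever round-trip time the network adds. The Python sketch below uses made-up round-trip and compute figures, not Akamai or NVIDIA benchmarks, purely to show how per-call latency compounds.

```python
# Back-of-the-envelope sketch: how network round-trip time (RTT) compounds
# across an agent's chained inference calls. All figures are illustrative
# assumptions, not measured benchmarks.

def agent_wall_time(calls: int, rtt_ms: float, compute_ms: float) -> float:
    """Total wall-clock time (ms) for an agent issuing `calls` sequential inferences."""
    return calls * (rtt_ms + compute_ms)

for label, rtt_ms in [("distant centralized region, ~120 ms RTT", 120.0),
                      ("nearby edge site, ~10 ms RTT", 10.0)]:
    total = agent_wall_time(calls=8, rtt_ms=rtt_ms, compute_ms=40.0)
    print(f"{label}: {total:.0f} ms for 8 chained inferences")
```

In this toy model, the same eight-call workflow drops from roughly 1,280 ms to 400 ms when the endpoint moves from a distant region to a nearby edge site, which is the gap that separates a sluggish agent from one that feels instantaneous.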
 
Core–edge orchestration: where to run each task
The control plane dynamically directs each request to the most efficient point:
- Routine and highly latency-sensitive tasks are resolved at the edge, including via NVIDIA NIM microservices.
- More complex or reasoning-heavy tasks are sent to centralized AI factories.
 
All of this is managed from a unified platform that abstracts the complexity of operating distributed AI workloads on a planetary scale.
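Akamai has not published the orchestration API, so the snippet below is only a minimal sketch of the routing idea under stated assumptions: the endpoint URLs, task fields, and the 500 ms threshold are hypothetical, chosen to illustrate how latency-sensitive, routine tasks could stay at the edge while reasoning-heavy work is forwarded to a centralized AI factory.

```python
from dataclasses import dataclass

# Hypothetical endpoints: Akamai has not published a routing API, so these
# URLs, fields, and thresholds are illustrative assumptions only.
EDGE_ENDPOINT = "https://edge.eu-west.inference.example.com/v1"
CORE_ENDPOINT = "https://core.ai-factory.example.com/v1"

@dataclass
class InferenceTask:
    model: str
    latency_budget_ms: int    # how quickly the caller needs an answer
    requires_reasoning: bool  # multi-step or long-context reasoning workload

def route(task: InferenceTask) -> str:
    """Toy policy: routine, latency-sensitive tasks stay at the edge;
    heavier reasoning workloads go to a centralized AI factory."""
    if task.requires_reasoning or task.latency_budget_ms > 500:
        return CORE_ENDPOINT
    return EDGE_ENDPOINT

if __name__ == "__main__":
    fraud_check = InferenceTask("fraud-scorer", latency_budget_ms=50, requires_reasoning=False)
    trip_plan = InferenceTask("agent-planner", latency_budget_ms=5000, requires_reasoning=True)
    print(route(fraud_check))  # edge: tight budget, no deep reasoning
    print(route(trip_plan))    # core: reasoning-heavy, latency-tolerant
```

A real control plane would also weigh edge capacity, data-residency rules, and per-site model availability; the point here is only the split between edge and core.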
Security and compliance: distributed architecture with centralized controls
Inference Cloud inherits Akamai’s security model and global operations, adding BlueField DPUs as a layer of isolation, encryption, and offloading of critical tasks. The combination of edge + Blackwell + AI Enterprise targets regulated sectors that demand traceability, governance, and low latency without compromise.
What this means for the market
- Lower latency through proximity: for agents and interactive applications, placing inference as close as possible to the user is critical; a footprint of more than 4,200 points provides a competitive edge in proximity.
- Global scaling: starting at 20 sites shortens time-to-market and allows incremental growth based on actual demand.
- Model portability: relying on NVIDIA AI Enterprise and NIM makes it easier to move workloads between edge and core without rewriting the application (see the sketch below).
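A concrete reason that portability holds is that NIM microservices expose an OpenAI-compatible HTTP API, so the same client code can target an edge deployment or a core AI factory by swapping the base URL. The sketch below assumes hypothetical endpoint addresses and uses a public NIM model name only as a placeholder.

```python
import requests

def chat(base_url: str, prompt: str, model: str = "meta/llama-3.1-8b-instruct") -> str:
    """Call a NIM-style, OpenAI-compatible chat endpoint. `base_url` is the only
    thing that changes between an edge deployment and a core AI factory."""
    resp = requests.post(
        f"{base_url}/v1/chat/completions",
        json={
            "model": model,
            "messages": [{"role": "user", "content": prompt}],
            "max_tokens": 128,
        },
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]

# Hypothetical deployments: only the endpoint differs, the client code does not.
# print(chat("http://edge-nim.internal:8000", "Score this transaction for fraud risk"))
# print(chat("http://core-nim.internal:8000", "Plan a multi-step purchase for this cart"))
```

In practice, authentication headers and model names would differ per deployment, but the request and response shapes stay the same.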
 
Frequently Asked Questions
What exactly is Akamai Inference Cloud?
A distributed inference platform that runs AI at the edge and in the core, with NVIDIA Blackwell hardware, BlueField DPUs, and AI Enterprise software, designed for extremely low latency and global deployments.
How many locations does Akamai’s edge have, and where does the service start?
The edge network exceeds 4,200 locations; initial availability begins at 20 sites, with an expansion plan underway.
Which workloads benefit most?
Agents with multiple inferences per task, financial services (fraud, payments), e-commerce (personalized experiences), and Physical AI (robots, vehicles, smart cities) that require millisecond decision-making.
How does the platform decide where to run tasks (edge vs. core)?
Via an orchestration layer that routes dynamically: latency-sensitive tasks are handled at the edge, while complex workflows go to centralized AI factories. Everything is managed from a unified console.
Sources: Official press release and materials from Akamai regarding Akamai Inference Cloud (10/28/2025) and NVIDIA-related documentation on Blackwell, NIM, and BlueField.

