Akamai Wants to Bring AI Inference to the Edge with 4,400 Locations

Akamai has taken a significant step in its artificial intelligence strategy by introducing AI Grid Intelligent Orchestration, a new orchestration layer for distributed inference that, according to the company, makes its network the first to implement the NVIDIA AI Grid reference design at global scale. The offering builds on Akamai Inference Cloud infrastructure and the deployment of thousands of NVIDIA RTX PRO 6000 Blackwell Server Edition GPUs to bring inference closer to the user, rather than concentrating it solely in large central clusters.

This announcement matters because it reflects a fundamental market shift. In recent years, the conversation about AI has mostly revolved around large centralized “AI factories,” optimized for training and frontier models. Akamai does not dispute that role but argues that many real inference workloads—especially those related to real-time video, AI agents, personalization, or physical AI—require something else: low latency, proximity to data, and a network capable of deciding where to execute each request for the best balance of cost and performance.

From centralized data centers to the distributed edge

Akamai’s thesis is that inference can no longer rely solely on a round trip to a large remote cluster. Its new architecture distributes workloads across edge locations, intermediate regions, and central nodes, with an orchestrator acting as a real-time intermediary that decides where each request should run. The company explains that this control plane is designed to optimize what it calls “tokenomics”: cost per token, time to first token, and overall throughput.
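To make those three metrics concrete, here is a minimal sketch of how they could be computed from a hypothetical request log. The record fields and the per-second GPU prices are invented for illustration; Akamai has not published how it measures them internally.

```python
from dataclasses import dataclass

@dataclass
class RequestRecord:
    # Hypothetical per-request measurements; not an Akamai API.
    output_tokens: int            # tokens generated for the response
    first_token_latency_s: float  # time to first token (TTFT)
    total_latency_s: float        # end-to-end serving time
    gpu_cost_per_s: float         # assumed price of the GPU tier that served it

def tokenomics(records: list[RequestRecord]) -> dict[str, float]:
    """Aggregate the three 'tokenomics' metrics mentioned above."""
    total_tokens = sum(r.output_tokens for r in records)
    total_cost = sum(r.total_latency_s * r.gpu_cost_per_s for r in records)
    total_time = sum(r.total_latency_s for r in records)
    return {
        "cost_per_token": total_cost / total_tokens,
        "avg_time_to_first_token_s": sum(r.first_token_latency_s for r in records) / len(records),
        "throughput_tokens_per_s": total_tokens / total_time,
    }

# Example: one request served on a cheap edge GPU, one on a central cluster.
log = [
    RequestRecord(350, 0.18, 2.4, gpu_cost_per_s=0.002),
    RequestRecord(900, 0.45, 6.1, gpu_cost_per_s=0.010),
]
print(tokenomics(log))
```

An orchestrator of the kind Akamai describes would, in effect, try to minimize the first metric while keeping the other two within the latency and throughput targets of each application.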

In practice, this means applying techniques like semantic caching, model affinity, and intelligent routing to reserve the most expensive GPUs for workloads that truly need them, while redirecting other requests to less costly resources. On the product’s official page, Akamai also states that its platform combines inference, networking, and security into a single distributed layer, with specific controls for models, agents, and APIs exposed at the edge.
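A minimal sketch of how such a control plane could combine a semantic cache with cost-aware routing is shown below. The embedding function, similarity threshold, tier names, and the needs_large_model flag are assumptions made for illustration, not details Akamai has disclosed.

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    # Toy embedding (character-frequency histogram); a real deployment
    # would use a sentence-embedding model instead.
    v = np.zeros(128)
    for ch in text.lower():
        v[ord(ch) % 128] += 1.0
    norm = np.linalg.norm(v)
    return v / norm if norm else v

class SemanticCache:
    """Reuse an earlier answer when a new prompt is semantically close to it."""
    def __init__(self, threshold: float = 0.95):
        self.threshold = threshold
        self.entries: list[tuple[np.ndarray, str]] = []

    def lookup(self, prompt: str) -> str | None:
        q = embed(prompt)
        for vec, answer in self.entries:
            if float(np.dot(q, vec)) >= self.threshold:
                return answer
        return None

    def store(self, prompt: str, answer: str) -> None:
        self.entries.append((embed(prompt), answer))

def route(prompt: str, needs_large_model: bool, cache: SemanticCache) -> str:
    """Decide where a request runs: semantic cache, edge GPU, or central cluster."""
    if (hit := cache.lookup(prompt)) is not None:
        return hit  # served without touching any GPU
    # Model affinity: only requests that genuinely need the large model go to
    # the expensive central GPUs; everything else stays on cheaper edge tiers.
    tier = "central-blackwell" if needs_large_model else "edge-gpu"
    answer = f"response generated on {tier}"  # stand-in for the actual model call
    cache.store(prompt, answer)
    return answer

cache = SemanticCache()
print(route("summarize this support ticket", needs_large_model=False, cache=cache))
print(route("summarize this support ticket!", needs_large_model=False, cache=cache))  # likely a cache hit
```

The point of the sketch is the ordering of the decisions: answer from cache when possible, keep the request on inexpensive hardware by default, and escalate to the costly tier only when the workload requires it.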

The company bases this approach on a footprint of over 4,400 edge locations, one of the most notable figures in the announcement. Akamai claims that this reach lets it process requests at the digital touchpoint with the user, avoiding the extra latency of a traditional cloud architecture that sends every request back to origin. NVIDIA, for its part, frames the initiative within its AI Grid vision, a reference architecture designed to deploy and orchestrate AI across multiple distributed sites.

Blackwell, security, and workloads optimized for agents

The technical core of the service is built around NVIDIA RTX PRO 6000 Blackwell Server Edition GPUs. Akamai explains that its Inference Cloud is designed for edge AI inference and uses these GPUs alongside BlueField DPUs to improve time to first token and tokens per second. The company even claims that, in tests, Blackwell delivers up to 1.63 times higher inference throughput than H100 within its own cloud, though this figure should be read as an internal benchmark.

The commercial message aligns with the dominant workloads predicted for 2026. Akamai openly discusses agentic AI, physical AI, and hyper-personalized experiences as demand drivers. Its product notes mention use cases like AI-powered NPCs for video games, real-time recommendation engines, fraud detection, automation, RAG, multi-agent systems, and productivity tools for retail or customer service. Parallel to this, NVIDIA has positioned low-latency, distributed AI as a key goal of its new AI Grid Reference Design.

Another important point is that Akamai aims to differentiate itself from mere “GPU hosting.” Its documentation emphasizes that it does not just sell access to accelerators but offers an edge inference platform with intelligent routing, model abuse protection, identity controls, segmentation, and security tailored for AI. This security layer could be crucial at a time when many organizations are increasingly concerned not just about the cost of serving models but also about the risks of exposing them to prompt injection, scraping, API abuse, or lateral movement.

A move more ambitious than it seems

Akamai had already signaled this push in late 2025, when it launched Akamai Inference Cloud and tied it to the growth of inference outside central data centers. Now it aims to scale that effort further, backed by a financial commitment that underscores how seriously it is taking this move: on March 5, Akamai announced a four-year, $200 million service contract with a major U.S. tech company for a cluster of thousands of Blackwell GPUs housed in a metro-edge data center designed for AI infrastructure.

This deal does not automatically make Akamai a direct competitor to hyperscalers but shows that it wants to play a different game. Instead of competing solely on centralized training, it seeks to strengthen its position in the layer of distributed inference, where proximity, network, and orchestration are nearly as important as the GPU itself. This is arguably the most interesting takeaway from the announcement: Akamai is not claiming that the future of AI will abandon the “AI factories,” but rather that these factories will need to extend outward—toward the edge—to properly serve the next wave of real-time applications.

Of course, the most challenging part remains: demonstrating that this distributed mesh can sustain SLAs, costs, and performance at scale outside of product demos. Nevertheless, the approach aligns with industrial logic. If the first phase of AI centered on massive clusters for training models, the next phase will hinge on how and where those models are served. Here, Akamai believes its longstanding advantage in global distribution can evolve into a new edge inference advantage.

Frequently Asked Questions

What exactly has Akamai announced?

Akamai has unveiled AI Grid Intelligent Orchestration, a layer of orchestration for distributed inference within Akamai Inference Cloud, supported by over 4,400 edge locations and thousands of NVIDIA RTX PRO 6000 Blackwell Server Edition GPUs.

What does it mean that this is the first global implementation of NVIDIA AI Grid?

It means that Akamai claims to be the first to put the NVIDIA AI Grid reference design into operation at real scale, deploying and orchestrating AI workloads across multiple distributed sites rather than concentrating them in just a few central clusters.

What types of applications is this model intended for?

Akamai envisions use cases such as AI-driven NPCs in video games, fraud detection, real-time recommendations, video dubbing and transcoding, retail, virtual assistants, and AI agents requiring low latency and immediate responses near the user.

Is Akamai Inference Cloud available now?

Yes. Akamai indicates that Inference Cloud is currently available for qualified enterprise customers, with access by request.

via: akamai
