Akamai Technologies (NASDAQ: AKAM) has announced the first global-scale deployment of the NVIDIA AI Grid reference design. The initiative integrates NVIDIA's AI infrastructure into Akamai's worldwide network and uses intelligent workload orchestration to operate across the company's entire footprint. With this approach, Akamai aims to move beyond isolated AI factories toward a distributed, unified network dedicated to AI inference.
This move represents an important step in the evolution of Akamai Inference Cloud, the platform the company introduced last year. As the first company to deploy a network based on the AI Grid concept, Akamai is rolling out thousands of NVIDIA RTX PRO 6000 Blackwell Server Edition GPUs, creating a platform designed for businesses to run physical and agent-based AI systems with the speed of local computing and the scale of a global network.
“AI factories have been specifically designed for training and cutting-edge model workloads, and centralized infrastructure will continue to provide the best tokenomics for those use cases,” states Adam Karon, Chief Operating Officer and General Manager of Akamai’s Cloud Technology Group. “But real-time video, physical AI, and highly concurrent personalized experiences require inference at the point of contact—not a round trip to a centralized cluster. Our AI Grid intelligent orchestration offers AI factories a way to scale inference outward, leveraging the same distributed architecture that revolutionized content delivery to route AI workloads through 4,400 locations, at the right cost and the right time.”
The “Tokenomics” Architecture
At the core of AI Grid is an intelligent coordinator that acts as a real-time intermediary for AI requests. Applying Akamai's expertise in application performance optimization to AI, this workload-aware control plane improves tokenomics across cost per token, time to first token, and overall throughput.
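Akamai has not published the coordinator's internals, but the tradeoff it describes, meeting a latency target at the lowest cost per token, can be illustrated with a small sketch. Everything below (tier names, round-trip times, and prices) is a hypothetical placeholder, not Akamai's actual routing logic:

```python
from dataclasses import dataclass

@dataclass
class InferenceRequest:
    latency_budget_ms: float   # end-to-end SLO for this request
    model: str                 # model identifier

@dataclass
class Tier:
    name: str
    rtt_ms: float              # typical network round trip to this tier
    cost_per_1k_tokens: float  # blended GPU cost estimate (made up)

EDGE = Tier("edge-pop", rtt_ms=10.0, cost_per_1k_tokens=0.40)
CORE = Tier("core-cluster", rtt_ms=120.0, cost_per_1k_tokens=0.15)

def route(req: InferenceRequest) -> Tier:
    """Pick the cheapest tier that can still meet the latency budget."""
    feasible = [t for t in (EDGE, CORE) if t.rtt_ms < req.latency_budget_ms]
    if not feasible:
        return EDGE  # best effort: fall back to the lowest round trip
    return min(feasible, key=lambda t: t.cost_per_1k_tokens)

print(route(InferenceRequest(50, "npc-dialog")).name)      # edge-pop
print(route(InferenceRequest(2000, "batch-report")).name)  # core-cluster
```

A production control plane would also weigh queue depth, model placement, and cache state, but the shape of the decision, cheapest feasible tier wins, is the same.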
One of Akamai's key differentiators is that clients can access fine-tuned or distributed models across its vast global footprint, offering significant cost and performance advantages for the long tail of AI workloads. For example:
- Cost efficiency at scale: Companies can sharply reduce inference costs by automatically routing workloads to the appropriate compute tier. The coordinator uses techniques such as semantic caching and intelligent routing to direct requests to right-sized resources, reserving premium GPU cycles for the workloads that require them (a minimal sketch of the caching idea follows this list). All of this runs on Akamai Cloud, built on open-source infrastructure with generous outbound data allowances to support large-scale, data-intensive AI operations.
- Real-time responsiveness: Gaming studios can deliver AI-driven NPC interactions and real-time player engagement within milliseconds. Financial institutions can run personalized fraud detection and marketing recommendations instantly, from login to the first screen. Broadcasters can transcode and dub content in real time for global audiences. These outcomes rest on Akamai's globally distributed edge network, whose more than 4,400 locations combine caching, serverless edge computing via Akamai Functions and EdgeWorkers, and high-performance connectivity to process requests at the point of user contact, avoiding round trips to centralized cloud origins.
- Production-grade AI at the core: Large language models, continuous training, and multimodal inference workloads demand sustained, high-density computing that only dedicated infrastructure can deliver. Akamai's clusters of thousands of NVIDIA RTX PRO 6000 Blackwell Server Edition GPUs provide concentrated processing power for the heaviest AI workloads, complementing the distributed edge with centralized scale.
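The semantic caching mentioned above deserves a concrete illustration: instead of matching prompts byte for byte, the cache serves a stored answer when a new prompt is close enough in embedding space, so the GPU is only engaged for genuinely new requests. This is a minimal sketch of the general technique, not Akamai's implementation; `embed` stands in for whatever embedding model the operator runs, and the 0.92 threshold is an arbitrary placeholder:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

class SemanticCache:
    """Serve a cached answer when a new prompt is semantically close to an old one."""

    def __init__(self, embed, threshold=0.92):
        self.embed = embed          # callable: prompt -> embedding vector
        self.threshold = threshold  # minimum cosine similarity for a hit
        self.entries = []           # list of (embedding, answer) pairs

    def get(self, prompt):
        vec = self.embed(prompt)
        best = max(self.entries, key=lambda e: cosine(vec, e[0]), default=None)
        if best and cosine(vec, best[0]) >= self.threshold:
            return best[1]          # cache hit: no GPU cycles spent
        return None                 # miss: caller runs real inference, then put()

    def put(self, prompt, answer):
        self.entries.append((self.embed(prompt), answer))
```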
The Computing Continuum: From Core to Far-Edge
Built on NVIDIA AI Enterprise and the NVIDIA Blackwell architecture, with NVIDIA BlueField DPUs providing hardware-accelerated networking and security, the platform lets Akamai manage complex service-level agreements (SLAs) across both edge and core locations:
- The edge (more than 4,400 locations): delivers fast response times for physical AI and autonomous agents. It combines semantic caching with serverless capabilities such as Akamai Functions (WebAssembly-based compute) and EdgeWorkers to provide model affinity and consistent performance at the point of user contact (see the edge-handler sketch after this list).
- Akamai Cloud IaaS and dedicated GPU clusters: the central public cloud infrastructure provides portability and cost savings for large-scale workloads, while pods equipped with NVIDIA RTX PRO 6000 Blackwell GPUs support resilient training and multimodal inference.
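The "model affinity" in the edge bullet above is the mechanism tying the two tiers together: a point of presence answers locally when it holds the requested model and forwards everything else to the core cluster. The sketch below is a hypothetical illustration of that fallback pattern; `EdgeNode`, `StubCore`, and the model names are invented for the example and are not Akamai APIs:

```python
from dataclasses import dataclass

@dataclass
class Answer:
    text: str
    served_from: str  # "edge" or "core"

class StubCore:
    """Stand-in for the centralized GPU cluster."""
    def infer(self, model, prompt):
        return f"[{model}] core reply to: {prompt}"

class EdgeNode:
    def __init__(self, local_models, core):
        self.local_models = local_models  # models pinned to this PoP (model affinity)
        self.core = core

    def handle(self, model, prompt):
        if model in self.local_models:
            # Affinity hit: serve at the point of user contact, no round trip.
            return Answer(f"[{model}] edge reply to: {prompt}", "edge")
        # No local copy: pay the round trip to the core cluster.
        return Answer(self.core.infer(model, prompt), "core")

node = EdgeNode({"npc-dialog-small"}, StubCore())
print(node.handle("npc-dialog-small", "Greet the player").served_from)  # edge
print(node.handle("video-dub-xl", "Dub this clip").served_from)         # core
```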
“Emerging AI-native applications demand predictable latency and profitability at global scale,” says Chris Penrose, Vice President of Global Business Development and Telco at NVIDIA. “By launching NVIDIA AI Grid, Akamai is building the connective tissue for generative, agent-based, and physical AI, moving intelligence directly to the data to power the next wave of real-time applications.”
Driving the Next Wave of Real-Time AI
Akamai is already witnessing strong early adoption of Akamai Inference Cloud in sectors that require heavy computational resources and are latency-sensitive:
- Gaming: Studios are running sub-50-millisecond inference for AI-driven NPC interactions and real-time player engagement.
- Financial services: Banks rely on the network to deliver hyper-personalized marketing and quick recommendations during critical login moments.
- Media and video: Broadcasters utilize the distributed network for AI-powered transcoding and real-time dubbing.
- Retail: Retailers are adopting the network for AI applications in stores and associated point-of-sale productivity tools.
Driven by enterprise demand, the platform has also been validated by leading technology providers; one early commitment is a four-year, $200 million service agreement for a multi-thousand-GPU cluster in a dedicated metropolitan-edge data center built specifically for enterprise AI infrastructure.
From Centralized to Distributed AI Factories
The first wave of AI infrastructure was characterized by massive GPU clusters in a few centralized locations, optimized for training. But as inference becomes the dominant workload and companies across sectors build AI agents, that centralized model runs into the same scalability limits that earlier generations of internet infrastructure hit in content distribution, online gaming, financial transactions, and complex microservices architectures.
Akamai addressed each of those earlier challenges with the same fundamental approach: distributed networks, intelligent orchestration, and systems designed to bring content and context as close as possible to the digital point of contact. The result was better user experience and higher ROI for companies that adopted the model. Akamai Inference Cloud applies this proven architecture to AI factories, enabling the next wave of scaling by distributing dense computing from core to edge. For companies, this means deploying context-aware, responsive AI agents. For the industry, it marks an evolution from isolated facilities to a globally distributed, utility-style AI factory model.

