Akamai, the cybersecurity and cloud computing company that powers and protects online businesses, has introduced Akamai Cloud Inference, marking the beginning of a faster, more efficient era of innovation for organizations looking to turn predictive models and large language models (LLMs) into real-world applications. Akamai Cloud Inference runs on Akamai Cloud, the world’s most distributed network, to address the growing limitations of centralized cloud models.
“Bringing AI data closer to users and devices is challenging, and that’s an area where legacy clouds struggle,” says Francisco Arnau, Akamai’s Vice President for Spain and Portugal.
“While the heavy lifting of training LLMs will continue to be done in hyper-scale data centers, the actionable work of inference will take place at the edge, where the network that Akamai has built over the past two and a half decades becomes vital for the future of AI and differentiates us from any other cloud provider in the market.”
AI Inference on Akamai Cloud
Akamai’s new solution provides tools for engineers and developers to build and run AI applications and data-heavy workloads closer to end users, delivering 3 times better performance and reducing latency by up to 2.5 times. With this new solution, companies can save up to 86% on AI inference and agentic AI workloads compared to traditional hyper-scale infrastructure. Akamai Cloud Inference includes:
- Computing: Akamai Cloud offers a broad and versatile set of compute options, from classic CPUs for optimized inference to powerful accelerated computing with GPUs and custom ASIC VPUs, providing the right power for a spectrum of AI inference challenges. Akamai integrates with Nvidia’s enterprise AI ecosystem, leveraging Triton, TAO Toolkit, TensorRT, and NVFlare to optimize AI inference performance on Nvidia GPUs (a minimal Triton client sketch appears after this list).
- Data Management: Akamai enables clients to unlock the full potential of AI inference with a cutting-edge data architecture designed specifically for modern AI workloads. Akamai has partnered with VAST Data to provide optimized real-time data access that accelerates inference-related tasks, which is essential for delivering relevant outcomes and a responsive experience. This is complemented by highly scalable object storage to manage the volume and variety of critical datasets for AI applications, along with integration with leading vector database providers like Aiven and Milvus to facilitate retrieval-augmented generation (RAG); a minimal retrieval sketch appears after this list. With this data management platform, Akamai securely stores tuned model data and training elements to deliver low-latency AI inference at global scale.
- Containerization: Containerizing AI workloads enables auto-scaling based on demand, improves application resilience, and allows hybrid and multi-cloud portability while optimizing performance and cost. With Kubernetes, Akamai provides faster, more cost-effective, and secure AI inference at petabyte-scale performance. Backed by Linode Kubernetes Engine – Enterprise, a new enterprise edition of Akamai Cloud’s Kubernetes orchestration platform designed for large-scale enterprise workloads, and the recently announced Akamai App Platform, Akamai Cloud Inference can quickly deploy an AI-ready, open-source Kubernetes stack, including KServe, Kubeflow, and SpinKube, all seamlessly integrated to streamline AI model deployment for inference (a KServe deployment sketch appears after this list).
- Edge Compute: To simplify how developers build AI-based applications, Akamai Cloud Inference includes WebAssembly (WASM) capabilities. By collaborating with WASM providers like Fermyon, Akamai allows developers to run LLM inference directly from serverless applications, enabling clients to execute lightweight code at the edge to power latency-sensitive applications (the pattern is sketched after this list).
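
To illustrate the computing bullet, here is a minimal sketch of querying a model served by NVIDIA Triton Inference Server with its Python HTTP client. The server address, model name, and tensor names are illustrative placeholders, not details from Akamai's announcement.

```python
# Minimal sketch: querying a model hosted on NVIDIA Triton Inference Server.
# The server URL, model name, and tensor names are illustrative placeholders.
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

# Build a single input tensor matching the (hypothetical) model signature.
batch = np.random.rand(1, 3, 224, 224).astype(np.float32)
infer_input = httpclient.InferInput("INPUT__0", list(batch.shape), "FP32")
infer_input.set_data_from_numpy(batch)

# Run inference and read back the named output tensor.
response = client.infer(model_name="resnet50", inputs=[infer_input])
scores = response.as_numpy("OUTPUT__0")
print(scores.shape)
```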
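
The data-management bullet mentions vector databases such as Milvus for retrieval-augmented generation. Below is a minimal sketch of the retrieval step using the pymilvus client; the collection name, field names, and the embed() helper are assumptions for illustration only.

```python
# Minimal RAG retrieval sketch against a Milvus vector database.
# Collection name, field names, and the embed() helper are hypothetical.
from pymilvus import MilvusClient

client = MilvusClient(uri="http://localhost:19530")

def embed(text: str) -> list[float]:
    # Placeholder: a real pipeline would call an embedding model here.
    return [0.0] * 768

question = "Which documents describe our return policy?"
hits = client.search(
    collection_name="docs",       # hypothetical collection of document chunks
    data=[embed(question)],       # query embedding
    limit=5,                      # top-k passages to retrieve
    output_fields=["text"],       # return the stored chunk text
)

# Concatenate retrieved chunks into context for the LLM prompt.
context = "\n".join(hit["entity"]["text"] for hit in hits[0])
```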
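
The containerization bullet names KServe among the integrated open-source Kubernetes projects. A sketch of deploying a model as a KServe InferenceService through the standard Kubernetes Python client follows; the namespace, model format, and storage URI are illustrative and not taken from the announcement.

```python
# Sketch: creating a KServe InferenceService custom resource on a Kubernetes
# cluster (e.g. Linode Kubernetes Engine). The namespace, model format, and
# storageUri below are illustrative placeholders.
from kubernetes import client, config

config.load_kube_config()  # uses the local kubeconfig

inference_service = {
    "apiVersion": "serving.kserve.io/v1beta1",
    "kind": "InferenceService",
    "metadata": {"name": "sklearn-demo", "namespace": "default"},
    "spec": {
        "predictor": {
            "model": {
                "modelFormat": {"name": "sklearn"},
                "storageUri": "s3://my-bucket/models/sklearn-demo",  # placeholder
            }
        }
    },
}

client.CustomObjectsApi().create_namespaced_custom_object(
    group="serving.kserve.io",
    version="v1beta1",
    namespace="default",
    plural="inferenceservices",
    body=inference_service,
)
```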
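
Finally, the edge-compute bullet describes running LLM inference from lightweight serverless code. Reduced to plain Python for illustration, the pattern is a small handler that forwards a prompt to a nearby inference endpoint; the endpoint URL and payload shape are hypothetical, and Fermyon's actual WASM SDKs differ in detail.

```python
# Illustrative pattern only: a lightweight handler that forwards a prompt to a
# nearby inference endpoint. The endpoint URL and JSON payload are hypothetical;
# a real Fermyon Spin / WASM handler would use the platform's own SDK.
import json
import urllib.request

INFERENCE_ENDPOINT = "http://inference.local/v1/generate"  # placeholder

def handle(prompt: str) -> str:
    """Send the prompt to the inference endpoint and return the generated text."""
    payload = json.dumps({"prompt": prompt, "max_tokens": 128}).encode("utf-8")
    request = urllib.request.Request(
        INFERENCE_ENDPOINT,
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["text"]  # hypothetical response field
```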
Together, these tools create a powerful platform for low-latency, AI-driven applications that enable companies to deliver the experiences their users demand. Akamai Cloud Inference operates on the company’s massively distributed network, capable of consistently delivering over one petabyte per second of throughput for data-intensive workloads. Comprising more than 4,100 points of presence across over 1,200 networks in more than 130 countries worldwide, Akamai Cloud makes computing resources available from the cloud to the edge, while accelerating application performance and enhancing scalability.
The Shift from Training to Inference
As AI adoption matures, companies are recognizing that the hype surrounding LLMs has created a distraction, diverting attention from practical AI solutions better suited to solving specific business problems. LLMs excel at general-purpose tasks like summarization, translation, and customer support, but they are very large models that are expensive to operate and take significant time to train.
Many companies have been constrained by the architectural requirements and costs of these models, including data centers and compute power; well-structured, secure, and scalable data systems; and the latency that data location and security requirements impose on decision-making. Lightweight AI models, designed to address specific business problems, can be optimized for individual sectors and leverage proprietary data to deliver measurable outcomes, representing a better return on investment for today’s businesses.
AI Inference Needs a More Distributed Cloud
Increasingly, data will be generated outside centralized data centers or cloud regions. This shift is driving demand for AI solutions that process data closer to its point of origin. It represents a fundamental reconfiguration of infrastructure needs as companies move beyond the creation and training of LLMs towards leveraging data to make faster, smarter decisions and invest in more personalized experiences. Businesses recognize that they can generate more value by leveraging AI to manage and enhance their operations and business processes.
Distributed cloud and edge architectures are emerging as the preferred solutions for operational intelligence use cases, as they can provide real-time actionable insights about distributed assets, even in remote environments. Early customer examples on Akamai Cloud include in-car voice assistance, AI-driven crop management, image optimization for consumer goods marketplaces, shopping experiences with virtual garment visualization, automated product description generators, and customer feedback analyzers.
“Creating an LLM is like making a map: it requires collecting data, analyzing the terrain, and plotting routes. It is slow and resource-intensive, but once built, it is very useful. AI inference is like using a GPS: it instantly applies that knowledge, recalculates in real-time, and adapts to changes to get you where you need to go,” adds Arnau. “Inference is the next frontier of AI.”