Huawei Cloud introduces Agentic Infra to bring AI agents to enterprise scale

Huawei Cloud has announced in Shanghai a new generation of agentic artificial intelligence products built on one underlying idea: companies need not only more powerful models but also infrastructure capable of running agents safely, continuously, and efficiently. The announcement was made during Huawei Cloud INSPIRE 2026, held at the West Bund International Convention & Exhibition Center, where the company introduced the concept of Agentic Infra and a series of services focused on training, inference, memory, security, governance, and industrial deployment of agents.

This proposal comes at a time when the market is moving beyond isolated testing phases with generative models. The next frontier is production: agents that reason, query data, execute tasks, maintain context over longer periods, and operate in sectors like healthcare, manufacturing, energy, robotics, scientific research, or administration. For Huawei Cloud, this leap requires a different architecture from traditional cloud, with tighter coordination among hardware, software, storage, networks, security, and development platforms.

Agentic Infra: a token factory for general workloads and AI

The centerpiece of the announcement is Agentic Infra, a new unified infrastructure for general workloads and AI workloads. Huawei Cloud defines it around four ideas: an efficient “token factory,” continuous learning, unified scheduling for general and AI computing, and secure autonomy. The term may sound ambitious, but it addresses a real challenge: scaling agent execution is not just a matter of having GPUs or NPUs available but of coordinating compute, memory, networking, inference, and isolation with very low latency.

One of the main products is AI Cluster Service (AICS), built on the UnifiedBus network. According to Huawei Cloud, this service supports clusters of over 100,000 cards with a total capacity of up to 200 EFLOPS. The company also claims it reduces token generation latency to less than 10 milliseconds and reaches a throughput of 5 million tokens per second on 1,000 cards, with an online availability of 99.95%.
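To put the headline figures in perspective, the short sketch below derives a few per-card and per-year ratios from them. It is only arithmetic on the vendor's announced numbers: the source does not say that the throughput, latency, and cluster-size figures were measured on the same configuration, or at what numeric precision the EFLOPS figure applies, so the results should be read as rough orders of magnitude. The function names are illustrative.

```python
# Back-of-the-envelope ratios from the announced AICS figures.
# Inputs are vendor claims; outputs are simple divisions, not benchmarks.

CARDS_MAX = 100_000           # "over 100,000 cards" (lower bound on cluster size)
PEAK_EFLOPS = 200             # "up to 200 EFLOPS" (precision not stated)
TOKENS_PER_SEC = 5_000_000    # "5 million tokens per second"
CARDS_FOR_THROUGHPUT = 1_000  # "... on 1,000 cards"
AVAILABILITY = 0.9995         # "99.95%"


def per_card_pflops(eflops: float, cards: int) -> float:
    """Average peak compute per card in PFLOPS (1 EFLOPS = 1000 PFLOPS)."""
    return eflops * 1000 / cards


def per_card_tokens(tokens_per_sec: float, cards: int) -> float:
    """Average token throughput per card, in tokens per second."""
    return tokens_per_sec / cards


def yearly_downtime_hours(availability: float) -> float:
    """Maximum downtime per 365-day year implied by an availability target."""
    return (1 - availability) * 365 * 24


if __name__ == "__main__":
    print(per_card_pflops(PEAK_EFLOPS, CARDS_MAX))                # 2.0 PFLOPS per card
    print(per_card_tokens(TOKENS_PER_SEC, CARDS_FOR_THROUGHPUT))  # 5000.0 tokens/s per card
    print(round(yearly_downtime_hours(AVAILABILITY), 2))          # 4.38 hours per year
```

Read this way, the claims amount to roughly 2 PFLOPS of peak compute and 5,000 tokens per second per card, and an availability target that tolerates a little over four hours of downtime per year.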

AICS is meant to run large inference and training workloads, especially in scenarios where request volume and latency are critical. In practice, Huawei is pitching its infrastructure as an alternative for companies and industries that need to deploy AI at scale without relying on a fragmented collection of services.

| Product or Service | Main Function | Key Announced Data |
| --- | --- | --- |
| AI Cluster Service (AICS) | AI clusters for training and inference | Over 100,000 cards and up to 200 EFLOPS |
| Agentic Memory Storage (AMS) | Persistent, scalable memory for agents | PB-scale storage and tiered KV-cache pooling |
| CCE VolcanoNext | Unified scheduling for general and AI workloads | Over 30% improvement in resource utilization |
| AgentSphere | Secure, elastic runtime for agents | Startup in less than 100 ms |
| ModelArtsNext | Training and inference platform | Model routing and enterprise RLaaS |
| AgentArts | Enterprise agent platform | Long tasks, security, sector-specific know-how, observability |
| openJiuwen | Open source version of AgentArts | Shares over 90% of kernel with AgentArts Enterprise |
| CloudRobo | Cloud platform for robots | Migration in hours, model deployment in minutes |

Another key component is Agentic Memory Storage (AMS), designed to break the memory bottleneck for agents. Huawei Cloud explains that it combines NPU passthrough with Context Memory Storage to create a petabyte-scale memory space. It also supports tiered KV-cache pooling, an important technique to reduce inference costs and enable long-duration tasks.

This matters because agents need more than computation: they must maintain context, retrieve memory, operate for hours or days, and manage intermediate states. Without an efficient memory layer, the cost and complexity of running agents rise sharply.
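The general idea behind tiered KV-cache pooling can be sketched as a small, fast cache tier backed by a larger, slower one, so that attention key/value state for a long-running conversation is demoted rather than discarded and recomputed. The example below is a minimal illustration of that idea only. The source does not describe how AMS is implemented, and the class name, the two-tier layout, and the LRU policy are all assumptions.

```python
from collections import OrderedDict


class TieredKVCache:
    """Two-tier LRU cache (hypothetical): a small fast tier backed by a larger slow tier.

    Entries evicted from the fast tier are demoted to the slow tier instead of
    being dropped, so a later request can reuse them without recomputation.
    """

    def __init__(self, fast_capacity: int, slow_capacity: int):
        self.fast = OrderedDict()  # stands in for on-device memory
        self.slow = OrderedDict()  # stands in for pooled/remote storage
        self.fast_capacity = fast_capacity
        self.slow_capacity = slow_capacity

    def put(self, prefix_id, kv_blocks):
        """Insert or refresh an entry in the fast tier."""
        self.slow.pop(prefix_id, None)  # keep each entry in one tier only
        self.fast[prefix_id] = kv_blocks
        self.fast.move_to_end(prefix_id)
        self._rebalance()

    def get(self, prefix_id):
        """Return cached KV blocks, promoting slow-tier hits; None on a miss."""
        if prefix_id in self.fast:
            self.fast.move_to_end(prefix_id)
            return self.fast[prefix_id]
        if prefix_id in self.slow:
            kv_blocks = self.slow.pop(prefix_id)
            self.put(prefix_id, kv_blocks)  # promote back to the fast tier
            return kv_blocks
        return None  # miss: the caller has to recompute the KV state

    def _rebalance(self):
        # Demote least recently used fast entries to the slow tier.
        while len(self.fast) > self.fast_capacity:
            old_id, old_kv = self.fast.popitem(last=False)
            self.slow[old_id] = old_kv
        # Only when the slow tier is also full is anything actually lost.
        while len(self.slow) > self.slow_capacity:
            self.slow.popitem(last=False)


if __name__ == "__main__":
    cache = TieredKVCache(fast_capacity=2, slow_capacity=2)
    for session in ("a", "b", "c"):
        cache.put(session, f"kv-for-{session}")
    print(list(cache.fast), list(cache.slow))  # "a" has been demoted, not lost
    print(cache.get("a"))                      # slow-tier hit, promoted again
```

The cost argument follows from the miss path: every entry that survives in the slow tier is a prefill computation the system does not have to repeat when a long task resumes.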

CCE VolcanoNext functions as a unified scheduling engine for general and AI workloads. Huawei Cloud claims that shared training and inference pooling, along with fragmentation consolidation, can improve resource utilization by over 30%. In enterprise environments, this efficiency can make a significant economic difference, because many AI projects fail not for lack of models but because their infrastructure costs are difficult to sustain.
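Why consolidating fragmented capacity raises utilization can be shown with a toy packing problem. The sketch below places training and inference jobs from one shared pool onto as few nodes as a first-fit-decreasing heuristic finds. It is not CCE VolcanoNext's algorithm, which the source does not describe; the job names, the node size of 8 cards, and the heuristic itself are assumptions chosen for illustration.

```python
from typing import Dict, List


def consolidate(jobs: Dict[str, int], node_capacity: int) -> List[Dict[str, int]]:
    """Pack jobs (name -> cards needed) onto nodes with first-fit decreasing.

    Returns one dict per node, mapping job name to the cards it occupies.
    """
    nodes: List[Dict[str, int]] = []
    free: List[int] = []  # remaining cards on each node, parallel to `nodes`

    # Placing the largest jobs first leaves small jobs to fill the gaps.
    for name, need in sorted(jobs.items(), key=lambda kv: kv[1], reverse=True):
        if need > node_capacity:
            raise ValueError(f"{name} needs {need} cards, node has {node_capacity}")
        for i, remaining in enumerate(free):
            if need <= remaining:
                nodes[i][name] = need
                free[i] -= need
                break
        else:
            # No existing node has room: open a new one.
            nodes.append({name: need})
            free.append(node_capacity - need)
    return nodes


if __name__ == "__main__":
    # Hypothetical mix of training and inference jobs sharing one pool.
    jobs = {"train-a": 5, "infer-b": 3, "infer-c": 3, "train-d": 4, "infer-e": 1}
    placement = consolidate(jobs, node_capacity=8)
    used_cards = sum(jobs.values())
    total_cards = len(placement) * 8
    print(placement)
    print(f"utilization: {used_cards / total_cards:.0%}")
```

In this toy case, giving each of the five jobs its own 8-card node would use 16 of 40 cards, while the consolidated placement fits the same jobs on two nodes. Real schedulers also have to account for topology, preemption, and migration cost, none of which is modeled here.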

ModelArtsNext and AgentArts: from models to enterprise agents

Huawei Cloud also introduced ModelArtsNext, a new platform for model training and inference. Its four main capabilities are Reinforcement Learning as a Service, confidential inference, model routing, and a model matrix. MaaS routing offers three policies: prioritize experience, prioritize efficiency, or a balanced mode. The platform dynamically determines which model handles each request based on the request's characteristics.

Huawei Cloud states that it already offers more than 15 advanced models with scheduling accuracy above 95% and an average cost reduction of 20%. The approach responds to a clear trend: companies no longer want to depend on a single model for everything. They need to route each task to the most suitable model based on cost, performance, precision, latency, or compliance.
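A policy-based router of the kind described above can be sketched in a few lines. The example below scores candidate models on quality and cost, with weights that depend on the chosen policy. The three policy names mirror the article, but the scoring formula, the weights, the model catalogue, and the `min_quality` parameter (a stand-in for per-request requirements) are all hypothetical; the source does not document how ModelArtsNext makes its routing decisions.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Model:
    name: str
    quality: float  # 0..1, higher is better (hypothetical benchmark score)
    cost: float     # relative cost per request, lower is better


# (quality weight, cost weight) per policy; illustrative values only.
POLICY_WEIGHTS = {
    "experience": (1.0, 0.0),
    "efficiency": (0.0, 1.0),
    "balanced": (0.5, 0.5),
}


def route(models: List[Model], policy: str, min_quality: float = 0.0) -> Model:
    """Pick the best-scoring model for a policy among those meeting min_quality."""
    quality_weight, cost_weight = POLICY_WEIGHTS[policy]
    candidates = [m for m in models if m.quality >= min_quality]
    if not candidates:
        raise ValueError("no model satisfies the quality requirement")
    max_cost = max(m.cost for m in candidates)

    def score(m: Model) -> float:
        # Cost is normalized so that cheaper models score closer to 1.
        return quality_weight * m.quality + cost_weight * (1 - m.cost / max_cost)

    return max(candidates, key=score)


if __name__ == "__main__":
    catalogue = [
        Model("large", quality=0.95, cost=10.0),
        Model("medium", quality=0.85, cost=3.0),
        Model("small", quality=0.60, cost=1.0),
    ]
    for policy in POLICY_WEIGHTS:
        print(policy, "->", route(catalogue, policy).name)
    # A demanding request can still be routed cheaply, within its quality floor.
    print(route(catalogue, "efficiency", min_quality=0.8).name)
```

The quality floor is what keeps an efficiency-first policy from always choosing the cheapest model: cost is minimized only among models that are good enough for the request at hand.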

Enterprise RLaaS is another important bet. Huawei Cloud aims to make reinforcement learning accessible for organizations looking to fine-tune models for specific processes. The company claims users can create tasks in one minute, have end-to-end visualization, and maintain consistency between training and inference.

AgentArts completes the platform layer. It is an enterprise solution for creating and deploying AI agents, with four main capabilities: long-duration tasks in production, enterprise security, deep sector knowledge, and end-to-end observability. Huawei Cloud speaks of “harness engineering,” which involves organizing and controlling how agents use tools, data, memory, models, and processes.

The company also launched openJiuwen, an open source edition of AgentArts that shares over 90% of its core with the enterprise version. This move can help attract developers and partners, though the enterprise value will still rely on integration with Huawei’s infrastructure, support, governance, and cloud services.

Additionally, AgentArts Orchard serves as a portal that consolidates agentic cloud services, agents, models, and applications. The goal is to automate the entire process, from understanding intent and developing functionality to provisioning resources and deploying applications. In other words, Huawei envisions agents participating in the creation and operation of new services.

Security, hybrid cloud, and industry as core axes

Security played a central role in the announcement. Huawei Cloud introduced a solution covering the entire AI lifecycle, protecting agents, models, and agentic infrastructure. Highlights include a dedicated hardware data encryption zone, Hold Your Own Key technology, data capsules, and multi-dimensional isolation for agentic infrastructure. The message is clear: the company aims to address sovereignty, confidentiality, and data control concerns in regulated sectors.

An additional confidential computing solution for AI was also announced, including confidential virtual machines, remote attestation in the cloud, key management, confidential inference gateways, and PCIPC-based NPU passthrough. These are designed for scenarios involving confidential inference, private pretraining, and federated learning, where data and models hold high value and cannot be exposed without assurances.

Huawei Cloud also released the white paper Building Agent-Oriented Hybrid Cloud for Enterprises, focusing on the evolution of hybrid cloud in the era of agents. The company states that its hybrid cloud serves over 5,500 customers worldwide and maintains a prominent position in financial hybrid cloud and dedicated cloud. The document discusses building AI data lakes, coordinating stable online models with more agile offline iterations, and creating secure environments for agent development and deployment.

By sector, Huawei Cloud announced four zones within its Industry AI Foundry: Smart Healthcare Zone, Embodied AI Zone, Smart Manufacturing Zone, and Scientific Computing Zone. The healthcare zone is being strengthened with a healthcare AI platform that will go into open beta on June 30, along with an intelligent pathology solution already deployed in hospitals at various levels across China. More than 20 hospitals are involved, according to the company.

The Embodied AI Zone targets physical AI and robotics. Huawei Cloud presented CloudRobo, an intelligent robot development platform combining petabyte-scale data, development pipelines, a cloud-native robotic model engine, and a Real-Sim system for data generation and evaluation. The company says this allows robots to migrate to the cloud in hours and models to be deployed in minutes, with open beta scheduled for June 30.

The Smart Manufacturing Zone aims to facilitate industrial agents, while the Scientific Computing Zone targets customers in AI for science (AI4S), using models and agents to accelerate research.

Huawei Cloud also announced an AI Model Partner Program with over 20 model providers, including Zhipu AI, DeepSeek, MiniMax, Kimi, StepFun, Baidu, iFLYTEK Spark, Meituan, AIsphere, and Shengshu Technology. The goal is to build a diverse and connected ecosystem of models integrated with its cloud services.

Huawei Cloud’s announcement reflects a broad strategy: not selling models or infrastructure in isolation, but offering a comprehensive stack for the agentic era. Clusters, memory, model routing, secure runtime, agent platforms, hybrid cloud, security, industry sectors, and model ecosystems are all part of this unified vision. The challenge will be demonstrating that this integration can compete outside China in a global market increasingly shaped by regulation, sovereignty concerns, sanctions, and trust.

Frequently asked questions

What is Huawei Cloud’s Agentic Infra?

Agentic Infra is Huawei Cloud’s new infrastructure proposal for running general workloads and agentic AI. It integrates compute, memory, scheduling, security, and runtime for enterprise agents.

What does Agentic Memory Storage provide?

Agentic Memory Storage (AMS) creates a petabyte-scale memory space and supports tiered KV-cache pooling to reduce inference costs and enable long-duration agent tasks.

What is AgentArts?

AgentArts is Huawei Cloud’s enterprise platform for creating, deploying, and operating AI agents in production, with capabilities for long tasks, security, sector-specific expertise, and comprehensive observability.

Which sectors does Huawei Cloud prioritize?

Huawei Cloud has announced dedicated zones for smart healthcare, physical AI and robotics, intelligent manufacturing, and scientific computing. It also has a program with model providers like DeepSeek, MiniMax, Kimi, and Zhipu AI.

via: huawei
