Lenovo and NVIDIA Bring “AI Factories” to Gigawatt Scale to Accelerate Enterprise AI

CES showcases are usually filled with promises about power, screens, or graphics. But in the 2026 edition, Lenovo and NVIDIA wanted to focus on a less flashy, yet much more critical concept for the immediate future of Artificial Intelligence: the industrialization of compute. At Tech World @ CES 2026, held at Sphere (Las Vegas), both companies announced the Lenovo AI Cloud Gigafactory with NVIDIA, a program designed to help AI cloud providers deploy “AI factories” at gigawatt scale and bring advanced services into production more quickly.

The idea of an “AI factory” isn’t just marketing fluff. It addresses a real challenge companies face firsthand: training models is expensive, but getting them into production — with real data, real users, and real latencies — is where the ROI is decided. And here, the bottleneck is no longer always “how many GPUs” are available, but how quickly infrastructure investments translate into operational results.

The new metric: “Time to First Token”

Lenovo and NVIDIA introduced a key indicator: the TTFT (Time To First Token), which measures how long it takes for a system to return the first usable output fragment after a request to a model. In a world where AI agents are starting to execute tasks, query tools, and respond in real-time, this detail shifts from being purely technical to a business concern: a high TTFT results in sluggish experiences, lost productivity, and increased costs.
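To make the metric concrete, here is a minimal sketch of how TTFT can be measured client-side against a streaming model endpoint. The `fake_model_stream` generator is a hypothetical stand-in (not part of any Lenovo or NVIDIA API); in practice the stream would come from a real inference service.

```python
import time
from typing import Iterator

def fake_model_stream(prompt: str) -> Iterator[str]:
    """Hypothetical stand-in for a streaming model endpoint.

    A real deployment would stream tokens over the network; here we
    simulate queueing + prefill latency before the first token.
    """
    time.sleep(0.05)  # simulated queue + prefill time
    for token in ["Hello", ",", " world"]:
        yield token
        time.sleep(0.01)  # simulated per-token decode time

def measure_ttft(stream: Iterator[str]) -> float:
    """Return the seconds elapsed until the first token arrives."""
    start = time.perf_counter()
    next(stream)  # block until the first output fragment is produced
    return time.perf_counter() - start

ttft = measure_ttft(fake_model_stream("ping"))
print(f"TTFT: {ttft * 1000:.1f} ms")
```

The key design point is that TTFT only counts the time up to `next(stream)` returning: total generation time can be long, but the user-perceived responsiveness is dominated by that first fragment.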

The program aims for providers to reach first token — that is, a service delivering usable output in production — “in weeks”, through an “industrialized” approach: ready-to-deploy components, expert guidance, and repeatable build processes designed for environments that need to scale quickly without surprises. In short: less DIY data center, more “assembly line” for AI.

Gigawatts, millions of GPUs, and the efficiency challenge

Talking about “gigawatts” is no trivial matter. It involves facilities with such high computing density that they require reevaluating thermal design, networking, storage, and operations. Lenovo leverages its historical strength here: Neptune, its liquid cooling technology for high-density infrastructures, and its global manufacturing and integration capabilities. The joint proposal promises to reduce friction from design through operation, with full lifecycle services (Lenovo Hybrid AI Factory Services) and a catalog of repeatable use cases (Lenovo AI Library) to speed up deployment.

The goal is clear: shortening the time between “buying compute” and “billing for AI services”. This sends a direct message to providers competing for enterprise clients whose demands increasingly resemble those of critical systems: availability, predictability, security, energy efficiency, and consistent response times.

Cutting-edge hardware: Blackwell Ultra today, Rubin tomorrow

On the technical front, the announcement aligns with NVIDIA’s roadmap. The program includes access to the NVIDIA Blackwell Ultra architecture for custom-designed clusters with options for accelerated compute, storage, and networking. Lenovo highlights the NVIDIA GB300 NVL72 configured for Lenovo: a “rack-scale” system with 72 Blackwell Ultra GPUs and 36 NVIDIA Grace CPUs integrated into a liquid-cooled rack-level platform.

But the ambition extends beyond the present. Lenovo states that the program will also support the newly announced NVIDIA Vera Rubin NVL72, NVIDIA’s flagship system for training and inference. It combines 72 Rubin GPUs and 36 Vera CPUs with a networking and security suite geared toward next-generation AI factories: ConnectX-9 SuperNICs, BlueField-4 DPUs, and Spectrum-X Ethernet, plus new switching options such as Spectrum-6 and photonic Ethernet switches.

The strategic takeaway is clear: the goal isn’t just to sell servers, but to ensure clients can migrate seamlessly from one generation to the next without redesigning their operational models from scratch, with a clear upgrade path as models grow larger, more multimodal, and more dependent on ultrafast networks and storage.

From raw power to “manufacturing speed” deployment

Both companies’ executives framed the announcement as a paradigm shift. Lenovo emphasized that value is no longer solely measured by available compute, but by how quickly that compute produces results. NVIDIA reaffirmed the idea that each country and sector will eventually build or rent AI factories to “produce intelligence,” reflecting a market evolution toward AI services as an industrial product with metrics, quality control, and scale.

Within this context, “operation” is as important as “hardware”. That’s why the program combines infrastructure, networking, software (including integration with NVIDIA AI Enterprise and open models like Nemotron), and services. It targets providers that need to become AI partners for enterprises, not just GPU rental companies.

Why this announcement matters for enterprise AI

For many organizations, the key question is no longer “whether to use AI,” but where to run it and with what guarantees: public cloud, on-premises, or hybrid models. Lenovo and NVIDIA’s approach offers a pathway for large-scale AI deployment focused on a concrete goal: reducing time to production, with TTFT as a symbol of a broader obsession with latency, stability, and user experience.

An additional detail: the announcement was made at Sphere, a venue known for its cutting-edge immersive audiovisual technology. Lenovo highlighted that its infrastructure is directly involved in the venue’s content-creation workflows, a signal that “AI factories” are not an abstract future but a practical tool for industries already handling massive volumes of data today.


Frequently Asked Questions

What does “AI factory” mean and why is gigawatt-scale mentioned?

An AI factory is an infrastructure environment designed to produce intelligence at large scale (training, inference, and model operation) with optimized networks, storage, and software. The gigawatt scale refers to high-density facilities that require advanced thermal and energy design to handle enormous computational loads.

What is “Time to First Token” (TTFT) and why is it so critical?

TTFT measures how quickly a system provides the first useful output when requesting a response from a model. In enterprise applications with agents and real-time workflows, a low TTFT improves user experience, reduces wait times, and helps optimize operational costs.

What is an NVL72 system like GB300 NVL72 or Vera Rubin NVL72?

NVL72 generally refers to rack-scale platforms integrating 72 GPUs and associated CPUs designed for large-scale AI, equipped with high-performance networks and architectures optimized for massive training and inference workloads.

What benefits does Neptune liquid cooling bring to high-density AI deployments?

Liquid cooling removes heat more efficiently than air cooling in densely packed racks. It improves thermal stability and reduces the energy consumed by the cooling system itself, allowing AI workloads to grow without a proportional increase in energy footprint.

via: news.lenovo
