NVIDIA is trying to change the language of the entire data center industry. It’s no longer just about GPUs, servers, or accelerated clusters but about “AI factories”: artificial intelligence plants designed to produce tokens continuously, just as an industrial plant produces electricity, steel, or components. The metaphor is commercial, but it helps to understand a real shift: AI can no longer be treated as a software layer running on generic infrastructure.

In NVIDIA’s vision, an AI factory converts energy into intelligence. The production unit isn’t a physical piece but the token that a model generates when reasoning, responding, writing code, coordinating agents, or performing a task. That’s why the metrics that matter start to resemble those of heavy industry more than those of a SaaS application: tokens per second, tokens per watt, cost per token, infrastructure utilization, and availability.

Inference is no longer an isolated query

The big change lies in the workload. For many users, generative AI started as a text box: you write a question, the model responds, and the interaction ends. Agentic AI breaks that pattern. An agent can plan, search for information, call tools, read documents, write code, query databases, create sub-agents, and make chained decisions.

This makes inference a longer, more interactive process that is harder to orchestrate. It’s no longer enough to have a powerful GPU waiting for a request. Memory, storage, network, CPU, software, queues, and external services must be coordinated so that the entire flow proceeds without unnecessary delays.
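To see why a chained agent run is heavier than a single query, here is a minimal Python sketch of a token ledger. It assumes, hypothetically, that each step re-sends the whole accumulated context to the model with no caching; `AgentRun` and its fields are illustrative names, and the token counts are placeholders, not measurements.

```python
from dataclasses import dataclass, field


@dataclass
class AgentRun:
    """Hypothetical token ledger for one chained agent run.

    Assumption: every step re-sends the full accumulated context to the model
    (no prefix caching), so input tokens grow with each step.
    """
    system_prompt_tokens: int
    history: list[int] = field(default_factory=list)  # tokens appended per step
    input_tokens: int = 0
    output_tokens: int = 0

    def step(self, tool_result_tokens: int, generated_tokens: int) -> None:
        # The model rereads the system prompt, all prior steps, and the new tool result.
        context = self.system_prompt_tokens + sum(self.history) + tool_result_tokens
        self.input_tokens += context
        self.output_tokens += generated_tokens
        # The tool result and the model's own output both join the context.
        self.history.append(tool_result_tokens + generated_tokens)


run = AgentRun(system_prompt_tokens=1_000)
for _ in range(5):  # e.g. search, read, query, write code, summarize
    run.step(tool_result_tokens=500, generated_tokens=200)
print(run.input_tokens, run.output_tokens)  # 14500 1000
```

Under these assumptions, five chained steps consume 14,500 input tokens, while the first step alone consumes 1,500, because every step rereads everything that came before it. Serving stacks reduce this cost with techniques such as prefix caching, which is exactly the kind of coordination the infrastructure has to handle.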

NVIDIA frames this as a full-stack problem. Models require accelerated compute, but also fast memory, storage for context, low-latency networks to coordinate services, and software capable of maintaining high system utilization. If one layer lags, the cost per token increases, and the user experience degrades.

Metric	What it measures in an AI factory
Tokens per second	Capacity to produce responses and actions
Tokens per watt	Energy efficiency of the system
Cost per token	Economic viability of inference at scale
Utilization	Degree of GPU, CPU, memory, and network usage
Uptime	Continuity of AI production
Latency	Response time in agents and interactive applications
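The metrics in the table can be computed from ordinary monitoring data. The sketch below is a hypothetical example: the `WindowMeasurement` fields, the all-in hourly cost, and the sample numbers are assumptions chosen for easy arithmetic, not NVIDIA figures. "Tokens per watt" is best read as a rate (tokens per second per watt), which works out to tokens per joule.

```python
from dataclasses import dataclass


@dataclass
class WindowMeasurement:
    """Hypothetical measurements from one observation window of a serving cluster."""
    tokens_generated: int
    seconds: float
    avg_power_watts: float    # average power drawn by the system during the window
    cost_per_hour_usd: float  # assumed all-in cost: amortized hardware, energy, operations
    busy_gpu_seconds: float   # GPU time actually spent doing useful work
    gpu_count: int


def factory_metrics(m: WindowMeasurement) -> dict[str, float]:
    tokens_per_second = m.tokens_generated / m.seconds
    # Tokens per second per watt is the same quantity as tokens per joule.
    tokens_per_watt = tokens_per_second / m.avg_power_watts
    window_cost = m.cost_per_hour_usd * m.seconds / 3600
    cost_per_million_tokens = window_cost / m.tokens_generated * 1_000_000
    # Fraction of available GPU time spent on work rather than idling.
    utilization = m.busy_gpu_seconds / (m.gpu_count * m.seconds)
    return {
        "tokens_per_second": tokens_per_second,
        "tokens_per_watt": tokens_per_watt,
        "cost_per_million_tokens_usd": cost_per_million_tokens,
        "gpu_utilization": utilization,
    }


sample = WindowMeasurement(
    tokens_generated=36_000_000,
    seconds=3600,
    avg_power_watts=120_000,
    cost_per_hour_usd=300,
    busy_gpu_seconds=0.75 * 72 * 3600,
    gpu_count=72,
)
for name, value in factory_metrics(sample).items():
    print(f"{name}: {value:.3f}")
```

In this made-up window the GPUs are busy 75% of the time. Raising utilization on the same hardware would lower cost per token, since the hourly cost stays fixed while output grows, which is why orchestration software counts as part of the economics.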

This perspective has implications for any company looking to deploy AI seriously. The debate is no longer limited to choosing a model. It involves deciding where the model runs, what each interaction costs, what latency is acceptable, how to maintain context, what data to retrieve, and how much energy the infrastructure consumes.

Blackwell, Vera Rubin, and the token economy

NVIDIA positions Blackwell Ultra and GB300 NVL72 systems as responses to this new economy. According to the company, these systems can generate 50 times more tokens per megawatt than the Hopper generation and reduce the cost per token by a factor of 35. These figures are provided by NVIDIA and should be interpreted within their own comparative framework, but they indicate the direction of competition: producing more intelligence with less energy.
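Because the 50x and 35x figures are ratios, their practical meaning depends on a baseline. The sketch below applies them to a hypothetical Hopper-era baseline at a site with a fixed power budget; the baseline throughput, price, and site size are placeholder assumptions, and only the two default ratios come from NVIDIA’s claims.

```python
def apply_claimed_ratios(
    baseline_tokens_per_s_per_mw: float,
    baseline_cost_per_m_tokens: float,
    site_power_mw: float,
    tokens_per_mw_gain: float = 50.0,  # NVIDIA's claimed gain vs Hopper
    cost_reduction: float = 35.0,      # NVIDIA's claimed cost-per-token reduction
) -> dict[str, float]:
    """Project throughput and cost per token for a fixed-power site.

    With power as the binding constraint, total output scales with tokens per MW.
    All baseline inputs are hypothetical placeholders.
    """
    return {
        "baseline_tokens_per_s": baseline_tokens_per_s_per_mw * site_power_mw,
        "projected_tokens_per_s": baseline_tokens_per_s_per_mw * tokens_per_mw_gain * site_power_mw,
        "baseline_cost_per_m_tokens": baseline_cost_per_m_tokens,
        "projected_cost_per_m_tokens": baseline_cost_per_m_tokens / cost_reduction,
    }


print(apply_claimed_ratios(10_000, 7.0, site_power_mw=20))
```

One plausible reading of why the two ratios differ is that cost per token covers more than electricity (hardware and operations also count), so a gain in tokens per megawatt need not translate one-for-one into a lower cost per token. That is an inference from the figures, not an NVIDIA statement.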

The company also highlights NVIDIA Dynamo, a framework designed to orchestrate long-context inference and high volumes of requests. In an AI factory, software determines much of the economics. It must route requests, manage memory, balance latency and throughput, coordinate services, and prevent expensive hardware from waiting idle.
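To make "prevent expensive hardware from waiting idle" concrete, here is a minimal least-loaded router in Python. It is a hypothetical illustration of one routing policy, not NVIDIA Dynamo’s API or algorithm; `LeastLoadedRouter`, the worker names, and the use of outstanding tokens as the load signal are assumptions.

```python
class LeastLoadedRouter:
    """Hypothetical router: send each request to the worker with the fewest
    outstanding tokens so that no accelerator idles while another is backed up."""

    def __init__(self, workers: list[str]) -> None:
        self.load = {w: 0 for w in workers}  # outstanding tokens per worker

    def route(self, request_tokens: int) -> str:
        # Least-loaded worker wins; ties are broken by name for determinism.
        worker = min(self.load, key=lambda w: (self.load[w], w))
        self.load[worker] += request_tokens
        return worker

    def complete(self, worker: str, request_tokens: int) -> None:
        # Called when a request finishes, freeing that worker's capacity.
        self.load[worker] -= request_tokens


router = LeastLoadedRouter(["gpu-0", "gpu-1", "gpu-2"])
assignments = [router.route(t) for t in (4000, 500, 500, 500, 2000)]
print(assignments)  # ['gpu-0', 'gpu-1', 'gpu-2', 'gpu-1', 'gpu-2']
print(router.load)  # {'gpu-0': 4000, 'gpu-1': 1000, 'gpu-2': 2500}
```

A plain round-robin policy would have sent the fourth request to gpu-0, queuing it behind the 4,000-token job. Production schedulers typically weigh more signals, such as cached context and latency targets, but the goal is the same: keep every accelerator busy without letting queues grow.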

The next step is Vera Rubin. NVIDIA claims this platform, along with LPX, is designed to raise performance per watt again for reasoning and agentic AI workloads. The message is clear: the company aims to shift the conversation from “which GPU should I buy” to “which AI factory can I operate at the lowest cost per token.”

This strategy also shields NVIDIA from increasingly specialized competitors. ASICs, dedicated inference chips, LPUs, TPUs, and other custom accelerators target specific market segments with lower costs or latencies. NVIDIA responds by widening the scope: it is not just selling chips but the entire architecture.

Design before building

AI factories cannot be improvised. A traditional data center could scale by adding servers, more storage, or new racks. In AI, power density, liquid cooling, interconnections, load balancing, and power supply all force a design that treats the system as a single unit.

NVIDIA speaks of extreme co-design: hardware, networking, memory, storage, software, energy, and cooling thought out together from the start. It also points to its DSX reference designs and to digital twins built with the Omniverse DSX Blueprint, which model facilities, equipment, cooling, and operations before actual deployment.

This is especially critical for projects involving hundreds of megawatts or even gigawatts. An error in electrical or thermal design could limit growth capacity for years. AI is unforgiving of wasted energy, space, or cooling, because every inefficiency shows up as a higher cost per token.
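The link between facility design and token cost can be shown with a little arithmetic. The sketch below uses PUE (power usage effectiveness: total facility power divided by IT power) to capture cooling and power-conversion overhead; the 1 MW IT load, throughput, and electricity price are hypothetical, and the function counts electricity only, not hardware.

```python
def energy_cost_per_million_tokens(
    it_power_kw: float,
    pue: float,
    tokens_per_second: float,
    price_per_kwh: float,
) -> float:
    """Electricity cost (only) per million generated tokens."""
    facility_kw = it_power_kw * pue              # IT load plus cooling and conversion losses
    cost_per_hour = facility_kw * price_per_kwh  # one hour at facility_kw consumes facility_kw kWh
    tokens_per_hour = tokens_per_second * 3600
    return cost_per_hour / tokens_per_hour * 1_000_000


for pue in (1.2, 1.5):
    cost = energy_cost_per_million_tokens(1_000, pue, tokens_per_second=100_000, price_per_kwh=0.08)
    print(f"PUE {pue}: ${cost:.3f} per million tokens")
```

Same hardware, same workload: the less efficient facility pays 25% more for energy on every token. With a fixed grid connection the effect compounds, since a higher PUE also leaves less power for the IT load and therefore fewer tokens overall.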

Layer of the AI factory	Why it matters
Accelerated compute	Runs models, reasoning, and inference
Network	Coordinates thousands of accelerators and services
Memory	Feeds models and long contexts
Storage	Stores data, vectors, and state
Software	Orchestrates loads and maximizes utilization
Energy	Limits economic scale of deployment
Cooling	Enables high densities without degradation

NVIDIA also aims to extend this architecture beyond hyperscalers. It cites collaborations with Cisco, Dell, HPE, Lenovo, and Supermicro to bring AI infrastructure closer to enterprise data centers. The idea is for an AI factory to start with a specific business workload and scale into broader use cases afterward.

Companies building or leasing intelligence

NVIDIA’s most ambitious claim is that every organization will need to build or lease an AI factory. Not all will use their own infrastructure. Many will turn to cloud, neo-clouds, colocation providers, or managed platforms. But the premise makes sense if AI shifts from being an occasional tool to a permanent layer of work.

A financial institution might use agents for risk analysis, compliance, internal support, and software development. A pharmaceutical company could rely on AI for simulation, scientific documentation, and molecule discovery. Industries may use agents for maintenance, planning, robotics, and design. In all these cases, the key question remains: how to produce intelligence securely, efficiently, and consistently.

NVIDIA claims to already operate its own enterprise AI factory, with hundreds of autonomous agents supporting engineering, software, and operations teams. It’s a way to demonstrate that the idea isn’t just about selling infrastructure but about reorganizing work within a company.

The less comfortable aspect of this vision is its energy dimension. If an AI factory transforms electricity into tokens, energy becomes the raw material of artificial intelligence. Cost, electricity source, thermal efficiency, and power availability therefore deserve the kind of scrutiny that software licenses once received.

The next phase of AI won’t be decided only by more capable models but also by who can serve them at the lowest cost per token, the lowest energy per response, and the highest availability. NVIDIA wants this battle to be fought within an architecture that controls everything from end to end: GPU, network, software, systems, partners, and data center design.

Cloud computing promised to abstract infrastructure away. AI makes it visible again. Behind every reasoning agent, every coding assistant, and every model that answers a query, there is a physical factory producing tokens nonstop.

Frequently Asked Questions

What does NVIDIA mean by AI factory?
An infrastructure designed to produce tokens continuously through models, agents, accelerated compute, network, memory, storage, software, energy, and cooling coordinated as a single system.

Why is cost per token so important?
Because it determines whether a company can scale AI profitably. The lower the cost per token, the more viable it is to use models and agents in large-scale processes.

What changes with agentic AI?
Agents perform long and chained tasks: searching, reasoning, using tools, calling services, and executing actions. This requires more infrastructure coordination than a simple chatbot query.

Will all companies need to build their own AI factory?
Not necessarily. Some will do so for scale, security, or sovereignty reasons. Others will lease capacity in the cloud, neo-clouds, or specialized providers. The key will be controlling cost, performance, security, and availability.

Source: Nvidia Blog and Noticias Inteligencia Artificial

NVIDIA Wants to Turn Data Centers into AI Factories

About The Author

Alex D. Smither W.