
Artificial Intelligence promised abundance, productivity, and widespread access to capabilities once reserved for large laboratories. For a time, that narrative seemed to work: writing tools, programming assistants, business copilots, and early autonomous agents entered the market with affordable prices, free trials, or subscriptions closely resembling traditional SaaS software.

That phase is coming to an end. Generative AI doesn’t behave like a conventional application. Each question, each document read, each intermediate reasoning step, each long response, and each action performed by an agent consumes tokens. And when agents move from demonstrations to permanent workflows, the costs scale dramatically.

The problem isn’t only that models are expensive. The issue is that agentic AI consumes resources differently. An assistant responds to a single query. An agent plans, reads, writes, checks, retries, calls external tools, and may work for minutes or hours on a task. At each step, input tokens, output tokens, context, cache, and sometimes internal reasoning accumulate. The economics are no longer measured per user but by the actual volume of inference.
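To make that accumulation concrete, here is a minimal sketch of how token usage grows when an agent resends its whole context at every step. The function name, the step counts, and the token sizes are illustrative assumptions, and the model deliberately ignores caching, truncation, and summarization, which real systems use to soften this growth.

```python
def agent_token_usage(steps, base_context, output_per_step, tool_result_per_step=0):
    """Estimate (input_tokens, output_tokens) for a simplified agent loop.

    Assumption: every step sends the full accumulated context as input,
    with no caching, truncation, or summarization.
    """
    context = base_context
    total_input = 0
    total_output = 0
    for _ in range(steps):
        total_input += context            # the whole context is billed as input again
        total_output += output_per_step   # the model's reply for this step
        # The reply and any tool result are appended to the history.
        context += output_per_step + tool_result_per_step
    return total_input, total_output


# One-shot assistant: a single question and a single answer.
print(agent_token_usage(steps=1, base_context=2_000, output_per_step=500))
# → (2000, 500)

# Hypothetical agent: 20 steps, each adding a reply and a 1,000-token tool result.
print(agent_token_usage(steps=20, base_context=2_000, output_per_step=500,
                        tool_result_per_step=1_000))
# → (325000, 10000)
```

Under these assumptions output grows linearly with the number of steps, while input grows roughly quadratically, because each step re-reads everything that came before it.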

The End of the Comfortable Flat Rate

The clearest sign of this shift comes from the platforms themselves. GitHub announced that starting June 1, 2026, all Copilot plans will convert to a usage-based billing system using GitHub AI Credits. The company will still offer subscription plans, but consumption will be calculated based on input, output, and cache tokens, depending on the model used. This is a significant move because Copilot is no longer just an in-editor assistant but a platform with increasingly agent-like flows and long sessions over entire repositories.

Microsoft also hinted at where the market is heading. The Verge reported that the company plans to withdraw most internal Claude Code licenses from some teams and move many developers to GitHub Copilot CLI. This should not be read simply as a rejection of Anthropic, since Microsoft continues to integrate third-party models into its products. But it shows that even one of the largest tech companies is rationalizing internal access to AI tools as consumption becomes costly.

Uber offers another warning. Forbes reported that the company exhausted its AI budget for 2026 in just four months due to intense use of Claude Code. Although such figures depend on internal sources and should be treated with caution, they align with what many companies are observing: budgets sized for pilots fall short once teams use AI daily and agents work on real tasks.

The paradox is clear. The more useful AI becomes, the more it is used. And the more it is used, the harder it becomes to sustain the idea of an unlimited flat rate. Abundance doesn’t disappear; it begins to come with conditions.

Comparison Table: Prices per 1 Million Tokens

The following prices are indicative and may vary depending on the date of reading, region, execution mode, context size, cache usage, batch processing, priority, enterprise discounts, or commercial changes by providers. Not all models are equal in quality, speed, compliance, support, or availability.

Region	Company	Reference model	Input (USD per 1M tokens)	Output (USD per 1M tokens)	Implication for agentic workloads
USA	OpenAI	GPT-5.5	$5.00	$30.00	Very costly for tasks with high text generation
USA	Anthropic	Claude Opus 4.7	$5.00	$25.00	High output cost, though caching or batching can reduce it
USA	Google	Gemini 3.5 Flash High	$1.50	$9.00	More competitive, but reasoning mode increases expense
USA	xAI	Grok 4	$1.25	$2.50	Aggressive pricing compared to other US models
China	DeepSeek	DeepSeek V4 Pro	$0.435	$0.87	Very low cost for massive flows and agents
China	Alibaba/Qwen	Qwen-Max	$2.50	$7.50	Intermediate cost with its own cloud ecosystem
China	Z.ai/Zhipu	GLM-5.1	$1.40	$4.40	Competitive alternative for reasoning and coding
China	Baidu	ERNIE 4.5	≈$0.59	≈$2.35	Approximate prices converted from yuan
China	MiniMax	MiniMax M2.7	$0.30	$1.20	Highly attractive for high-volume multi-agent architectures

The gap between the most expensive US models and the cheapest Chinese ones is significant. In workflows where an agent generates a lot of text, reviews code, produces documentation, or runs multiple reasoning rounds, output cost far exceeds input cost. That’s where a drop from $25–$30 per million output tokens to less than $2 can dramatically change the economic viability of a project.
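As a rough illustration of that gap, the sketch below applies four rows of the indicative table to one workload. The workload size (50M input tokens and 10M output tokens per month) and the function name are assumptions chosen for the example, and the calculation ignores caching, batching, and enterprise discounts.

```python
# USD per 1M tokens as (input, output), copied from the indicative table above.
PRICES = {
    "GPT-5.5": (5.00, 30.00),
    "Claude Opus 4.7": (5.00, 25.00),
    "DeepSeek V4 Pro": (0.435, 0.87),
    "MiniMax M2.7": (0.30, 1.20),
}


def workload_cost(model, input_tokens, output_tokens):
    """Return the list-price cost in USD for a given token volume."""
    price_in, price_out = PRICES[model]
    return (input_tokens / 1_000_000 * price_in
            + output_tokens / 1_000_000 * price_out)


# Hypothetical monthly workload: 50M input tokens and 10M output tokens.
for model in PRICES:
    cost = workload_cost(model, 50_000_000, 10_000_000)
    print(f"{model}: ${cost:,.2f}")
```

With these numbers the same workload costs $550.00 on GPT-5.5, $500.00 on Claude Opus 4.7, $30.45 on DeepSeek V4 Pro, and $27.00 on MiniMax M2.7, a difference of roughly 20x at list price.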

But price isn’t everything. Choosing a cheap Chinese model may raise issues around latency, data residency, compliance, enterprise support, integration, security controls, and geopolitical dependence. For a startup or tech lab, cost might be the decisive factor. For regulated companies, not always.

The New Economic Inequality in AI

Discussions about AI often focus on capabilities: which model reasons best, which one codes better, which tops the benchmarks, or which offers the largest context window. But real adoption in businesses increasingly hinges on a less glamorous question: how much does it cost to use every day?

An extreme example helps illustrate this. Tom’s Hardware reported that Peter Steinberger, creator of OpenClaw and an employee of OpenAI, spent over $1.3 million on OpenAI tokens in just 30 days, with 603 billion tokens and 7.6 million requests generated across about 100 Codex instances. While not representative of a typical company, it shows what happens when limits are removed and agents work continuously on core tasks.
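The reported figures can be broken down into unit costs with simple arithmetic. The sketch below uses only the numbers quoted above. Because the spend is given as "over $1.3 million", the results are lower bounds, and since the report does not split the total into input, output, and cached tokens, the per-token figure is only a blended average.

```python
spend_usd = 1_300_000        # reported as "over $1.3 million": a lower bound
tokens = 603_000_000_000     # 603 billion tokens
requests = 7_600_000         # 7.6 million requests
days = 30

usd_per_million_tokens = spend_usd / (tokens / 1_000_000)
tokens_per_request = tokens / requests
usd_per_request = spend_usd / requests
usd_per_day = spend_usd / days

print(f"Blended cost per 1M tokens: ${usd_per_million_tokens:.2f}")  # → $2.16
print(f"Average tokens per request: {tokens_per_request:,.0f}")      # → 79,342
print(f"Average cost per request:   ${usd_per_request:.3f}")         # → $0.171
print(f"Average spend per day:      ${usd_per_day:,.0f}")            # → $43,333
```

No single request is expensive here. The bill comes from volume: millions of requests, each carrying tens of thousands of tokens.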

Salesforce points in the same direction from a different angle. Marc Benioff said the company could spend around $300 million on Anthropic tokens this year, mainly for programming and automation agents. That figure doesn’t imply irrational spending: it could be justified if productivity gains outweigh costs. But it confirms that agentic AI is no longer a minor software line item; it is becoming part of strategic infrastructure.

For large tech firms, banks, pharma, or global consultancies, such costs are manageable if returns are clear. For universities, small media outlets, independent developers, SMEs, or research teams with limited budgets, the scenario looks different. If access to advanced models and persistent agents comes with five- or six-figure monthly bills, AI won’t reduce disparities; it may even widen them.

Thus, the promise of technological abundance faces a physical reality: GPUs, data centers, energy, memory, networks, and specialized talent. Intelligence may seem like software, but it’s executed on very expensive infrastructure.

The Answer Won’t Be to Always Use the Cheapest Model

The solution isn’t solely about replacing a US model with a cheaper Chinese one. The next phase of enterprise AI will require architectural decisions. Organizations will need to determine which tasks warrant cutting-edge models, which can be handled by smaller ones, which parts can run locally, when to use cache, how to limit persistent agents, and how to measure the cost per business outcome.
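One way to picture such decisions is a small routing function that assigns each task to a model tier. This is a sketch under stated assumptions, not a recommended policy: the tier names ("local", "small", "frontier"), the task kinds, and the 100,000-token threshold are all hypothetical placeholders that an organization would replace with its own rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    kind: str                 # hypothetical label, e.g. "summarize" or "security_review"
    context_tokens: int       # approximate size of the context the task needs
    sensitive: bool = False   # True if the data must not leave the organization


# Hypothetical set of task kinds judged to deserve the most capable model.
FRONTIER_KINDS = {"architecture", "security_review"}


def choose_tier(task):
    """Pick a model tier for a task using simple, illustrative rules."""
    if task.sensitive:
        return "local"        # keep restricted data on infrastructure you control
    if task.kind in FRONTIER_KINDS or task.context_tokens > 100_000:
        return "frontier"     # pay for the strongest model only where it matters
    return "small"            # default to a cheaper model


print(choose_tier(Task("summarize", 3_000)))                     # → small
print(choose_tier(Task("security_review", 20_000)))              # → frontier
print(choose_tier(Task("summarize", 3_000, sensitive=True)))     # → local
```

Keeping the rules in one explicit function makes the cost policy reviewable and easy to change as prices and model quality shift.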

This paves the way for an increasingly vital discipline: AI FinOps. Just as cloud computing forced control over machines, storage, and traffic, AI demands managing tokens, context, cache, tool calls, and team consumption. Without this visibility, adoption may seem successful until the bill arrives.
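A minimal sketch of that visibility is a usage ledger that attributes spend to teams and use cases. The record fields, team names, and token counts below are invented for illustration. The input and output rates reuse the Claude Opus 4.7 row of the indicative table, while the cached-input rate is a placeholder assumption, not a published price.

```python
from collections import defaultdict

# USD per 1M tokens. "cached_input" is a hypothetical discounted rate.
PRICES = {"input": 5.00, "cached_input": 0.50, "output": 25.00}


def event_cost(event):
    """Cost in USD of one usage record; missing token kinds count as zero."""
    return sum(event.get(kind, 0) / 1_000_000 * price
               for kind, price in PRICES.items())


def spend_by(events, key):
    """Aggregate cost by a record field such as 'team' or 'use_case'."""
    totals = defaultdict(float)
    for event in events:
        totals[event[key]] += event_cost(event)
    return dict(totals)


# Invented usage records for illustration only.
events = [
    {"team": "platform", "use_case": "code_review",
     "input": 2_000_000, "cached_input": 4_000_000, "output": 400_000},
    {"team": "platform", "use_case": "migration",
     "input": 1_000_000, "output": 1_000_000},
    {"team": "docs", "use_case": "drafting",
     "input": 1_000_000, "output": 200_000},
]

print(spend_by(events, "team"))
print(spend_by(events, "use_case"))
```

The same records can be grouped by any field, which is what lets a team move from "token cost" to "cost per use case" and, eventually, cost per business outcome.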

AI hasn’t failed because it’s becoming expensive. It’s entering a mature phase. Initial subsidies, generous testing, and flat rates helped create habits and accelerate markets. Now, the less comfortable part begins: demonstrating which tasks generate enough value to justify their real execution cost.

The era of agents will not be decided solely by who has the smartest model. It will also depend on who can afford to keep it operational and thinking.

Frequently Asked Questions

Why are AI agents more expensive than chatbots?
Because they operate in multiple steps: reading context, planning, calling tools, executing actions, reviewing results, and retrying. Each step consumes tokens.

Can token prices change?
Yes. They may vary depending on date, country, provider, model, context, cache use, batch processing, priority, or enterprise agreements.

Are Chinese models always the best choice for cost?
Not necessarily. They may cost less, but privacy, compliance, support, latency, availability, data quality, and each organization’s own requirements still need to be assessed.

What should companies do to control AI spending?
Measure usage per use case, set limits, choose different models by task, cache responses, avoid uncontrolled agents, and evaluate cost per business outcome, not just token cost.
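Two of these measures, setting limits and caching responses, can be sketched in a few lines. Everything here is hypothetical: the class and function names, the dollar amounts, and the `call_model` stand-in for a real model client. The cache shown is a simple application-level response cache, which is a different mechanism from the prompt caching that providers bill for.

```python
class BudgetExceeded(Exception):
    """Raised when a charge would push spending past the limit."""


class BudgetGuard:
    """Tracks spend and refuses any charge that would exceed a hard limit."""

    def __init__(self, limit_usd):
        self.limit_usd = limit_usd
        self.spent_usd = 0.0

    def charge(self, cost_usd):
        if self.spent_usd + cost_usd > self.limit_usd:
            raise BudgetExceeded(f"limit of ${self.limit_usd:.2f} would be exceeded")
        self.spent_usd += cost_usd


def cached_call(prompt, cache, call_model):
    """Reuse a stored response for an identical prompt instead of paying again."""
    if prompt not in cache:
        cache[prompt] = call_model(prompt)
    return cache[prompt]


def run_agent(step_costs, guard):
    """Run steps in order and stop as soon as the guard refuses a charge."""
    completed = 0
    for cost in step_costs:
        try:
            guard.charge(cost)
        except BudgetExceeded:
            break
        completed += 1
    return completed


guard = BudgetGuard(limit_usd=1.00)
print(run_agent([0.40, 0.40, 0.40], guard))   # → 2 (the third step is refused)
```

A hard stop like this is blunt, but it turns an open-ended agent run into a bounded one, which is the property a budget owner needs.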
