The first stage of generative artificial intelligence in business was marketed as an almost inevitable promise of productivity: more code, more documents, more automation, and less time wasted on repetitive tasks. The second stage is proving far less glamorous: reviewing invoices, setting spending limits, and explaining to finance why a tool that looked like just another SaaS license is behaving like variable-consumption infrastructure.
The problem is not that AI doesn’t work; that would be too simplistic a view. The real issue is that when it does work, it is used far more than expected. And when it is heavily used, it exposes a truth the market has tried to postpone for two years: running advanced models at scale costs a lot of money. A flat fee per user doesn’t cover the cost when, behind the scenes, there are long sessions, agents traversing entire repositories, huge contexts, lengthy responses, and multiple models chained together.
Flat-rate pricing was a market acquisition phase
During the initial phase, many AI tools were financed under a logic similar to other tech markets: grow fast, gain users, create habits, and assume costs will decrease over time. That strategy makes sense when the marginal cost is low or tends toward zero. In generative AI, the marginal cost doesn’t disappear. Each interaction consumes inference, energy, memory, network bandwidth, and GPU capacity. Every agent working in the background turns the promise of cheap software into a real operational burden.
GitHub has put a name to this shift. Starting June 1, 2026, Copilot will begin consuming GitHub AI Credits across all plans. The company will keep its base prices, but usage will be calculated from input, output, and cache tokens, depending on the model used. The official explanation is clear: Copilot is no longer just an assistant inside the editor but a platform with agentic capabilities that can execute long, multi-step sessions over complete repositories. This kind of usage demands far more compute and inference capacity than a quick chat question.
GitHub’s documentation reinforces this message. It defines AI Credits as billing units, with one credit equivalent to $0.01, that let organizations, companies, cost centers, or users control budgets. It also makes clear that a long session with a programming agent using a frontier model costs more because it involves more work.
The business perspective is clear: AI costs are no longer hidden inside a license and are starting to look like cloud spending. No rational organization would deploy cloud infrastructure without budgets, limits, observability, and alerts. Yet many companies deployed AI as if it were just another office tool, and that stage is coming to an end.
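To make the token-based billing concrete, here is a minimal sketch that converts token counts into credits using the $0.01-per-credit value cited above. The model names and per-million-token rates are hypothetical placeholders chosen for illustration, not GitHub’s published prices, and the formula is an assumption about how such billing could be computed.

```python
from dataclasses import dataclass

CREDIT_VALUE_USD = 0.01  # one AI Credit = $0.01, as cited in the article


@dataclass(frozen=True)
class ModelRates:
    """Hypothetical USD prices per million tokens (not real GitHub rates)."""
    input_per_m: float
    output_per_m: float
    cached_per_m: float


# Illustrative rates only: a cheap model and a pricier frontier model.
RATES = {
    "small-model": ModelRates(input_per_m=0.50, output_per_m=2.00, cached_per_m=0.05),
    "frontier-model": ModelRates(input_per_m=5.00, output_per_m=25.00, cached_per_m=0.50),
}


def session_cost_usd(model: str, input_tokens: int, output_tokens: int, cached_tokens: int) -> float:
    """Sum the cost of each token type at the chosen model's rates."""
    r = RATES[model]
    weighted = (
        input_tokens * r.input_per_m
        + output_tokens * r.output_per_m
        + cached_tokens * r.cached_per_m
    )
    return weighted / 1_000_000


def session_credits(model: str, input_tokens: int, output_tokens: int, cached_tokens: int) -> float:
    """Convert a session's USD cost into credits."""
    return session_cost_usd(model, input_tokens, output_tokens, cached_tokens) / CREDIT_VALUE_USD


if __name__ == "__main__":
    # A quick chat question versus a long agentic session over a repository.
    chat = session_credits("small-model", 2_000, 500, 0)
    agent = session_credits("frontier-model", 3_000_000, 200_000, 10_000_000)
    print(f"chat: {chat:.1f} credits, agent session: {agent:.1f} credits")
```

Under these made-up rates, the quick chat costs a fraction of a credit while the agent session costs thousands. That gap, rather than the exact numbers, is the point.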
Microsoft, Uber, and the financial awakening of AI
Microsoft’s case is particularly symbolic. The Verge reported that the company plans to retire most internal Claude Code licenses in its Experiences + Devices division and move many developers to GitHub Copilot CLI. Microsoft told employees that the decision aims to converge on Copilot CLI as the main agentic interface, although the sources cited suggest there is also a financial motive and a fiscal year-end consideration behind it.
It’s important not to oversimplify. Microsoft isn’t saying Claude is ineffective. In fact, reports indicate that Anthropic’s models will still be available through Copilot CLI, and Microsoft continues to use Claude in various products. The key point is a different one: even a tech giant with privileged access to cloud and model infrastructure is rationalizing its internal use of AI tools once that use starts to affect the P&L.
Uber signals a similar trend. AI Magazine reported that the company exhausted its AI budget for 2026 in just four months, largely due to intensive use of AI-assisted programming tools. Although secondary sources should be treated with caution, this fits a pattern many companies are noticing: real adoption often consumes far more than the initial pilot projects suggested.
Anthropic has also had to revise its public cost estimates for Claude Code. Business Insider reported the company increased its average cost per developer per active day from $6 to $13 in enterprise deployments, with a monthly range of $150 to $250 per developer. Anthropic explained this isn’t a price increase but an update reflecting the use of more advanced models and different consumption patterns.
This distinction matters. Official prices might stay the same, but actual costs do not. If a tool improves, people use it more. If agents can handle longer tasks, they consume more context. If an automated process handles reviews, tests, documentation, and incident analysis, the bill will grow, even if each token costs less than it did a year ago.
| Market Signal | What It Truly Indicates | Implication for Companies |
|---|---|---|
| Copilot transitions to AI Credits | Flat-rate pricing doesn’t support intensive agentic use | Need budgets and limits per team |
| Microsoft reduces internal Claude licenses | Tool choice is also a financial decision | More pressure to consolidate vendors |
| Claude Code updates cost estimates | More capable models change consumption patterns | Pilot budgets are no longer reliable for annual forecasts |
| Uber reportedly exhausted its AI budget | Mass adoption can surpass initial projections | Financial operations for AI become non-optional |
| Code agents with multimillion-dollar bills | Continued automation accelerates consumption | Must decide which tasks deserve cutting-edge models |
Productivity isn’t enough without cost measurement
The common defense of these tools is that increased productivity justifies the cost. That might be true, but that reasoning is no longer sufficient. Companies need to know what productivity they’re gaining, in which teams, with which models, and at what cost. Without measurements, AI becomes a budget-expanding expense driven by enthusiasm rather than ROI.
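As one way to start measuring, the sketch below aggregates spend by team and model and flags teams that are approaching or exceeding a monthly budget. The record shape, the team names, and the 80% alert threshold are all assumptions made for illustration; real billing exports will look different.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple, Tuple


class UsageRecord(NamedTuple):
    """One billed unit of usage (hypothetical shape)."""
    team: str
    model: str
    cost_usd: float


def spend_by_team_and_model(records: Iterable[UsageRecord]) -> Dict[Tuple[str, str], float]:
    """Total cost for each (team, model) pair."""
    totals: Dict[Tuple[str, str], float] = defaultdict(float)
    for r in records:
        totals[(r.team, r.model)] += r.cost_usd
    return dict(totals)


def budget_status(
    records: Iterable[UsageRecord],
    budgets: Dict[str, float],
    alert_ratio: float = 0.8,  # assumed threshold: warn at 80% of budget
) -> Dict[str, str]:
    """Classify each budgeted team as 'ok', 'alert', or 'over'."""
    spent: Dict[str, float] = defaultdict(float)
    for r in records:
        spent[r.team] += r.cost_usd

    status = {}
    for team, budget in budgets.items():
        used = spent[team]
        if used >= budget:
            status[team] = "over"
        elif used >= alert_ratio * budget:
            status[team] = "alert"
        else:
            status[team] = "ok"
    return status


if __name__ == "__main__":
    records = [
        UsageRecord("platform", "frontier-model", 400.0),
        UsageRecord("platform", "small-model", 50.0),
        UsageRecord("platform", "frontier-model", 100.0),
        UsageRecord("mobile", "frontier-model", 170.0),
    ]
    print(spend_by_team_and_model(records))
    print(budget_status(records, {"platform": 500.0, "mobile": 200.0}))
```

Cost is only half of ROI. The same grouping keys would also need to be joined against whatever productivity measure the company trusts.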
The extreme example of OpenClaw illustrates this problem well, even if it’s not representative of an average company. Tom’s Hardware reported that Peter Steinberger demonstrated a consumption of over $1.3 million in OpenAI tokens over 30 days, with 603 billion tokens and 7.6 million requests generated by about 100 Codex instances. While this was an unrestricted development lab paid for by OpenAI, it helps visualize what happens when constraints are eliminated.
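A quick back-of-the-envelope calculation helps put those reported figures in perspective. The sketch below uses only the numbers quoted above; because the source says "over" $1.3 million and "about" 100 instances, the results are rough approximations, and the per-token rate is a blended average across all token types rather than any published price.

```python
# Figures as reported in the paragraph above (approximate by nature).
total_usd = 1_300_000
total_tokens = 603_000_000_000
total_requests = 7_600_000
instances = 100
days = 30

# Blended cost per million tokens, across input, output, and cached tokens.
usd_per_million_tokens = total_usd / total_tokens * 1_000_000

# Average size and cost of a single request.
tokens_per_request = total_tokens / total_requests
usd_per_request = total_usd / total_requests

# Average daily spend attributable to one running instance.
usd_per_instance_day = total_usd / (instances * days)

print(f"{usd_per_million_tokens:.2f} USD per million tokens (blended)")
print(f"{tokens_per_request:,.0f} tokens per request")
print(f"{usd_per_request:.2f} USD per request")
print(f"{usd_per_instance_day:.2f} USD per instance per day")
```

That works out to roughly $2.16 per million tokens, about 79,000 tokens and $0.17 per request, and about $433 per instance per day. No single request is expensive; the total comes from volume and continuous operation.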
At the other end is Salesforce. Marc Benioff stated that the company will likely spend around $300 million on Anthropic tokens this year, while emphasizing the productivity gains from programming agents and advocating for an intermediate layer that decides which requests should go to frontier models and which to smaller ones.
That intermediate layer will be one of the key components of the next phase. Companies can neither send everything to the most powerful models nor push every case down to smaller models when quality, reasoning, or reliability is required. Intelligent model routing, caching, task limits, quality evaluation, consumption monitoring, and clear policies on when to pay a premium for a high-end model will be essential.
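A minimal sketch of such an intermediate layer might look like the following. Everything here is hypothetical: the task labels, the model names, the routing rule, and the `call_model` callback stand in for whatever policy and provider client a company actually uses. Real routers would likely also weigh context size, quality evaluations, and budgets.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass(frozen=True)
class Request:
    task: str                          # hypothetical labels, e.g. "autocomplete"
    prompt: str
    needs_deep_reasoning: bool = False


# Assumed policy: only these task types justify the premium model by default.
FRONTIER_TASKS = {"refactor", "incident-analysis"}


class Router:
    """Chooses a model tier per request and caches identical prompts."""

    def __init__(self, call_model: Callable[[str, str], str]) -> None:
        self._call_model = call_model  # injected provider client (assumption)
        self._cache: Dict[Tuple[str, str], str] = {}

    def choose_model(self, req: Request) -> str:
        if req.needs_deep_reasoning or req.task in FRONTIER_TASKS:
            return "frontier-model"
        return "small-model"

    def handle(self, req: Request) -> Tuple[str, str]:
        model = self.choose_model(req)
        key = (model, req.prompt)
        if key not in self._cache:     # only pay for prompts not seen before
            self._cache[key] = self._call_model(model, req.prompt)
        return model, self._cache[key]


if __name__ == "__main__":
    def fake_call(model: str, prompt: str) -> str:
        return f"[{model}] answer to: {prompt}"

    router = Router(fake_call)
    print(router.handle(Request("autocomplete", "complete this line")))
    print(router.handle(Request("refactor", "split this module")))
```

Injecting `call_model` keeps the routing policy separate from any particular provider, which makes the policy easy to test and to change as prices or models shift.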
AI doesn’t replace the cloud—it makes it a financial discipline
The key lesson from this phase is that enterprise AI resembles critical infrastructure more than traditional software. It involves variable costs, dependency on external capacity, risk of overuse, differences between providers, latency issues, security requirements, and architectural decisions. As a result, the conversation will shift toward AI FinOps, hybrid cloud, open-source models, self-hosted inference, and data governance.
Not every company needs to build its own AI infrastructure. For many, consuming APIs remains the most practical approach. But stable, repetitive, sensitive, or high-volume workloads will start to be viewed differently. If a workflow consumes millions of tokens daily, whether it should always run on an external provider will no longer be a hypothetical question.
The era of “testing everything because AI is cheap” is giving way to a more mature phase: deploying AI where it makes sense, measuring ROI, and designing architectures that don’t turn productivity gains into unpredictable bills. Providers will try to defend margins. Clients will try to control costs. Expect tense negotiations on prices, limits, models, and true value.
AI isn’t becoming expensive because it’s failing—it’s becoming expensive because it’s genuinely being used. The uncomfortable question is: who captures the value of that use? Is it the model provider, the development platform, the cloud that runs inference, or the company that should turn tokens into measurable productivity?
Frequently Asked Questions
Why is enterprise AI getting more expensive?
Because usage has shifted from occasional tests to tools integrated into daily work: programming agents, automations, and long sessions that consume many more tokens.
What are GitHub Copilot AI Credits?
Billing units that reflect AI model consumption in Copilot, calculated from input, output, and cache tokens depending on the model used.
Will flat-rate AI licenses disappear?
Not necessarily, but they are likely to include limits, credits, budgets, or surcharges for intensive use. Flat rates without control are hard to sustain with autonomous agents.
What should companies do?
Measure consumption by team and use case, set budgets, choose models based on the task, leverage caching, review contracts, and treat AI as variable-cost infrastructure, not just a software license.

