NVIDIA continues to set the pace in artificial intelligence infrastructure, but the debate within the market is starting to shift. The question is no longer just whether its GPUs are the most powerful or whether its software ecosystem remains the most comprehensive. The discussion gaining traction among customers, analysts, and engineers is more uncomfortable: what does it actually cost to run AI at scale once you factor in chips, energy, cooling, networking, integration, maintenance, and vendor dependency?
An Evercore ISI report, cited in the initial briefing, highlights this tension. According to the analysis, NVIDIA’s claim of up to a 35-fold advantage in total cost of ownership (TCO) doesn’t fully convince the average AI engineer. The report also detects a widespread perception that the company’s gross margins, above 70%, are excessively high. This doesn’t mean NVIDIA is losing its leadership position, but it does suggest that part of the market is looking for “good enough” alternatives or proprietary ASICs to improve the economics of its deployments.
Cost per token is no longer measured only in GPUs
NVIDIA’s central promise for its next-generation Vera Rubin platform is clear: higher performance and lower cost per token. In its official documentation, the company states that Vera Rubin NVL72 will deliver AI inference at one-tenth of the cost per million tokens compared to Blackwell, and model training with a quarter of the GPUs. This is a significant improvement on paper, especially in a market where agentic inference is beginning to multiply calls, queries, context, and tool usage.
The problem is that major infrastructure purchasers don’t look only at the chip. According to Evercore ISI, some hyperscalers question whether TCO calculations accurately account for the electricity consumed around the accelerator, including cooling. According to the extract of the report, that share can represent between 30% and 50% of overall operational costs.
This nuance shifts the conversation. An accelerator might be far more efficient at computation, but the total cost of an AI rack also depends on power supply, liquid cooling, switches, optics, storage, data center occupancy, technician availability, software, and support contracts. In a real AI factory, the GPU doesn’t work alone.
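To see why the energy and cooling share matters, here is a minimal what-if sketch in Python. It assumes, purely for illustration, that a chip-level gain applies only to the part of the cost that is not energy and cooling, while the energy and cooling share stays unchanged per token. Neither NVIDIA nor Evercore ISI states this model: the function name `effective_advantage` and the Amdahl-style formula are assumptions, and only the one-tenth figure and the 30-50% range come from the text above.

```python
def effective_advantage(chip_level_gain: float, overhead_share: float) -> float:
    """Hypothetical Amdahl-style estimate of the overall cost advantage.

    chip_level_gain: factor by which the non-overhead cost shrinks (e.g. 10).
    overhead_share:  fraction of today's cost spent on energy and cooling,
                     assumed here NOT to improve at all (a strong assumption).
    """
    if chip_level_gain <= 0:
        raise ValueError("chip_level_gain must be positive")
    if not 0.0 <= overhead_share < 1.0:
        raise ValueError("overhead_share must be in [0, 1)")
    # New cost as a fraction of the old cost: the improved part plus the fixed part.
    new_cost_fraction = (1.0 - overhead_share) / chip_level_gain + overhead_share
    return 1.0 / new_cost_fraction


if __name__ == "__main__":
    # 10x mirrors the "one-tenth of the cost" claim; 0.30 and 0.50 mirror the
    # 30-50% range attributed to Evercore ISI. The model itself is illustrative.
    for share in (0.30, 0.50):
        gain = effective_advantage(10.0, share)
        print(f"overhead share {share:.0%}: overall advantage about {gain:.2f}x")
```

Under these assumptions the overall advantage is bounded by the part of the bill that does not improve, which is the concern the hyperscalers are raising. If energy and cooling per token also fall with the new platform, the real figure would be higher than this sketch suggests.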
| Key Metric | Available Data |
|---|---|
| NVIDIA GAAP gross margin in Q4 FY2026 | 75.0% |
| NVIDIA non-GAAP gross margin in Q4 FY2026 | 75.2% |
| NVIDIA revenue in Q4 FY2026 | $68.127 billion |
| Data Center revenue in Q4 FY2026 | $62.314 billion |
| Official advantage NVIDIA announced for Vera Rubin NVL72 | 1/10 of the cost per million tokens compared to Blackwell |
| Estimated share of energy and cooling in operational overhead, according to Evercore ISI | 30-50% |
| Expected arrival of Vera Rubin for hyperscalers, according to Evercore ISI | 2Q 2026 |
| Expected access for enterprise OEMs, according to Evercore ISI | September-October 2026 |
Margins partly explain this pressure. NVIDIA closed its fiscal Q4 2026 with a GAAP gross margin of 75.0% and a non-GAAP gross margin of 75.2%, on record quarterly revenue of $68.127 billion. Its data center business reached $62.314 billion in the same period, confirming how dominant the company has become as a provider of AI infrastructure.
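The figures in the table can be related with simple arithmetic. The short Python sketch below uses only the numbers quoted above; the variable names are chosen for illustration, and the implied gross profit is approximate because the 75.0% margin is itself a rounded figure.

```python
# Figures quoted in the article for NVIDIA's Q4 FY2026, in USD billions.
revenue = 68.127
data_center_revenue = 62.314
gaap_gross_margin = 0.750  # rounded figure, so derived values are approximate

# Share of total revenue that came from the Data Center segment.
data_center_share = data_center_revenue / revenue

# Gross profit and cost of revenue implied by the GAAP gross margin.
implied_gross_profit = revenue * gaap_gross_margin
implied_cost_of_revenue = revenue - implied_gross_profit

print(f"Data Center share of revenue: {data_center_share:.1%}")
print(f"Implied gross profit: ${implied_gross_profit:.2f} billion")
print(f"Implied cost of revenue: ${implied_cost_of_revenue:.2f} billion")
```

Seen this way, more than nine-tenths of the quarter's revenue came from the Data Center segment, and roughly three out of every four dollars of revenue remained as gross profit.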
Why ASICs are re-entering the conversation
The willingness to use ASICs or “good enough” alternatives isn’t new, but it gains strength as costs scale. Major cloud operators, AI labs, and some consumer platforms have enough volume to justify custom chips, provided the savings outweigh the effort of design, integration, and software.
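The condition that savings must outweigh the design effort can be framed as a break-even calculation. The Python sketch below is a simplified model under stated assumptions: the function `breakeven_units` and its parameters are hypothetical, the one-time effort (design, integration, software) is lumped into a single number, and the input values are placeholder units, not real prices.

```python
import math


def breakeven_units(one_time_effort: float,
                    gpu_cost_per_unit: float,
                    asic_cost_per_unit: float) -> int:
    """Smallest number of deployed accelerators at which a custom ASIC pays off.

    one_time_effort:    design, integration, and software cost (paid once).
    gpu_cost_per_unit:  lifetime cost of one merchant GPU (purchase plus operation).
    asic_cost_per_unit: lifetime cost of one custom ASIC doing the same work.
    All three inputs are hypothetical and must share the same currency unit.
    """
    saving_per_unit = gpu_cost_per_unit - asic_cost_per_unit
    if saving_per_unit <= 0:
        raise ValueError("the ASIC never breaks even without a per-unit saving")
    return math.ceil(one_time_effort / saving_per_unit)


if __name__ == "__main__":
    # Placeholder units only: a one-time effort of 1000 and a per-unit saving of 4.
    print(breakeven_units(1000.0, 10.0, 6.0))
```

The model ignores time value of money, performance differences, and the cost of maintaining a second software stack, all of which would push the break-even volume higher in practice. It does show why only operators with very large fleets tend to consider this route.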
Google has been using its TPU for years. Amazon has Trainium and Inferentia. Microsoft has developed Maia. Meta is working on its own accelerators. The idea isn’t to replace NVIDIA overnight but to reduce dependence for specific workloads, especially stable inference, internal models, recommendation, ranking, search, or tasks where the flexibility of CUDA and the NVIDIA ecosystem isn’t as critical.
NVIDIA retains a hard-to-replicate advantage. Its GPUs are more than silicon: they come with CUDA, libraries, compilers, networking, complete systems, reference architectures, available talent, and a network of partners that reduces the risk of deploying quickly. But as AI spending increases, there will be more pressure to optimize each layer.
Agentic inference intensifies this pressure. AI agents don’t produce a single response and stop; they can query databases, run code, open sandboxes, search documents, call tools, and repeat steps until completing a task. This raises token consumption and also load on CPUs, memory, networking, and storage. That’s why Vera, NVIDIA’s CPU for agents, appears as an important component within Vera Rubin: not all agentic work is done on the GPU.
Evercore ISI also notes that there are no significant issues in hyperscalers’ readiness for mass production of Rubin. This part of the report is favorable for NVIDIA. If Vera Rubin reaches large clients in 2Q 2026 and then enterprise OEMs in September or October, the company can support its narrative with actual hardware, not just presentations.
The challenge: demonstrating TCO in real deployments
The key will be in production deployments. Cost per token promises are helpful for market guidance, but customers will measure success with their own workloads: language models, internal agents, vision, recommendation, analytics, training, fine-tuning, vector databases, and data pipelines.
Significant differences may arise. An AI lab training frontier models might prioritize memory, interconnectivity, and extreme performance. A bank running internal agents might focus on security, latency, data governance, and cost predictability. A hyperscaler will look at cost per token, rack density, energy efficiency, and the ability to operate tens of thousands of chips without bottlenecks. An enterprise OEM must package all this into sellable, maintainable systems compatible with real data centers.
NVIDIA is trying to lead with Vera Rubin NVL72, a full-rack architecture combining Vera CPU, Rubin GPU, NVLink, networking, cooling, and modular design. Its advantage is offering a closed platform: well-designed parts that work together, reduce manual integration, and build on a software ecosystem familiar to AI teams.
However, NVIDIA’s own success fosters resistance. When a company controls critical infrastructure with margins of 70% or more, large customers have incentives to seek secondary sources, not necessarily because the alternative is better, but because having options increases bargaining power.
The likely outcome isn’t an immediate replacement but greater segmentation. Demanding, fast-evolving, and time-sensitive workloads will continue to rely on NVIDIA’s platforms. Repetitive, mature, or high-volume workloads might shift to proprietary ASICs if the savings are clear. In between, other providers will offer “good enough” options, especially where energy costs and power availability are stricter constraints than raw performance.
For NVIDIA, Vera Rubin will be a key test. If the company can demonstrate real improvements in cost per token once energy, cooling, and full operations are counted, it will strengthen its position before ASICs claim more ground. If the advantage customers perceive is smaller than the company claims, the discussion about margins, dependency, and alternatives will intensify.
AI is entering a less spectacular but more decisive phase: infrastructure economics. The question for buyers will no longer be just “how much performance” but “what does it cost to operate each day.”
Frequently Asked Questions
What does the Evercore ISI report question about NVIDIA?
It points out that NVIDIA’s claimed TCO advantage doesn’t fully convince some engineers and clients, and that there’s a perception of excessive gross margins.
What is Vera Rubin NVL72?
It is NVIDIA’s upcoming full-rack AI platform, designed for advanced training and inference and incorporating Vera CPUs, Rubin GPUs, memory, networking, and optimized interconnects.
Why are hyperscalers seeking proprietary ASICs?
Because they have enough volume to justify dedicated chips that reduce costs for specific workloads, especially massive inference and repetitive tasks where the full flexibility of a general-purpose GPU isn’t always necessary.
Is NVIDIA at immediate risk from these alternatives?
Not necessarily. NVIDIA maintains a strong advantage in hardware, software, and ecosystem. The risk is less about immediate substitution than about downward pressure on prices and margins and about the company's reliance on a few large customers.

