The agreement between Anthropic and SpaceX to utilize the full capacity of Colossus 1 initially seems like a contradiction that’s hard to explain. Elon Musk has spent years criticizing some of his AI rivals, and xAI competes directly with Anthropic in the frontier model race. Viewed through the lens of infrastructure, however, the move makes much more industrial sense: Colossus 1 may be less attractive as a training cluster for xAI, but it is highly valuable as an inference platform for Claude.
Anthropic has confirmed an agreement with SpaceX to utilize the entire computing capacity of the Colossus 1 data center. According to the company, this will provide more than 300 MW of new capacity and over 220,000 NVIDIA GPUs available within a month. The immediate goal is to increase Claude Code usage limits, remove peak-hour restrictions for Pro and Max plans, and raise API limits for Claude Opus models.
A huge cluster, but not necessarily ideal for training
Colossus 1 is one of the largest known AI clusters. xAI describes it as an infrastructure with over 220,000 NVIDIA GPUs, including H100, H200, and GB200 models. The mix of generations is significant. For many use cases, this provides a substantial amount of available capacity. However, for distributed frontier model training, a heterogeneous architecture can introduce problems.
Large-scale training requires very fine synchronization. Thousands or tens of thousands of GPUs must progress together at every step. If some parts of the cluster run faster than others, the more powerful GPUs have to wait for the slower ones. This phenomenon, known as the straggler effect, reduces actual system utilization. The most widely discussed figure recently is an alleged 11% Model FLOPs Utilization (MFU), attributed to xAI by The Information and reported by Data Center Dynamics, well below the 40%-plus levels typically cited for other major labs.
This data should be treated with caution, as it does not come from a comprehensive public technical audit. Still, it aligns with a known challenge: training enormous models involves more than just owning GPUs. It requires a finely tuned stack of software, networking, topology, scheduling, cooling, power, and debugging. A cluster of 220,000 GPUs can be impressive in raw numbers but difficult to leverage effectively if hardware characteristics vary significantly.
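As a reference point, MFU is conventionally defined as the model FLOPs a training run actually achieves divided by the theoretical peak FLOPs of the hardware. The sketch below shows the arithmetic; the throughput, model size, and per-GPU peak are illustrative assumptions chosen only to show how a figure around 11% could arise, not numbers from the report.

```python
def model_flops_utilization(
    tokens_per_second: float,   # observed training throughput
    flops_per_token: float,     # ~6 * parameter_count for a dense transformer (forward + backward)
    num_gpus: int,
    peak_flops_per_gpu: float,  # vendor-quoted peak for the precision in use
) -> float:
    """MFU = achieved model FLOPs / theoretical peak FLOPs of the cluster."""
    achieved = tokens_per_second * flops_per_token
    peak = num_gpus * peak_flops_per_gpu
    return achieved / peak

# Illustrative numbers only: a hypothetical 400B-parameter dense model
# on 100,000 GPUs rated at ~1e15 FLOP/s each (roughly H100-class BF16, no sparsity).
mfu = model_flops_utilization(
    tokens_per_second=4.5e6,
    flops_per_token=6 * 400e9,
    num_gpus=100_000,
    peak_flops_per_gpu=1e15,
)
print(f"MFU ≈ {mfu:.0%}")  # ≈ 11% under these assumed inputs
```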
The official statement from xAI notes that Colossus 1 is designed for training, fine-tuning, inference, and high-performance computing, but Elon Musk also mentioned on X, according to Reuters, that SpaceX had shifted its AI training efforts to Colossus 2. This detail is key: if xAI has already moved its main training to another cluster, Colossus 1 ceases to be the hub of their core R&D and can become a profitable asset.
Inference doesn’t require the same synchronization as training
The difference between training and inference explains much of this agreement. Frontier model training demands that vast numbers of GPUs work in perfect sync for weeks or months. Inference, on the other hand, can split many requests across GPU groups with more flexibility. While it doesn’t eliminate all challenges, inference can better tolerate heterogeneity.
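A minimal sketch of that contrast, under assumed numbers: in serving, each request is independent, so a scheduler can simply send more traffic to the faster pools instead of forcing every GPU to advance in lockstep. The pool names match the generations xAI lists, but the replica counts and throughputs are placeholders, not benchmarks.

```python
import random

# Hypothetical serving pools: replica counts and per-replica throughput
# (requests/s) are placeholders, not measurements of these GPUs.
POOLS = {
    "H100": {"replicas": 600, "throughput": 1.0},
    "H200": {"replicas": 300, "throughput": 1.4},
    "GB200": {"replicas": 100, "throughput": 3.0},
}

def capacity_weights(pools):
    """Weight each pool by its aggregate serving capacity. Unlike a
    synchronized training step, a slower pool never blocks a faster one;
    it simply receives a smaller share of the traffic."""
    names = list(pools)
    weights = [p["replicas"] * p["throughput"] for p in pools.values()]
    return names, weights

names, weights = capacity_weights(POOLS)
assignments = random.choices(names, weights=weights, k=10_000)
for name in names:
    share = assignments.count(name) / len(assignments)
    print(f"{name}: {share:.1%} of simulated requests")
```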
For Anthropic, which is seeing increased use of Claude Code and its Opus models, the urgency isn’t necessarily to train the next model on Colossus 1. It’s more about serving more users, more coding sessions, more API requests, and higher enterprise load. In this context, an additional 220,000 GPUs can quickly translate into useful capacity, even if the cluster isn’t ideal for extreme distributed training.
This also explains why Anthropic can get value out of an asset that was never ideal for xAI’s training needs. A mixed fleet of H100, H200, and GB200 GPUs may pay a penalty under workloads requiring global synchronization, but it can be far more cost-effective when used for inference: Claude services, agent execution, user queues, and API capacity.
Additionally, Anthropic accesses all this capacity as a single major client. This reduces typical multitenancy problems like unpredictable latency, load interference, and fragmented business management. From SpaceX/xAI’s perspective, it simplifies operations: one big contract, a clear load, and intensive use of an asset already in place.
Musk’s financial move
This deal also has a financial angle. Reuters notes that the pact provides SpaceX with a top-tier client at a time when the company is preparing for an IPO and aims to reassure investors about its AI ambitions. The computing infrastructure stops being solely a massive cost of training Grok and becomes a revenue-generating line of business.
This is a crucial point. An AI lab burning billions annually on training models faces a complex financial narrative. A company capable of renting data center capacity to third parties and generating recurring revenue resembles more of an infrastructure platform. Notably, some analysts already speak of a “neo-cloud” model: large cluster owners leasing capacity to labs, startups, and enterprises that cannot build such data centers themselves.
Exact profitability figures for the contract haven’t been made public. Market estimates point to potential revenues of several billion dollars per year based on high GPU-hour prices, but those estimates rest on assumptions about actual GPU rates, utilization levels, contract duration, energy costs, depreciation, maintenance, networking, and staffing. It’s prudent to treat them as scenarios rather than definitive data.
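For a sense of scale, the back-of-envelope below uses the publicly reported GPU count but a purely assumed blended rate and utilization; change either input and the result moves by billions.

```python
# Purely illustrative: only the GPU count comes from public reporting.
gpus = 220_000             # GPUs reported for Colossus 1
hours_per_year = 8_760
rate_per_gpu_hour = 2.00   # assumed blended $/GPU-hour across H100/H200/GB200
utilization = 0.70         # assumed share of hours actually billed

gross_revenue = gpus * hours_per_year * rate_per_gpu_hour * utilization
print(f"~${gross_revenue / 1e9:.1f}B gross per year under these assumptions")
# ≈ $2.7B, before energy, depreciation, maintenance, networking, and staffing
```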
Nevertheless, the asset’s nature clearly shifts. Colossus 1 might have been a problematic cluster for frontier training if its effective utilization really was that low. By leasing it to Anthropic for inference and product capacity, SpaceX/xAI turns that infrastructure into a steady cash flow. The same GPU farm that once seemed a logistical challenge now becomes a business asset.
Anthropic gains time and capacity
For Anthropic, this agreement solves an equally pressing problem. The company needs capacity to sustain the growth of Claude, especially Claude Code. In its announcement, Anthropic explained that this deal complements other compute commitments: up to 5 GW with Amazon; a similar 5 GW deal with Google, with Broadcom, starting in 2027; $30 billion of Azure capacity via Microsoft and NVIDIA; and a $50 billion investment in U.S.-based infrastructure through Fluidstack.
The message is clear: Anthropic aims to avoid dependence on a single provider or hardware type. It trains and runs Claude on AWS Trainium, Google TPU, and NVIDIA GPUs. This diversification has become a strategic necessity for any frontier AI lab. With user demand rising and usage limits already degrading the experience, available compute capacity increasingly determines who can sell more.
The SpaceX deal also delivers an immediate commercial benefit: higher limits for Claude Code and expanded capacity for paying customers. It’s not just a long-term infrastructure promise; it’s a tangible upgrade that addresses current usage restrictions for clients.
Orbital AI: the most futuristic part of the deal
The package includes another striking element: Anthropic has expressed interest in collaborating with SpaceX to develop multiple gigawatts of orbital compute. The idea of data centers in space sounds extreme but addresses a very terrestrial issue: energy, land, cooling, and permits are becoming physical barriers to AI expansion.
Reuters reports that Anthropic sees potential in this area and that SpaceX aims to position orbital compute as a major future narrative. In its statement, xAI notes that SpaceX is among the few organizations with enough launch cadence, orbital mass economics, and constellation expertise to move space computing from research concept to engineering program.
In the short term, however, the real business is on Earth, not in orbit. Colossus 1 provides immediate capacity to Anthropic and allows SpaceX/xAI to demonstrate that their data centers can produce revenue beyond internal use. The orbital aspect reinforces long-term ambitions, but terrestrial infrastructure remains the core of this deal.
This operation highlights a key lesson about the new AI economy: success no longer depends solely on having the best model. It depends on access to energy, GPUs, networking, cooling, training software, inference capacity, and paying customers. In this landscape, an imperfect cluster can be a poor asset for some tasks but an excellent one for others.
Musk hasn’t simply handed Colossus 1 to a rival. Instead, he has transformed a possibly less efficient training cluster into a source of inference capacity—exactly where Anthropic needed a boost. Meanwhile, xAI continues focusing on Colossus 2 for training new models, while SpaceX monetizes Colossus 1 with a top-tier client. This is asset rotation, not surrender.
FAQs
What has Anthropic agreed with SpaceX?
Anthropic will use the entire capacity of Colossus 1, with over 300 MW and more than 220,000 NVIDIA GPUs available within a month, to expand Claude’s capacity.
Why does xAI transfer capacity to a competitor?
Because, according to Elon Musk, SpaceX has already moved its training efforts to Colossus 2. Colossus 1 can be more profitable as rented inference capacity than as a primary training cluster.
Why is a mixed cluster problematic for AI training?
Distributed training requires many GPUs to stay synchronized. If some are faster or if network delays occur, the more powerful GPUs may wait, reducing real utilization.
Why can Anthropic still make use of Colossus 1?
Because inference tolerates heterogeneity better than training. Many requests can be distributed among GPU groups, making a mixed cluster more useful for serving users and APIs.

