Apple Makes a Move in AI: Its Own Server Chips and a Tactical Alliance with Gemini

Apple is trying to solve two problems that almost always clash in artificial intelligence: advancing quickly without losing control. In the short term, the company has accepted an uncomfortable reality for anyone aiming to dominate its own platform: for Siri and Apple Intelligence to make a credible leap, it may be necessary, at least temporarily, to rely on cutting-edge models from third parties. At the same time, Apple is accelerating its strongest suit, the integration of hardware, software, and services with proprietary technology, this time with a clear goal behind the scenes of its AI push: server chips for inference.

The most visible sign of this hybrid strategy came with the announcement of a collaboration between Apple and Google. In a joint statement released in 2024, both companies indicated that “the next generation of Apple Foundation Models will be based on Google’s Gemini models and cloud technology”, while also emphasizing privacy commitments, such as a pledge that user data will not be used to train the models. The move was interpreted as a “bridge”: a way to sustain the evolution of experiences like Siri while Apple matures its own AI stack.

However, Apple does not seem to want to remain dependent for long. The company has shown in other transitions, from Intel to Apple Silicon or on the road to its own 5G modems, that it prefers to “lease” solutions only for as long as necessary. In AI, the incentive is even greater: the model and the experience are no longer just a layer of the product but the driving force that defines what the device can do, how it integrates with the operating system, and the value the user perceives.

This is where the second move fits in: Apple-designed server chips for AI. Reuters reported that Apple was working with Broadcom on an AI server chip, codenamed “Baltra”, with plans to start production in 2026. The goal would not be to replace processing on the iPhone or Mac but to strengthen backend inference: responding to requests, running models, filtering and organizing information, and handling peak demand with more predictable costs and power consumption.
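
To make the “peak demand with more predictable costs” idea concrete, here is a minimal sketch, in Swift, of the kind of admission control an inference backend typically applies: cap how many requests run at once and shed the excess instead of queuing without bound. Everything here (names, limits, timeouts) is an illustrative assumption, not Apple’s actual server code.

```swift
import Foundation

// Minimal sketch of admission control for an inference backend.
// All names and numbers are hypothetical, not Apple APIs.
final class InferenceGate {
    private let slots: DispatchSemaphore

    init(maxConcurrentRequests: Int) {
        slots = DispatchSemaphore(value: maxConcurrentRequests)
    }

    /// Runs `work` if a slot frees up within `timeout`; otherwise the
    /// request is shed so the caller can retry or fall back on-device.
    func handle<T>(timeout: DispatchTimeInterval, work: () -> T) -> T? {
        guard slots.wait(timeout: .now() + timeout) == .success else {
            return nil // shed load instead of queuing unboundedly
        }
        defer { slots.signal() }
        return work()
    }
}

let gate = InferenceGate(maxConcurrentRequests: 64)
let reply = gate.handle(timeout: .milliseconds(200)) {
    "model output" // stand-in for actually running the model
}
print(reply ?? "busy: retry later or answer with the on-device model")
```

Capping concurrency this way is what makes costs and power draw predictable: the hardware never runs hotter than the budget allows, and overflow is handled by policy rather than by degradation.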

The choice of Broadcom is not incidental. In the data center ecosystem, Broadcom carries significant weight in interconnects, networking, and specialized silicon. For Apple, partnering with a company of that profile can shorten the path from idea to deployment, especially now that the sector’s bottlenecks are no longer just GPUs: energy, cooling, advanced packaging, performance per watt, and the logistics of scaling capacity without runaway costs are all critical.

This custom silicon plan aligns with another piece Apple has already put on the table: the “Private Cloud Compute” infrastructure that underpins Apple Intelligence. The company’s position is that AI tasks should run on the device whenever possible and scale to the cloud only when necessary, under a tightly controlled approach. Along these lines, Apple’s announcements of industrial investment in the U.S. included plans to manufacture servers in Houston for this private computing layer, with shipments scheduled to begin in 2026. The message is clear: Apple wants the “cloud” part of its AI to resemble, in philosophy, its “on-device” part: integrated, auditable, and built on proprietary designs.
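
The split described above boils down to a routing decision. Below is a minimal, hypothetical sketch of such a policy in Swift; the types, the threshold, and the `route` function are illustrative assumptions, not Apple’s Private Cloud Compute API.

```swift
// Hypothetical on-device-first routing policy. The threshold and
// types are illustrative; Apple's real criteria are not public.
enum ExecutionTarget {
    case onDevice      // private, low latency
    case privateCloud  // heavier tasks, Apple-controlled servers
}

struct AITask {
    let promptTokens: Int
    let needsLargeModel: Bool
}

func route(_ task: AITask, deviceContextLimit: Int = 4_096) -> ExecutionTarget {
    // Small tasks stay local: better privacy, lower latency.
    if !task.needsLargeModel && task.promptTokens <= deviceContextLimit {
        return .onDevice
    }
    // Everything else escalates to the private compute layer.
    return .privateCloud
}

print(route(AITask(promptTokens: 900, needsLargeModel: false)))   // onDevice
print(route(AITask(promptTokens: 20_000, needsLargeModel: true))) // privateCloud
```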

In the market, this shift has implications beyond Apple. If “Baltra-type” chips thrive, it reinforces a trend that is already redefining the sector: big platforms want to control the unit cost of each AI response, including power consumption per inference and end-to-end latency. It’s not just a race of models; it’s a race for infrastructure.
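
A quick back-of-the-envelope calculation shows why “unit cost per response” is the metric to watch. The figures below are made-up assumptions chosen only to illustrate the arithmetic; they are not measured numbers for any real chip or deployment.

```swift
import Foundation

// Illustrative unit economics of one AI response. Every figure is an
// assumption for the sake of the arithmetic, not real data.
let energyPerQueryJoules = 300.0        // assumed energy per inference
let electricityPerKWh = 0.08            // assumed USD per kWh
let serverCostUSD = 15_000.0            // assumed hardware cost
let lifetimeQueries = 500_000_000.0     // assumed queries over service life

// 1 kWh = 3,600,000 J, so convert joules to kWh before pricing.
let energyCost = (energyPerQueryJoules / 3_600_000.0) * electricityPerKWh
let amortizedHardware = serverCostUSD / lifetimeQueries
let costPerResponse = energyCost + amortizedHardware

print(String(format: "≈ $%.6f per response", costPerResponse))
// Better performance per watt shrinks the first term; cheaper or
// longer-lived custom silicon shrinks the second. That is the race.
```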

There’s also a reputational factor: when a company promises a “native” AI experience, the pressure to deliver is enormous. By 2026, the bar will no longer be “it works” but “it works quickly, with good context, useful responses, and privacy guarantees.” That’s why Apple appears to be taking a pragmatic approach: a tactical alliance to buy time, and vertical integration to avoid lock-in.

In summary: Apple is playing at two speeds. On the interface, it seeks to accelerate capabilities with the support of Gemini. At the base, it is laying the groundwork for scalable, sustainable AI: servers, efficiency, cost control, and proprietary silicon. If the plan succeeds, “Apple AI” will cease to be just a feature and become a full platform, sharing the same DNA that made Apple Silicon possible.


Frequently Asked Questions (FAQ)

What is an inference server chip for AI, and why does it matter?
It’s the processor that runs already-trained models to generate responses (inference). It matters because it determines cost per query, latency, power consumption, and the ability to scale to many users.

What does it mean that Apple uses Gemini and is also developing its own chips?
It means Apple is trying to gain speed in the short term (via cutting-edge third-party models) without sacrificing control in the medium term (proprietary infrastructure and silicon), thereby reducing technological dependency and optimizing costs.

How does “on-device” AI differ from cloud AI in Apple Intelligence?
On-device AI runs directly on the device (more privacy, lower latency). The cloud handles heavier tasks, which Apple plans to route through a private compute layer so that its security and privacy standards still apply.

When might the “Baltra” server chips go into production?
According to Reuters reports on the project, the target was 2026, though silicon timelines can shift due to manufacturing capacity and product priorities.
