OpenAI no longer wants to just train models, sell API access, or turn ChatGPT into a productivity platform. With Jalapeño, its first Intelligence Processor designed in partnership with Broadcom, the company is diving straight into the most physical layer of artificial intelligence: silicon, networking, racks, and data center energy efficiency.
This announcement marks a significant shift in OpenAI’s strategy. The company introduces Jalapeño as an accelerator built from scratch for language model inference, not as a general-purpose GPU repurposed for AI. Its goal is clear: serve large models faster, more reliably, and with better performance per watt. In an industry where every efficiency point can translate into millions of dollars in savings, that difference matters.
Inference has become one of AI’s primary battlegrounds. Training cutting-edge models remains very costly, but the real recurring expense arises when these models are used daily by millions of people and businesses. Every ChatGPT response, every Codex task, every API call, and every future agent executing actions for minutes consumes compute power, memory, bandwidth, and energy.
Jalapeño targets exactly this issue. OpenAI isn’t just saying it will develop its own chip; it’s saying it aims to design infrastructure around how its models, kernels, serving systems, and products actually operate.
An ASIC for AI workloads already in production
The difference between building a general-purpose chip and designing an ASIC for a specific workload is fundamental. A GPU must serve many purposes: training, inference, HPC, graphics, simulation, or scientific analysis. A dedicated accelerator can sacrifice some flexibility to significantly improve performance on particular tasks.
OpenAI knows these tasks better than almost anyone. It handles massive workloads in ChatGPT, Codex, and its API. It understands where memory is consumed, what attention patterns appear in its models, what latencies users tolerate, what kernels are repeated, and where efficiency drops when serving AI at scale.
This knowledge is what it seeks to embed into hardware. According to OpenAI, Jalapeño reduces data movement and balances compute, memory, and networking to bring the chip’s actual utilization closer to its theoretical performance. This is crucial because many accelerators promise high figures on paper but fall short in production due to memory bottlenecks, interconnect issues, or software limitations.
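The gap between paper figures and production performance can be illustrated with the standard roofline model, in which attainable throughput is the lower of the compute peak and memory bandwidth multiplied by arithmetic intensity (FLOPs per byte moved). The sketch below is only an illustration: the peak and bandwidth figures are hypothetical placeholders, not Jalapeño specifications, which OpenAI has not published.

```python
def attainable_flops(peak_flops: float, mem_bandwidth: float,
                     arithmetic_intensity: float) -> float:
    """Roofline bound: the lower of the compute roof and the memory roof.

    peak_flops            -- theoretical peak, in FLOP/s
    mem_bandwidth         -- memory bandwidth, in bytes/s
    arithmetic_intensity  -- FLOPs performed per byte moved
    """
    return min(peak_flops, mem_bandwidth * arithmetic_intensity)


def utilization(peak_flops: float, mem_bandwidth: float,
                arithmetic_intensity: float) -> float:
    """Fraction of the theoretical peak that the roofline model allows."""
    return attainable_flops(peak_flops, mem_bandwidth,
                            arithmetic_intensity) / peak_flops


# Hypothetical accelerator (assumed numbers, for illustration only).
PEAK = 1.0e15   # 1,000 TFLOP/s
BW = 4.0e12     # 4 TB/s

if __name__ == "__main__":
    # Low intensities stand in for memory-bound work, high ones for
    # compute-bound work.
    for ai in (2, 50, 250, 1000):
        print(f"intensity {ai:>5} FLOP/byte -> "
              f"utilization {utilization(PEAK, BW, ai):.1%}")
```

Under these assumed numbers, a kernel that performs few operations per byte is capped by memory bandwidth no matter how high the compute peak is. That is why reducing data movement, rather than adding raw compute, can raise real utilization.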
| Stack Layer | What OpenAI Aims to Control |
|---|---|
| Models | Architecture, training, and evolution of LLMs |
| Products | ChatGPT, Codex, API, and future agents |
| Serving | Scheduling, latency, scaling, and costs |
| Kernels | Critical inference operations |
| Memory | Reduced data movement and better utilization |
| Network | Scalable interconnection with Broadcom |
| Hardware | Own accelerators for AI workloads |
| Racks | Industrial integration with partners like Celestica |
The chip does not arrive in isolation. Broadcom provides silicon implementation, connectivity, and networking technologies like Tomahawk. Celestica participates in integrating boards, racks, and systems. The message to the industry is clear: OpenAI isn’t designing just a component but a multi-generational computing platform.
Why Broadcom is the logical partner
Broadcom’s role makes perfect sense. The company has become one of the most significant players in custom ASICs for large tech clients. Its role isn’t to compete with NVIDIA in the category of universal GPUs but to help high-demand companies create specialized accelerators that are interconnected and scalable.
For OpenAI, this provides a different pathway than relying solely on commercial GPUs. It doesn’t mean abandoning NVIDIA or replacing all existing infrastructure overnight. Most likely, GPUs, proprietary accelerators, and third-party chips will coexist for years. But it allows OpenAI to start moving specific workloads to hardware designed for its inference patterns.
Network interconnection is another critical aspect. For large-scale AI, performance depends not just on the chip but on thousands of accelerators communicating with low latency and high bandwidth. As models grow, agents perform more steps, and users demand faster responses, internal data center interconnects become part of the product. Broadcom holds a strong position here.
Hence, Jalapeño should be seen as both a chip and a system. Accelerator, network, board, rack, power, and serving software form an integrated operational unit. This is what differentiates today’s AI chip race from earlier cycles: the winner won’t just be whoever has the most TOPS or bandwidth but whoever can operate the entire system at the lowest cost per token.
AI begins designing AI infrastructure
One of the most striking details of the announcement is the development timeline. OpenAI claims Jalapeño went from initial design to tape-out in nine months, using its own models to accelerate the design and optimization phases.
This figure should be approached with caution, as moving from tape-out to mass deployment involves many more steps. Nevertheless, it’s noteworthy. High-performance semiconductor design is usually a lengthy, expensive process with many validation phases. If AI models can genuinely assist in verification, documentation, design exploration, or error review, the hardware development cycle could significantly shorten.
A kind of industrial feedback loop emerges: current models help design chips that will serve future models. This isn’t a minor idea. Accelerating this cycle allows companies controlling more of the stack to progress faster than those relying on external suppliers for each hardware decision.
OpenAI already uses AI to write code, analyze data, and assist developers. Extending this to chip design fits within its vertical integration strategy. The goal isn’t just better models but a more efficient compute factory for future models.
The real battleground: cost per token
The most important missing technical detail is how much Jalapeño actually reduces inference costs. OpenAI reports initial performance per watt that far exceeds the current state of the art, but final numbers, comparable benchmarks, memory details, manufacturing process, power consumption, bandwidth, rack costs, and cost per token haven’t been published yet.
Until this data becomes available, Jalapeño should be viewed as a strategic promise rather than a proven technical victory. Chips working in the lab are just the first step. Production, reliability, sustained performance, supply, software, data center operation, and scalable deployment without degrading user experience are the challenges still ahead.
Nonetheless, the move makes perfect sense. If OpenAI can lower inference costs, it can deliver faster responses, cheaper plans, more availability during peak hours, and agents capable of longer tasks without tripling costs. This directly influences its business model.
| Key Metric | Why It Matters |
|---|---|
| Performance per watt | Reduces energy costs and necessary density |
| Latency | Enhances experience in interactive products |
| Throughput | Enables serving more users per unit of infrastructure |
| Cost per token | Sets API pricing and margins |
| Reliability | Prevents bottlenecks under high demand |
| Scalability | Determines if deployment at gigawatt scale is feasible |
The cost per token will become one of the decisive metrics in the next phase of AI. Models will be more capable but also more widely used. Agents, coding tools, enterprise copilots, research assistants, and multimodal products will increasingly depend on inference. The entity that best controls this cost will have a significant competitive advantage.
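To make the link between performance per watt, throughput, and cost per token concrete, here is a back-of-the-envelope sketch. The formula (amortized hardware cost plus energy cost, divided by tokens served) is a deliberate simplification that ignores networking, cooling, staffing, and financing. Every parameter name and number is a hypothetical assumption, not a published Jalapeño or OpenAI figure.

```python
def cost_per_million_tokens(hardware_cost_usd: float,
                            lifetime_years: float,
                            power_kw: float,
                            usd_per_kwh: float,
                            tokens_per_second: float,
                            utilization: float = 1.0) -> float:
    """Rough serving cost in USD per one million generated tokens.

    Simplified model (an assumption of this sketch): hourly cost is the
    hardware price amortized linearly over its lifetime plus the energy
    bill; hourly output is peak throughput scaled by average utilization.
    """
    hours = lifetime_years * 365 * 24
    capex_per_hour = hardware_cost_usd / hours
    energy_per_hour = power_kw * usd_per_kwh
    tokens_per_hour = tokens_per_second * utilization * 3600
    return (capex_per_hour + energy_per_hour) / tokens_per_hour * 1e6


if __name__ == "__main__":
    # Hypothetical baseline accelerator (illustrative numbers only).
    baseline = cost_per_million_tokens(
        hardware_cost_usd=30_000, lifetime_years=4,
        power_kw=1.0, usd_per_kwh=0.08,
        tokens_per_second=5_000, utilization=0.5)

    # Same hypothetical device with twice the throughput at equal power,
    # i.e. double the performance per watt.
    improved = cost_per_million_tokens(
        hardware_cost_usd=30_000, lifetime_years=4,
        power_kw=1.0, usd_per_kwh=0.08,
        tokens_per_second=10_000, utilization=0.5)

    print(f"baseline: ${baseline:.4f} per million tokens")
    print(f"improved: ${improved:.4f} per million tokens")
```

In this simplified model, doubling throughput at the same power and price halves the cost per token, and raising average utilization has the same effect. Real deployments would need the unpublished figures listed above to produce a meaningful estimate.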
The new chip race is no longer just NVIDIA’s
NVIDIA will remain the dominant player in AI accelerators for the near future, especially in training and mature software platforms. But the market is fragmenting. Google has TPUs, Amazon invests in Trainium and Inferentia, Microsoft develops Maia, Meta works on its own chips, and now OpenAI is partnering with Broadcom for specialized hardware.
The reason is simple: the largest AI consumers no longer want to rely solely on a one-size-fits-all tool. When volume is huge, designing a tailored tool can be worthwhile. When compute costs influence product economics, hardware shifts from a technical purchase to a strategic business decision.
OpenAI joins the ranks of hyperscalers, but with a key difference. It isn’t a traditional general cloud provider. Its main workload is AI as a product. This could make Jalapeño more specialized than other internal cloud chips.
The question remains whether this specialization will be enough. An in-house chip can be highly efficient for specific workloads but may fall short if model architectures change, multimodal inference grows unexpectedly, or the market demands more flexibility. OpenAI states Jalapeño is designed for current and future LLMs, but only real deployment will prove the extent of that flexibility.
A step further toward OpenAI’s industrialization
Jalapeño shouldn’t be seen merely as a hardware curiosity. It signals that OpenAI is evolving into an industrial AI company. Models, products, data centers, chips, energy agreements, cloud partnerships, and deployment capacity are starting to form a unified strategy.
This has implications across the industry. For chip vendors, it means their top clients want to negotiate from a position of strength. For data centers, it confirms that AI demand will keep pushing energy, cooling, and networking needs. For API users, it opens the possibility that inference might become cheaper and more stable if the chip performs as promised. For competitors, it raises the bar on vertical integration.
It also points to a fundamental shift: if the most advanced AI increasingly depends on gigawatt-scale infrastructure, competition will no longer be decided solely in research labs but across supply chains, energy availability, chip design, data center networks, and financial capacity.
Jalapeño is OpenAI’s first chip, but it won’t be the last if its strategy succeeds. The real announcement isn’t a specific processor but the start of a proprietary computing platform that could redefine how the company delivers models worldwide.
The next AI battle won’t only be fought in benchmarks. It will be won in watts, racks, latency, and cost per token.
Frequently Asked Questions
What is Jalapeño?
Jalapeño is OpenAI’s first inference chip developed with Broadcom. The company describes it as its inaugural Intelligence Processor.
What is an inference chip used for?
It’s used to run trained models during user interactions in products like ChatGPT, Codex, or the API. Its goal is to reduce latency, cost, and energy consumption.
Will it replace NVIDIA GPUs?
Not necessarily. It will likely coexist with GPUs and other accelerators. Jalapeño is designed for specific LLM inference workloads, not as a universal solution.
When will it be deployed?
OpenAI plans an initial rollout by late 2026 with subsequent expansion within a multi-generational platform alongside Broadcom, Celestica, and data center partners.
Sources:
OpenAI, “OpenAI and Broadcom unveil LLM-optimized inference chip”.
OpenAI and Broadcom, strategic collaboration announcement for AI accelerators.

