FuriosaAI has announced a strategic partnership with Broadcom to develop its third-generation AI accelerators, a platform specifically designed for large-scale inference and agentic workloads. The South Korean company aims to deviate from the traditional path of general-purpose GPUs, instead investing in a chiplet architecture with 2-nanometer compute dies, HBM4/HBM4E memory, and Broadcom’s networking technologies to scale within massive data centers.
The announcement comes at a time when the industry is shifting focus beyond model training. The upcoming phase will be characterized by massive inference: millions of users, AI agents performing tasks, multimodal models, continuous token generation, and the need to reduce energy costs. In this scenario, the winning chip isn’t necessarily the one with the most raw power, but the one that moves data more efficiently, consumes less energy, and delivers more tokens per watt.
FuriosaAI says the new platform is tailored to data centers in what the company calls the era of “token factories.” Sampling of the new accelerator is scheduled for the first half of 2028, so volume production is still some way off.
A chiplet architecture with HBM4 to move more data
FuriosaAI’s third-generation platform will use a multi-die, chiplet-based design with a 2-nanometer compute die and HBM4/HBM4E memory. According to the company, Broadcom’s advanced packaging will allow multiple silicon blocks to be integrated into a high-performance system aimed at inference workloads.
Memory is a key element of the design. In inference on today’s large language models, the bottleneck is often not compute alone but feeding the chip data fast enough. HBM4 and its successor, HBM4E, promise higher bandwidth, which is essential for handling large models, reducing latency, and increasing the number of tokens generated per unit of energy.
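To see why bandwidth matters so much, consider a rough back-of-the-envelope estimate. The sketch below is not FuriosaAI’s method; it assumes a simplified model in which every decode step streams all model weights from memory once, and it ignores KV-cache reads, compute time, and scheduling. The function names, the 8-billion-parameter model, and the bandwidth values are hypothetical and chosen only for illustration.

```python
def decode_steps_per_second(params_billion, bytes_per_param, bandwidth_tb_s):
    """Upper bound on decode steps per second when reading weights is the limit.

    Assumes every decode step streams all model weights from memory once and
    that nothing else (KV cache, compute, scheduling) costs time.
    """
    weight_bytes = params_billion * 1e9 * bytes_per_param
    bandwidth_bytes_per_s = bandwidth_tb_s * 1e12  # decimal terabytes
    return bandwidth_bytes_per_s / weight_bytes


def decode_tokens_per_second(params_billion, bytes_per_param, bandwidth_tb_s, batch_size=1):
    """Each decode step yields one token per sequence in the batch."""
    return batch_size * decode_steps_per_second(
        params_billion, bytes_per_param, bandwidth_tb_s
    )


if __name__ == "__main__":
    # Hypothetical model: 8 billion parameters stored in 2 bytes each.
    for bandwidth in (1.0, 2.0, 4.0):  # illustrative TB/s values, not product specs
        rate = decode_tokens_per_second(8, 2, bandwidth)
        print(f"{bandwidth:.1f} TB/s -> at most {rate:.1f} tokens/s per sequence")
```

Under these assumptions the ceiling scales linearly with memory bandwidth, which is why a faster memory generation can translate directly into more tokens for a bandwidth-bound workload. Real systems will land below this bound.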
FuriosaAI emphasizes that its architecture focuses on the efficient movement of data, rather than the thread management typical of traditional GPUs. The company asserts that this approach will deliver higher performance per watt and greater token density than the most efficient GPUs on the market. This is an ambitious claim, and it will need to be validated once real chips, independent benchmarks, and production deployments are available.
The design will also incorporate Broadcom’s Ethernet and PCIe technologies to connect accelerators within large clusters. This matters because AI data centers are no longer planned chip by chip, but rack by rack and cluster by cluster. Network latency, internal communication, and node interconnects are as critical as the performance of the accelerators themselves.
From RNGD to a platform for hyperscalers
The new platform builds on the experience of RNGD, FuriosaAI’s second-generation chip, which is currently in mass production on TSMC’s 5-nanometer process. RNGD is a 180-watt PCIe accelerator designed for language model inference, multimodal workloads, and agentic AI applications.
The current FuriosaAI product features 48 GB of HBM3 memory, offers 1.5 TB/s of memory bandwidth, and is intended for air-cooled data centers. The company positions it as an efficient option for deploying advanced models without the infrastructure demands of some high-end GPUs.
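A few ratios follow directly from those published figures. The sketch below uses only the numbers quoted in this article (48 GB, 1.5 TB/s, 180 W) and assumes decimal units (1 GB = 10^9 bytes) and that the card draws its full rated 180 W while sustaining peak bandwidth, which is an assumption rather than a measured result. The helper names and the example model sizes are hypothetical.

```python
# Figures quoted in the article for RNGD.
RNGD_MEMORY_GB = 48
RNGD_BANDWIDTH_TB_S = 1.5
RNGD_POWER_W = 180


def full_memory_sweep_ms(memory_gb=RNGD_MEMORY_GB, bandwidth_tb_s=RNGD_BANDWIDTH_TB_S):
    """Time in milliseconds to read the whole memory once at peak bandwidth."""
    return (memory_gb * 1e9) / (bandwidth_tb_s * 1e12) * 1e3


def gigabytes_moved_per_joule(bandwidth_tb_s=RNGD_BANDWIDTH_TB_S, power_w=RNGD_POWER_W):
    """Data moved per joule, assuming peak bandwidth at full rated power."""
    return bandwidth_tb_s * 1e12 / power_w / 1e9


def weights_fit(params_billion, bytes_per_param, memory_gb=RNGD_MEMORY_GB):
    """True if the weights alone fit in memory (ignores KV cache and activations)."""
    return params_billion * bytes_per_param <= memory_gb


if __name__ == "__main__":
    print(f"Full memory sweep: {full_memory_sweep_ms():.1f} ms")
    print(f"Data moved per joule: {gigabytes_moved_per_joule():.2f} GB")
    # Hypothetical model sizes and precisions, for illustration only.
    for params, width in ((8, 2), (70, 2), (70, 0.5)):
        print(f"{params}B params at {width} bytes each fits: {weights_fit(params, width)}")
```

With the quoted specifications, one full pass over memory takes about 32 ms and the card moves roughly 8.3 GB per joule at peak. These are ceilings derived from the spec sheet, not benchmark results.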
Customers and validations cited by FuriosaAI include Samsung SDS and LG AI Research. This support matters because the AI accelerator market is full of promises, yet few players have moved beyond technical demonstrations to real customer deployments and scaled production.
The partnership with Broadcom elevates the project significantly. Broadcom brings expertise in ASICs, advanced packaging, and a strong position in data center networking, high-bandwidth Ethernet switches, and customized XPU platforms for large clients. For FuriosaAI, this collaboration could be the key to evolving from selling efficient inference chips to competing as an infrastructure platform for large-scale deployments.
Inference creates space for alternatives to NVIDIA
The AI chip market remains dominated by NVIDIA, especially in training and large-scale GPU deployments. However, inference is opening opportunities for more specialized architectures. As models are used continuously in production, cost per token, energy efficiency, and latency become critical factors.
Companies like FuriosaAI, Cerebras, Groq, Tenstorrent, and various in-house designs from hyperscalers are responding to this need. Not all will compete in the same space, but all aim to reduce dependence on general-purpose GPUs where a tailored architecture can perform better for specific workloads.
FuriosaAI’s approach makes sense within this context. For a data center that needs to generate tokens continuously, serve AI agents, handle multiple requests simultaneously, and keep costs under control, an inference-optimized solution could be very attractive. Yet the challenge is formidable. FuriosaAI must build out its software ecosystem, model compatibility, and developer tools, prove its reliability, secure HBM memory supply, advanced packaging, and 2 nm manufacturing capacity, and compete with highly mature incumbent platforms.
The company aims to address part of this challenge with its software stack. FuriosaAI claims its SDK enables deploying models from PyTorch through a general compiler, without relying on extensive, hand-tuned kernel libraries for each model. It also offers a virtual ISA for developers who need more hardware control without the complexity of traditional GPU programming.
The schedule calls for sampling in the first half of 2028, in line with the next wave of AI data centers. By then, pressure on energy, memory, networks, and cost per token will be even greater. If FuriosaAI and Broadcom deliver as promised, their solution could become a serious alternative for large-scale inference. Otherwise, it will be just another architecture among many attempting to challenge GPU dominance during the most competitive era in silicon history.
Frequently Asked Questions
What have FuriosaAI and Broadcom announced?
They announced a partnership to develop FuriosaAI’s third-generation AI accelerators, based on chiplets, 2 nm compute, HBM4/HBM4E memory, and Broadcom networking technologies.
What workloads is this chip intended for?
It’s aimed at large-scale AI inference, language models, agentic workloads, post-training sampling, and massive token generation in data centers.
When will the new accelerator be available?
FuriosaAI plans to start sampling the chip in the first half of 2028, though commercial availability will depend on development progress and initial customer adoption.

