AMD and Intel prepare ACE to accelerate Artificial Intelligence on x86

AMD and Intel have taken an uncommon step in an industry accustomed to competing at every processor generation: collaborating on a common extension to accelerate AI workloads within the x86 architecture. The proposal, called ACE (short for AI Compute Extensions), aims to make matrix acceleration a standard, shared capability of future x86 chips.

The initiative is part of the work of the x86 Ecosystem Advisory Group, a body created by AMD, Intel, and other partners to improve compatibility and steer the evolution of x86 in a period marked by pressure from Arm, specialized accelerators, and the growth of Artificial Intelligence. The technical whitepaper on ACE, dated April 15, 2026, presents the extension as a way to improve performance, scalability, and energy efficiency in matrix multiplication, the operation underlying training and inference for language models and neural networks.

Why ACE Matters for x86

Matrix multiplication is a core operation in modern Artificial Intelligence. It appears in forward propagation, backpropagation, weight updates, neural network layers, and the primitives used in language models. Vector extensions like AVX10 can already perform these computations, but AMD and Intel acknowledge in the document that their compute density and scalability can fall short for certain workloads.
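
As a concrete illustration (a minimal NumPy sketch, not code from the whitepaper), a single dense layer already shows why: its forward pass is one matrix multiplication, and backpropagation through it adds two more.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 256), dtype=np.float32)    # batch of activations
W = rng.standard_normal((256, 128), dtype=np.float32)  # layer weights

# Forward propagation: one matrix multiplication
y = x @ W

# Backpropagation: two more matmuls, for the input and weight gradients
dy = np.ones_like(y)  # upstream gradient (placeholder)
dx = dy @ W.T         # gradient w.r.t. the activations
dW = x.T @ dy         # gradient w.r.t. the weights, used for weight updates
```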

ACE aims to fill this gap without breaking away from the existing x86 environment. The key is integration with AVX10, rather than establishing a completely separate island. This would allow reuse of work already done by compilers, libraries, runtimes, and existing optimizations, reducing the burden on developers and software vendors.

The proposal relies on outer product operations, a technique that packs more computation into each instruction than conventional vector operations. According to the document, an ACE outer product operation can offer up to 16 times the compute density of an equivalent multiply-accumulate operation in AVX10, using the same input vectors.
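
To see where the density figure comes from, consider a NumPy sketch (illustrative only, not ACE code): a vector multiply-accumulate over two 16-element vectors yields 16 results, while an outer product over the same two vectors yields a 16x16 tile of 256 products, and a full matrix multiplication decomposes into a sum of such rank-1 tiles.

```python
import numpy as np

rng = np.random.default_rng(0)
M = K = N = 16
A = rng.standard_normal((M, K), dtype=np.float32)
B = rng.standard_normal((K, N), dtype=np.float32)

# C = A @ B expressed as K rank-1 (outer product) updates: each step reads
# one column of A and one row of B and accumulates an M x N tile of
# products, 16x more results than a vector multiply-accumulate over the
# same two 16-element inputs.
C = np.zeros((M, N), dtype=np.float32)
for k in range(K):
    C += np.outer(A[:, k], B[k, :])

assert np.allclose(C, A @ B, atol=1e-3)
```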

| Technical aspect | What ACE contributes |
| --- | --- |
| Integration | Works as a natural extension of AVX10 |
| Core operation | Outer product for matrix acceleration |
| Compute density | Up to 16 times that of an equivalent AVX10 operation |
| Native formats | INT8, OCP FP8, OCP MXFP8, OCP MXINT8, BF16 |
| New state | 8 tile registers and one block scale register |
| Target software | Compilers, debuggers, profilers, HPC libraries, and machine learning frameworks |

This approach also has a strategic reading. AMD and Intel are not trying to turn every x86 CPU into a GPU or a dedicated NPU; they want the general-purpose processor to have better tools for executing relevant parts of AI workloads. That can pay off in laptops, workstations, servers, HPC environments, and systems where moving every operation to specialized hardware doesn't always make sense.

From Laptop to Data Center

One of the most important messages in the whitepaper is scalability. ACE is presented as a matrix acceleration architecture applicable from laptops to data center servers. The idea is for developers to have a more unified base within the x86 world, rather than relying on fragmented solutions or incompatible extensions between manufacturers.

This does not mean ACE will replace GPUs, AI accelerators, or NPUs. The largest training workloads and many inference tasks will continue to utilize specialized hardware. But it can reduce friction in hybrid operations, preprocessing, auxiliary kernels, local inference, numerical libraries, or parts of models where the CPU still plays an important role.

The document also explains that ACE presents itself to software as a new “palette” within the AMX framework, allowing reuse of parts of the system programming model and related OS support. This technical choice is important as it lowers the barrier for adoption in low-level software.
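
For a sense of what that reuse means in practice, here is a hedged sketch of how dynamically enabled tile state works today on Linux for AMX: a process must request permission before touching the tile registers. If ACE rides on the same palette mechanism, its new state could plausibly plug into the same plumbing; the constants below are the existing AMX values, not ACE ones.

```python
import ctypes

# Existing Linux x86-64 constants for AMX tile state. ACE-specific state
# components are not yet defined; these are today's AMX values.
SYS_arch_prctl = 158          # arch_prctl syscall number on x86-64
ARCH_REQ_XCOMP_PERM = 0x1023  # request permission for an XSAVE component
XFEATURE_XTILEDATA = 18       # the AMX TILEDATA state component

libc = ctypes.CDLL(None, use_errno=True)
if libc.syscall(SYS_arch_prctl, ARCH_REQ_XCOMP_PERM, XFEATURE_XTILEDATA) != 0:
    print("AMX tile state not available, errno:", ctypes.get_errno())
else:
    print("AMX tile state enabled for this process")
```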

In terms of data formats, ACE adopts the formats popular in AI today, including INT8, BF16, and the OCP MX formats. Support for OCP MX is particularly interesting because it includes inline block scaling, a technique designed for low-precision formats that reduces bandwidth and memory usage without a significant loss of quality in modern models.
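
A toy NumPy sketch can show the block-scaling idea (concept only: the OCP MX spec fixes details such as the 32-element block size and an 8-bit power-of-two scale, and real hardware applies the scale inline rather than in software):

```python
import numpy as np

BLOCK = 32  # OCP MX formats share one scale across each 32-element block

def mx_quantize(x: np.ndarray):
    """Toy MXINT8-style quantization: int8 data plus one power-of-two
    scale per block, cutting storage roughly 4x versus float32."""
    blocks = x.reshape(-1, BLOCK)
    amax = np.maximum(np.abs(blocks).max(axis=1, keepdims=True), 1e-12)
    # Smallest power-of-two scale that fits the block into [-127, 127]
    scale = np.exp2(np.ceil(np.log2(amax / 127.0))).astype(np.float32)
    q = np.clip(np.round(blocks / scale), -127, 127).astype(np.int8)
    return q, scale

def mx_dequantize(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    return (q.astype(np.float32) * scale).reshape(-1)

x = np.random.default_rng(0).standard_normal(128).astype(np.float32)
q, s = mx_quantize(x)
print(np.abs(x - mx_dequantize(q, s)).max())  # small reconstruction error
```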

Low precision has become essential for making AI more efficient. High-precision formats aren't always necessary if models can maintain acceptable quality using INT8, FP8, BF16, or other compact formats. ACE recognizes this and incorporates conversion and packing mechanisms for narrower data, including 2- to 7-bit formats via the VUNPACKB instruction.
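
VUNPACKB is only described in the whitepaper, so as a stand-in, this NumPy sketch shows the equivalent software operation for the 4-bit case: expanding values packed two per byte into sign-extended int8 elements that wider arithmetic can then consume.

```python
import numpy as np

def unpack_int4(packed: np.ndarray) -> np.ndarray:
    """Expand two signed 4-bit values per byte into int8, low nibble first."""
    low = (packed & 0x0F).astype(np.int8)
    high = ((packed >> 4) & 0x0F).astype(np.int8)
    # Sign-extend 4-bit two's-complement values (nibbles 8..15 map to -8..-1)
    low = np.where(low > 7, low - 16, low)
    high = np.where(high > 7, high - 16, high)
    return np.stack([low, high], axis=-1).reshape(-1)

packed = np.array([0x21, 0xF9], dtype=np.uint8)  # nibbles: 1, 2, 9, F
print(unpack_int4(packed))  # [ 1  2 -7 -1]
```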

A Technical Alliance with Political Overtones

The fact that AMD and Intel collaborate on such an extension is no small detail. The two companies compete in desktop, laptop, server, and workstation CPUs, yet share a common goal: keeping x86 an attractive platform for AI development.

For years, x86's strength lay in its compatibility, large installed base, and mature software ecosystem. Now that value must coexist with new demands: energy efficiency, model acceleration, support for low-precision formats, and the ability to operate in increasingly heterogeneous systems. ACE aims to address these from within the architecture itself.

Standardization can be a clear advantage. If AMD and Intel implement compatible capabilities, developers will have fewer reasons to optimize differently for each provider. In theory, a machine learning library, a framework like PyTorch or TensorFlow, or a scientific library like NumPy or SciPy could benefit from shared acceleration paths in the future.

The whitepaper itself mentions that the software enablement work is already underway, with initial integration in compilers, debuggers, and profilers. Upcoming efforts will focus on optimized kernels, deep learning and HPC libraries, primitives for language models, and machine learning runtimes.

What Remains to Be Seen

ACE is still a technical proposal, not a guarantee of performance in specific commercial products. Key details are missing: which processor generations will implement it, actual performance levels, differences between AMD and Intel, how operating systems will support it, and how quickly popular libraries will adopt it reliably.

It's also important to see how ACE will fit into the broader hardware ecosystem. Laptops already combine CPUs, GPUs, NPUs, and multimedia accelerators. Servers mix CPUs with GPUs, FPGAs, SmartNICs, and specialized accelerators. In this landscape, the CPU needs to strengthen its matrix capabilities without redundantly duplicating what other components already do well.

The opportunity lies in intermediate workloads: moderate local inference, auxiliary operations, scientific loads, data preprocessing, smaller models, enterprise automation, and applications where transferring data to another accelerator incurs more penalty than benefit. If ACE reduces these frictions, it can strengthen the role of x86 in a significant segment of daily AI work.

The challenge will be adoption. An instruction extension only influences the market when it ships in real processors, is well supported by operating systems, and becomes invisible to developers behind mature libraries. AVX10 was an early attempt to shape the future vector capabilities of x86; ACE adds the missing matrix piece, so the architecture no longer has to lean solely on external accelerators in conversations about AI.

AMD and Intel understand that collaboration doesn’t eliminate competition. They will continue to differentiate in design, clock speed, power consumption, caches, manufacturing nodes, packaging, and platforms. But if ACE succeeds, both could gain something more significant: maintaining x86 as a comfortable architecture for AI software development in an evolving market that no longer takes familiarity for granted.

Frequently Asked Questions

What is ACE in x86 processors?
ACE, or AI Compute Extensions, is a proposed extension for x86 developed by AMD and Intel to accelerate matrix multiplication operations used in AI workloads.

Does ACE replace a GPU or NPU?
No. ACE aims to improve the matrix capabilities of the x86 CPU, but GPUs, NPUs, and dedicated accelerators will still be important for large training and inference workloads.

What is ACE’s relationship with AVX10?
ACE integrates with AVX10 and reuses vector registers as inputs for matrix operations. The goal is to expand x86’s capability without disrupting existing software models.

Which data formats does ACE support?
The technical document mentions native support for INT8, OCP FP8, OCP MXFP8, OCP MXINT8, and BF16, formats relevant for AI workloads and low-precision computations.

