
Arm has unveiled its new Lumex platform as part of its Compute Subsystems (CSS) strategy: a “near-ready-to-manufacture” package combining the Arm C1 CPU (Armv9.3 with integrated SME2), the Mali G1-Ultra GPU with Ray Tracing Unit v2, an optimized system interconnect and memory, unified telemetry, and a software stack designed to accelerate development. The declared goal is to democratize on-device AI, from premium smartphones to wearables, and to reduce reliance on the cloud. The company highlights reference implementations on advanced nodes, including TSMC’s 3 nm.

The proposal comes amid competitive pressures: the C1 cores aim to challenge x86 in performance per watt for client and edge computing, and the Mali G1-Ultra seeks to close the gap with high-end mobile GPUs, scaling ray tracing and AI tasks. At the same time, Arm promotes its “on-device first” narrative with SME2 (Scalable Matrix Extension 2), which accelerates the matrix operations typical of transformers and CNNs directly on the CPU, promising up to 5× the AI performance and a 3× efficiency gain over previous generations.

Below is a technical overview — with figures — of what was presented.

Lumex CSS: CPU, GPU, and system orchestrated for on-device AI

Lumex CSS is not a single IP block but a subsystem that Arm delivers to SoC manufacturers, comprising C1 CPU clusters, the Mali G1 GPU, a system interconnect (SI), an optimized MMU, Kleidi AI, and tools for telemetry and profiling. The key, according to the company, is to shorten design cycles and let each partner mix C1-Ultra, C1-Premium, C1-Pro, and C1-Nano cores depending on the product (flagship, mainstream, efficiency, wearables).

Armv9.3 + SME2: Native matrix instructions that accelerate inference on the CPU (attention, convolutions, linear projections). Arm positions SME2 as a “game-changer” for private, real-time AI experiences, with fewer offloads to the NPU/GPU when the “hot path” fits in CPU cache and memory.
Software stack: Kleidi AI and unified performance tools, aimed at letting developers and OEMs port compact LLMs, text-to-speech, speech recognition, or super-resolution models without redesigning the SoC.
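
To make the matrix-instruction idea concrete, here is a minimal sketch in plain Python of matrix multiplication written as a running sum of outer products, the accumulate-an-outer-product pattern that Arm's matrix extension is built around. It illustrates the arithmetic only and is not Arm code: the function name matmul_outer_product and the sample matrices are assumptions made for this example, and real SME2 code would be written with compiler intrinsics or reached through a library.

```python
def matmul_outer_product(a, b):
    """Multiply a (m x k) by b (k x n) as a sum of k rank-1 outer products."""
    m, k, n = len(a), len(b), len(b[0])
    if any(len(row) != k for row in a):
        raise ValueError("inner dimensions of a and b do not match")
    # The accumulator stands in for the on-chip tile that holds partial sums.
    acc = [[0.0] * n for _ in range(m)]
    for p in range(k):
        col = [a[i][p] for i in range(m)]  # p-th column of a
        row = b[p]                         # p-th row of b
        # One outer product col x row, accumulated into the whole tile.
        for i in range(m):
            for j in range(n):
                acc[i][j] += col[i] * row[j]
    return acc


# Hypothetical 2x2 inputs, chosen only so the result is easy to check by hand.
a = [[1.0, 2.0], [3.0, 4.0]]
b = [[5.0, 6.0], [7.0, 8.0]]
result = matmul_outer_product(a, b)
print(result)  # [[19.0, 22.0], [43.0, 50.0]]
```

The point of the formulation is that each step updates every element of the output tile at once, which is the kind of work a dedicated matrix unit can do in a single instruction instead of many scalar or vector ones.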


Arm C1 CPU: four variants from flagship to wearable

The new C1 cores replace their Cortex counterparts in the roadmap and come in four flavors with Armv9.3 and SME2:

C1-Ultra: “Big” core for maximum performance, designed for large AI models, computational photography, and high-end content generation. According to shared data, it offers +25% per-core performance over its predecessor, a 25% wider execution window, and +33% L1 bandwidth compared to Cortex-X925.
C1-Premium: Balances performance and area, delivering near-Ultra performance in ≈35% less silicon area, and targets “mainstream” and upper-range smartphones.
C1-Pro: Energy-efficient profile for video playback, background inference, and continuous loads, with performance gains of around 16% over the previous generation.
C1-Nano: The ultra-efficient option for wearables; reduces power consumption by ≈26% and minimizes area to fit into watch and band designs.

Beyond marketing labels, the technical message is that SME2 shifts some lightweight AI workloads onto the CPU with lower energy penalties and more stable latencies, leaving the GPU/NPU for bursts and larger batches. The launch materials mention up to 5× improvements in AI workloads and significantly lower latency in voice workloads compared to previous generations, reinforcing the focus on always-on experiences without cloud dependence.

Manufacturing & “time-to-market”. Arm states that Lumex CSS is optimized for 3 nm (e.g., TSMC N3), directly targeting 2025-2026 premium smartphones. The CSS format (nearly ready subsystem) aims to shorten integration cycles for partners who prefer not to start from scratch with individual IPs.

Mali G1-Ultra GPU: 2× Ray Tracing and +20% in Gaming & AI

The new Mali G1-Ultra replaces the Immortalis-G925 as Arm’s top-tier GPU. Its star feature is the Ray Tracing Unit v2 (RTUv2), which doubles ray tracing performance over the previous generation. Arm also cites gains of around 20% in graphics benchmarks and AI inference, thanks to FP16 matrix paths and scheduler modifications.

In real workloads, Arm cites gains in titles like Fortnite, Genshin Impact, Arena Breakout, and Honkai: Star Rail, showing higher fps and greater efficiency per frame. For the industry, perhaps the most relevant point is that these advances “trickle down” to the G1-Premium and G1-Pro variants: not only flagship models improve, but also the mid-range segment.

Under the hood. The architecture includes dual-pipeline shader cores, more fast-access registers, an intelligent dependency region (IRD) to smooth out execution bubbles, per-tile counters integrated with Vulkan, and RenderDoc support on the roadmap. Additionally, Arm extends its temporal upscaler ASR (Accuracy Super Resolution), already present in Unreal Engine 5 and Fortnite on mobile. This fits the broader trend: “console-quality” visuals on pocket-sized screens, but with much stricter thermal and power limits.

Why does this generation matter? A technical perspective for sysadmins and developers

Beyond the press release, there are key technical and product implications to highlight:

CPU as a “serious” AI accelerator. With SME2, Arm doesn’t aim to replace the NPU but to expand the space where the CPU offers lower latency and cache coherence in support of hybrid pipelines (e.g., speech recognition/TTS or the pre/post-processing stages of an LLM). For apps with small, frequent inferences, the C1 CPU could be the “default engine,” reserving the NPU for bursts.
Mobile ray tracing, now “playable”. The doubling of RT performance and 20% improvement doesn’t mean “full” ray tracing in every title, but it raises the threshold for selective effects (reflections, shadows) without sacrificing 60 fps, especially with ASR and temporal techniques. For developers, this opens doors for RT presets on mobile without overhauling their render pipeline.
Reduced integration friction for OEMs. The CSS format accelerates market entry: pre-validated C1 clusters, interconnect and MMU optimizations, unified telemetry, and Kleidi AI for connecting CPU/GPU/NPU. The result: shorter cycles and more SKUs per year with better “mix & match”.
Competition with x86 at the high end and with RISC-V at the low end. In performance per watt, the combination of C1-Ultra, SME2, and 3 nm puts Arm in a strong position against x86 for mobile client and “edge-client” workloads. Meanwhile, the CSS standardization competes with the cost-optimized RISC-V ecosystem. The battlefield: AI latency and cost/area.
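
The “CPU as default engine, NPU for bursts” idea from the first point can be sketched as a simple routing rule. The following Python is a hypothetical illustration, not an Arm or Kleidi AI API: the InferenceRequest fields, the pick_engine function, and both thresholds are invented for this example, and real values would have to come from profiling a specific SoC.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InferenceRequest:
    name: str
    mac_ops: int          # multiply-accumulate operations per inference
    batch_size: int
    working_set_kib: int  # weights + activations touched on the hot path


# Hypothetical thresholds, chosen only for illustration.
CPU_MAX_MAC_OPS = 50_000_000
CPU_MAX_WORKING_SET_KIB = 2048


def pick_engine(req: InferenceRequest) -> str:
    """Route small, single-item, cache-friendly work to the CPU, the rest to the NPU."""
    small = req.mac_ops <= CPU_MAX_MAC_OPS
    fits_cache = req.working_set_kib <= CPU_MAX_WORKING_SET_KIB
    if req.batch_size == 1 and small and fits_cache:
        return "cpu"
    return "npu"


wake_word = InferenceRequest("wake-word", 5_000_000, 1, 512)
llm_prefill = InferenceRequest("llm-prefill", 4_000_000_000, 8, 1_500_000)
print(pick_engine(wake_word))    # cpu
print(pick_engine(llm_prefill))  # npu
```

A production scheduler would also weigh current thermal state and whether the NPU is already busy; the sketch only captures the size-and-batch criterion described above.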

Declared performance and comparable metrics

Arm shared several key metrics — for both CPU and GPU — that illustrate the magnitude of the leap:

C1 CPU (with SME2): up to 5× in AI workloads and ≈3× the energy efficiency of the previous generation, thanks to matrix instructions, an expanded execution window, and improved L1 cache (for Ultra, +33% bandwidth).
Mali G1-Ultra GPU: 2× in ray tracing (RTUv2) and ≈+20% in graphics benchmarks and AI inference compared to Immortalis-G925; with notable gains in popular titles such as Fortnite, Genshin Impact, Arena Breakout, and Honkai, along with development tools like Vulkan counters and RenderDoc support in the pipeline.

Note: these figures depend on configurations, clocks, and effective TDP of each SoC/OEM, so final product results may vary — as always — based on thermal and power design choices.
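
As a rough reading aid for the CPU figures, the short Python sketch below works out what a 5× speedup combined with a 3× efficiency gain would imply for power and energy. It rests on two assumptions that Arm’s numbers do not confirm: that “efficiency” means performance per watt, and that both “up to” figures apply to the same workload at the same operating point. The function names are made up for this example.

```python
def relative_power(speedup: float, efficiency_gain: float) -> float:
    """Power relative to the baseline, given throughput and perf-per-watt multipliers."""
    # perf/watt = perf / power, so power = perf / (perf/watt).
    return speedup / efficiency_gain


def relative_energy_per_task(efficiency_gain: float) -> float:
    """Energy for one fixed task relative to the baseline."""
    # Energy per task is the inverse of tasks per joule (perf per watt).
    return 1.0 / efficiency_gain


power = relative_power(5.0, 3.0)
energy = relative_energy_per_task(3.0)
print(f"power while running: {power:.2f}x baseline")   # 1.67x
print(f"energy per inference: {energy:.2f}x baseline")  # 0.33x
```

Under those assumptions, each inference would cost about a third of the energy, but the core would draw roughly two thirds more power while it runs, which is one reason sustained results depend on each device’s thermal design.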

Timeline and expected adoption

Industry sources expect commercial launches starting in late 2025, aligned with the refresh cycles of Qualcomm, MediaTek, Samsung, or HiSilicon. Arm showcased Lumex at an event in China, emphasizing the relevance of the Android ecosystem outside Apple for the initial wave of adoption. Whether we’ll see volume silicon on 3 nm with C1/G1 before the holiday season depends largely on tape-out schedules and fab capacity, especially on N3.

Impact for users and industry

End user: more private AI features (real-time translation, contextual assistants that don’t send data to cloud, advanced photo editing in the gallery) and mobile gaming with lighting effects and reflections typical of PC/consoles, with improved battery life over previous attempts.
Developers: SME2 as an additional optimization target, access to unified system telemetry, per-tile GPU counters, and ASR as a basis for temporal upscaling. The challenge: adapting engines (Unity/Unreal) and AI middleware for cooperative CPU/GPU/NPU workflows without “over-scheduling”.
OEMs: less ad-hoc integration, greater predictability in timelines and BOM, and differentiation through mixed cores (Ultra/Premium/Pro/Nano) and clocks, rather than reinventing the wheel with interconnect and memory management.

And how does this compare to x86?

The inevitable question: is this a real threat to x86 in client devices? For latency-sensitive workloads and those where performance per watt matters most, yes: C1-Ultra with SME2 could compete against lightweight laptops that currently rely on the NPU/GPU for basic system AI. However, the development ecosystem (compilers, optimized libraries, and frameworks like PyTorch with mature SME2 support) will play a decisive role. For now, Arm’s emphasis is on Android and the mobile form factor, where its market share is dominant.

Key points to watch (for technologists)

Toolchains with SME2: if Kleidi AI and support in major frameworks land in 2025 in mature form, C1 will gain traction not only in lightweight inference but also in pre/post-processing currently bottlenecked by the NPU.
Drivers and RT on mobile: the promise of 2× ray tracing is compelling; its adoption in live games depends on Vulkan drivers, stability, and title-specific optimization.
Thermal scalability: while moving to 3 nm helps, maintaining 60 fps with RTUv2 in very compact chassis remains a challenge due to power gating and dissipation.
Partner ecosystem: how quickly Qualcomm/MediaTek/Samsung adopt CSS will determine mass availability, and pricing, in 2026.
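
The 60 fps concern in the points above comes down to a frame-time budget of about 16.7 ms. The Python sketch below shows how a 2× ray tracing speedup could move a frame from over budget to under it. The millisecond costs are hypothetical numbers picked for illustration, not measurements, and the helper names are assumptions; the model also ignores thermal throttling and the claimed ≈20% raster gain.

```python
def frame_budget_ms(target_fps: float) -> float:
    """Time available per frame at a given frame rate."""
    return 1000.0 / target_fps


def fits_budget(raster_ms, rt_ms_previous_gen, rt_speedup, target_fps=60.0):
    """Return (total frame time in ms, whether it fits the target frame rate)."""
    rt_ms = rt_ms_previous_gen / rt_speedup
    total = raster_ms + rt_ms
    return total, total <= frame_budget_ms(target_fps)


budget = frame_budget_ms(60.0)
# Hypothetical frame: 11 ms of rasterization plus 8 ms of ray-traced effects.
old_total, old_ok = fits_budget(11.0, 8.0, rt_speedup=1.0)
new_total, new_ok = fits_budget(11.0, 8.0, rt_speedup=2.0)
print(f"budget {budget:.2f} ms")                      # budget 16.67 ms
print(f"previous gen: {old_total} ms, ok={old_ok}")   # 19.0 ms, ok=False
print(f"2x RT:        {new_total} ms, ok={new_ok}")   # 15.0 ms, ok=True
```

The same arithmetic explains why selective effects are the realistic target: halving the ray tracing cost only helps if that cost was a modest share of the frame to begin with.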

Summary

With Lumex CSS, C1, and Mali G1-Ultra, Arm is not just updating its catalog: it is redefining its value proposition for a decade in which on-device AI will be predominant. The C1 CPUs with SME2 push the CPU into a more prominent AI acceleration role, and the G1-Ultra GPU confirms that ray tracing on mobile is no longer an experimental feature. If the integration timelines promised by CSS are met and the software ecosystem supports it, 2026 could be the year mobile devices decouple much of their intelligence from the cloud, with better latencies, more privacy, and better graphics as well.

FAQs

What is Arm Lumex CSS, and how does it differ from launching individual IPs?
It’s a comprehensive subsystem (C1 CPU + G1 GPU + interconnect/MMU + tools) ready for integration into an SoC, with reference physical implementations on 3 nm. It reduces complexity and integration time compared to assembling IPs one by one.

What does SME2 add to the C1 compared to just using NPU/GPU?
SME2 enables matrix multiplication on the CPU (Armv9.3), improving latency and efficiency for small inferences and network “glue” tasks (pre/post processing), delivering up to 5× performance and about 3× efficiency over previous models.

How much does Mali G1-Ultra improve over Immortalis-G925?
Arm claims 2× in ray tracing (RTUv2) and approximately +20% in graphics and AI benchmarks; it also reports improvements in popular titles like Fortnite, Genshin Impact, Arena Breakout, Honkai, as well as developer tools such as Vulkan counters and RenderDoc support coming soon.

When will the first phones with C1/G1 appear?
Public info suggests commercial adoption starting late 2025 and into 2026, depending on tape-out schedules and 3 nm fab capacity.

How does this impact x86 competition?
The performance per watt and AI latency of C1 + SME2 challenge x86 in mobile and edge devices. The real battleground will be software (toolchains, frameworks, drivers) and frictionless OEM deployment.

What should mobile game developers watch for?
Optimize Vulkan, adopt ASR, plan for scalable RT pathways, employ per-tile counters, and consider thermal constraints. With G1-Ultra, there’s a realistic chance for selective RT effects and better frame pacing if thermal budgets permit.

Sources: Arm’s technical disclosures and blogs on Mali G1-Ultra and C1/SME2, along with tech and business press coverage (Reuters, EE Times), and partner/media summaries.
