Oracle has announced the general availability of its new bare metal instances on Oracle Cloud Infrastructure (OCI) based on the AMD Instinct™ MI355X, the successor to the MI300X. Built on the CDNA 4 architecture, the MI355X brings more HBM3e memory, higher bandwidth, and new FP4/FP6/FP8 precisions. With this launch, Oracle claims to be the first hyperscaler to offer the MI355X publicly and the only one whose catalog includes both the MI355X and the MI300X.
What does MI355X bring compared to the previous generation?
- Memory and bandwidth per GPU: 288 GB HBM3e (+50%) and 8 TB/sec bandwidth (+51%).
- Precision and performance: supports FP4/FP6/FP8 in CDNA 4, with approximately 2.5× improvement in FP8/FP16 over the previous generation (CDNA 3).
- System resources (per server): 5th-gen AMD EPYC CPU (128 cores), 3 TB DDR5, 2.3 TB of HBM3e in total (8 × 288 GB), and 61.44 TB of local NVMe storage (+100%).
- Networking and scalability: 400 Gbps front-end network (4× the previous generation); liquid-cooled racks scaling up to 64 GPUs per rack; 3,200 Gbps cluster network for distributed training.
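The headline gains above can be sanity-checked with quick arithmetic. The MI300X baseline figures used below (192 GB HBM3, 5.3 TB/sec) come from AMD's public specs and are an assumption here, not stated in the article:

```python
# Sanity-check the quoted MI355X gains against MI300X baseline figures
# (192 GB HBM3, 5.3 TB/s -- assumed from AMD public specs, not the article).
MI355X_HBM_GB = 288
MI300X_HBM_GB = 192
MI355X_BW_TBS = 8.0
MI300X_BW_TBS = 5.3
GPUS_PER_NODE = 8

hbm_gain = MI355X_HBM_GB / MI300X_HBM_GB - 1        # 0.50 -> "+50%"
bw_gain = MI355X_BW_TBS / MI300X_BW_TBS - 1         # ~0.51 -> "+51%"
node_hbm_tb = MI355X_HBM_GB * GPUS_PER_NODE / 1000  # 2.304 -> "2.3 TB"

print(f"HBM per GPU: +{hbm_gain:.0%}, bandwidth: +{bw_gain:.0%}, "
      f"HBM per node: {node_hbm_tb:.1f} TB")
# HBM per GPU: +50%, bandwidth: +51%, HBM per node: 2.3 TB
```

The +50% and +51% figures in the list above follow directly from these ratios.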
New OCI Bare Metal Instance (BM.GPU.MI355X.8)
- Name: BM.GPU.MI355X.8
- Accelerators: 8× AMD Instinct™ MI355X (288 GB per GPU)
- Integrated GPU memory: 2.3 TB HBM3e
- System CPU / RAM: 128-core 5th-gen AMD EPYC + 3 TB DDR5
- Local storage: 61.44 TB
- Networking: 400 Gbps (front-end) + 3,200 Gbps (cluster)
- Price: starting at $8.60/hour (according to Oracle)
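For comparison shopping, the article's $8.60/hour figure for the 8-GPU instance breaks down per GPU as follows (Oracle list price; actual billing terms may differ):

```python
# Back-of-envelope per-GPU pricing for the 8-GPU BM.GPU.MI355X.8 instance,
# based on the article's $8.60/hr list price (an entry price, per Oracle).
INSTANCE_PRICE_PER_HR = 8.60
GPUS = 8

per_gpu_hr = INSTANCE_PRICE_PER_HR / GPUS
per_gpu_month = per_gpu_hr * 24 * 30  # 30-day month; ~730 hrs is more exact

print(f"${per_gpu_hr:.3f}/GPU-hour, ~${per_gpu_month:.0f}/GPU-month")
# $1.075/GPU-hour, ~$774/GPU-month
```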
Intended use: training large language models (LLMs) and large-scale multimodal models, real-time inference of agents and Mixture of Experts (MoE), long-context tasks (RAG, summarization), and HPC (CAE, CFD, digital twins, genomics, climate, finance, GNN).
Zettascale in the cloud: up to 131,072 GPUs and ultra-low latency RDMA
Oracle highlights that its OCI Supercluster Zettascale—the foundation of the company’s large-scale AI training ecosystem—can scale up to 131,072 GPUs with high-performance RDMA networking and ultra-low latency. Oracle describes it as the largest “AI supercomputer” in the cloud. In this ecosystem, MI355X provides “approximately 3× computational power” and “+50% memory” over previous generations, accelerating training time and the efficiency of distributed jobs.
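To put the 131,072-GPU ceiling in perspective, here is the aggregate capacity it would imply if every GPU were an MI355X. This is an illustration, not an Oracle claim (Zettascale spans multiple GPU generations):

```python
# Aggregate capacity implied by the Zettascale figure, assuming all GPUs
# are MI355X (illustrative only; the supercluster mixes GPU generations).
TOTAL_GPUS = 131_072           # note: exactly 2**17
HBM_PER_GPU_GB = 288
NODES = TOTAL_GPUS // 8        # 8 GPUs per BM.GPU.MI355X.8 node

total_hbm_pb = TOTAL_GPUS * HBM_PER_GPU_GB / 1_000_000
print(f"{NODES} nodes, {total_hbm_pb:.1f} PB of aggregate HBM3e")
# 16384 nodes, 37.7 PB of aggregate HBM3e
```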
Open ecosystem (ROCm) and compatibility
OCI MI355X instances rely on ROCm™, AMD’s open computing platform. In addition to standard frameworks (PyTorch, TensorFlow, ONNX Runtime, Triton), AMD and Oracle emphasize CUDA-to-ROCm porting pathways to facilitate migration without extensive rewrites.
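Much of that porting is mechanical: AMD's hipify tools (hipify-perl, hipify-clang) rewrite CUDA runtime calls into their HIP equivalents, which ROCm implements. A toy sketch of the core renaming idea, greatly simplified (the real tools use an exhaustive translation table and also handle headers, types, and library calls such as cuBLAS → hipBLAS):

```python
import re

# Toy illustration of the CUDA -> HIP renaming that AMD's hipify tools
# automate. A bare regex is NOT how the real tools work; they use a full
# mapping table. Shown only to illustrate the 1:1 API correspondence.
def toy_hipify(cuda_src: str) -> str:
    return re.sub(r"\bcuda([A-Z]\w*)", r"hip\1", cuda_src)

snippet = "cudaMalloc(&d_ptr, n); cudaMemcpy(d_ptr, h_ptr, n, cudaMemcpyHostToDevice);"
print(toy_hipify(snippet))
# hipMalloc(&d_ptr, n); hipMemcpy(d_ptr, h_ptr, n, hipMemcpyHostToDevice);
```

hipMalloc, hipMemcpy, and hipMemcpyHostToDevice are the actual HIP names, which is why most CUDA codebases port with little manual rewriting.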
Customer use cases
- Absci (biotech): accelerates drug discovery with generative AI (large-scale molecular dynamics, antibody design). Reports 2.5 µs inter-GPU latency, TB/s throughput, and no hypervisor overhead on OCI.
- Seekr (explainable AI): multi-year agreement to train next-generation models and agents at a global scale on OCI + AMD, prioritizing dense multinode compute and international presence.
Why it matters
- Memory and bandwidth: 288 GB of HBM3e per GPU and 8 TB/sec of bandwidth open the door to longer contexts, larger batches, and less offloading to system memory, which is crucial for MoE and RAG.
- Cost/performance: an entry price starting at $8.60/hr for an 8-GPU MI355X bare metal instance signals aggressive market positioning amid GPU availability and cost pressures.
- Scale: the combination of networking (400 Gbps FE / 3.2 Tbps cluster), liquid-cooled racks, and Zettascale points to large, stable clusters that bridge the gap from proofs of concept to industrial AI.
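The long-context argument can be made concrete with a KV-cache estimate. The model configuration below (80 layers, 8 KV heads, head dimension 128, FP16) is a hypothetical 70B-class setup chosen for illustration, not tied to any specific deployment:

```python
# KV-cache footprint for long-context inference. Model parameters are
# illustrative assumptions (a hypothetical 70B-class configuration).
LAYERS, KV_HEADS, HEAD_DIM = 80, 8, 128
BYTES = 2                     # FP16/BF16 per element
CONTEXT = 128 * 1024          # 128k-token context window

per_token = 2 * LAYERS * KV_HEADS * HEAD_DIM * BYTES  # K and V tensors
cache_gib = per_token * CONTEXT / 2**30

print(f"{per_token} B/token -> {cache_gib:.0f} GiB KV cache per sequence")
# 327680 B/token -> 40 GiB KV cache per sequence
```

At roughly 40 GiB per 128k-token sequence, a single 288 GB GPU holds several such caches alongside its share of model weights, and the node's 2.3 TB of pooled HBM3e leaves room for batching that smaller-memory GPUs would have to offload.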
Getting started
The BM.GPU.MI355X.8 instances can be requested on OCI now, with availability planned for the second half of next year, integrated into Oracle’s AI infrastructure family alongside the MI300X. Oracle has released additional resources as part of Oracle AI World 2025, including product details, keynotes, and technical documentation.
Quick questions
How does MI355X compare to MI300X?
Increased HBM3e (288 GB per GPU), +51% bandwidth (8 TB/sec), new FP4/FP6/FP8 precisions with CDNA 4, and overall system improvements (CPU, RAM, NVMe, networking).
What workloads is it ideal for?
Training and inference of multimodal large language models (LLMs), MoE, agents, long-context tasks, as well as HPC (CAE/CFD, genomics, climate modeling, finance, GNN).
What does OCI offer at the cluster level?
Low-latency RDMA networking, liquid-cooled racks, the Zettascale supercluster with up to 131,072 GPUs, and sovereignty controls for cloud deployment.
Is it compatible with my existing stack?
Supports ROCm and standard frameworks. There are CUDA-to-ROCm migration pathways without extensive rewrites, according to AMD and Oracle.
via: blogs.oracle

