Red Hat, a global leader in open source software solutions, announced Red Hat AI 3, a major update to its enterprise artificial intelligence platform. The new version combines the innovations of Red Hat AI Inference Server, Red Hat Enterprise Linux AI (RHEL AI), and Red Hat OpenShift AI with the goal of reducing the complexity of high-performance, large-scale inference. This enables organizations to move more quickly from proof of concept to production and to collaborate more effectively on AI-powered applications.
As companies move beyond the purely experimental phase of AI, they face significant challenges: data privacy, cost control, and managing multiple models. The GenAI Divide: State of AI in Business, a report from MIT's NANDA project, paints a stark picture: around 95% of organizations see no measurable financial return despite approximately $40 billion in enterprise investment.
Red Hat AI 3 directly addresses these challenges by offering a unified and coherent experience that allows CIOs and IT teams to better leverage their compute acceleration investments. The platform enables scalable and agile orchestration of AI workloads across hybrid and multi-cloud environments, while fostering collaboration among teams working on cutting-edge use cases, such as AI agents, within a single operational framework. Built on open standards, Red Hat AI 3 supports organizations at any stage of adoption, with compatibility for any model on any hardware accelerator, from data centers to public clouds to edge AI environments, including the farthest network edges.
From training to execution: transitioning to enterprise AI inference
As organizations bring their AI initiatives into production, the focus shifts from model training and tuning to inference, the "execution" phase of enterprise AI. Red Hat AI 3 emphasizes scalable, cost-effective inference, building on the vLLM and llm-d community projects and on Red Hat's model optimization capabilities to deliver production-grade serving of large language models (LLMs).
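vLLM serves models through an OpenAI-compatible HTTP API, which is one reason the move from proof of concept to production can stay simple: applications keep using standard client libraries. The sketch below is illustrative only; the endpoint URL, API key, and model name are placeholders rather than values shipped by the platform.

```python
# Minimal sketch: querying an OpenAI-compatible inference endpoint, such as
# one served by vLLM. The URL, key, and model name are hypothetical.
from openai import OpenAI

client = OpenAI(
    base_url="https://inference.example.internal/v1",  # hypothetical endpoint
    api_key="EMPTY",  # vLLM accepts a placeholder key when auth is not configured
)

response = client.chat.completions.create(
    model="granite-8b-instruct",  # hypothetical model name
    messages=[{"role": "user", "content": "Summarize our returns policy."}],
    temperature=0.2,
)
print(response.choices[0].message.content)
```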
To help CIOs maximize the value of their high-end hardware investments, Red Hat OpenShift AI 3.0 introduces the general availability of llm-d, which reimagines how LLMs run natively on Kubernetes. llm-d enables intelligent distributed inference, combining proven Kubernetes orchestration and vLLM performance with key open source technologies such as the Kubernetes Gateway API Inference Extension, the NVIDIA Dynamo (NIXL) low-latency data transfer library, and the DeepEP Mixture of Experts (MoE) communication library. This allows organizations to:
- Reduce costs and improve response times through intelligent, optimized model scheduling for inference, along with a decoupled service architecture.
- Ensure operational simplicity and maximum reliability via prescriptive "well-lit paths" that streamline large-scale model deployment on Kubernetes.
- Enhance flexibility by supporting multi-platform deployment of LLM inference across a range of hardware accelerators, including NVIDIA and AMD.
Built on vLLM, llm-d evolves the engine from a high-performance single-node solution into a consistent, scalable, distributed serving system. Tight integration with Kubernetes delivers predictable performance, measurable ROI, and effective infrastructure planning. These capabilities directly address the challenges of highly variable LLM workloads and of serving massive models such as MoE models.
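Because llm-d is Kubernetes-native, its serving components surface as ordinary cluster resources. As a rough sketch, the following lists InferencePool objects defined by the upstream Gateway API Inference Extension using the Kubernetes Python client; the API group and version come from the upstream project and may differ in a given OpenShift AI release, and the namespace is hypothetical.

```python
# Sketch: listing Gateway API Inference Extension resources with the
# Kubernetes Python client. Group, version, and namespace are assumptions.
from kubernetes import client, config

config.load_kube_config()  # use load_incluster_config() when running in-cluster
api = client.CustomObjectsApi()

pools = api.list_namespaced_custom_object(
    group="inference.networking.x-k8s.io",  # assumption: upstream API group
    version="v1alpha2",                     # assumption: may vary by release
    namespace="llm-serving",                # hypothetical namespace
    plural="inferencepools",
)
for pool in pools.get("items", []):
    print(pool["metadata"]["name"])
```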
A unified platform for collaborative AI
Red Hat AI 3 provides a unified, flexible experience specially tailored to the collaborative demands inherent in building production-ready generative AI solutions. Designed to generate tangible value by fostering collaboration and harmonizing workflows across teams, it offers a single platform that enables both platform engineers and AI teams to execute their strategies effectively. Key new features aimed at accelerating productivity and efficiency from proof of concept to production include:
- Model-as-a-Service (MaaS) capabilities, based on distributed inference, allow IT teams to act as their own MaaS providers, serving common models centrally and offering on-demand access to AI developers and applications. This improves cost management and supports use cases that cannot run on public AI services due to privacy or data concerns (see the model-discovery sketch after this list).
- The AI Hub empowers platform engineers to explore, deploy, and manage foundational AI assets. It offers a curated catalog of validated and optimized AI models, a registry to manage the model lifecycle, and a deployment environment for configuring and monitoring all AI assets running on OpenShift AI.
- Gen AI Studio offers a hands-on environment where AI engineers can interact with models and rapidly prototype new generative AI applications. Through an AI assets endpoint, engineers can easily discover and consume available models and MCP servers, which are designed to streamline model integration with external tools. The built-in playground provides an interactive, stateless environment for experimenting with models, testing prompts, and tuning parameters for use cases such as chat and retrieval-augmented generation (RAG).
- New validated and optimized models from Red Hat simplify development. The curated selection includes popular open source models such as OpenAI's gpt-oss and DeepSeek-R1, as well as specialized models such as Whisper for speech-to-text conversion and Voxtral Mini for voice-enabled agents.
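To make the MaaS idea above concrete: when the internal service exposes an OpenAI-compatible API, as vLLM-based serving does, developers can discover which models IT serves centrally before sending requests. A minimal sketch; the endpoint URL and credential are placeholders.

```python
# Sketch: discovering centrally served models on an internal MaaS endpoint,
# assuming an OpenAI-compatible API. URL and token are hypothetical.
from openai import OpenAI

client = OpenAI(
    base_url="https://maas.example.internal/v1",  # hypothetical internal endpoint
    api_key="team-issued-token",                  # hypothetical credential
)

# Each entry corresponds to a model the platform team serves centrally.
for model in client.models.list():
    print(model.id)
```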
Laying the groundwork for next-generation AI agents
AI agents are poised to transform application development, with complex, autonomous workflows that will place high demands on inference capabilities. Red Hat OpenShift AI 3.0 continues laying the foundation for scalable, agentic AI systems—not only through inference features but also with new functionalities and improvements centered on agent management.
To accelerate agent creation and deployment, Red Hat has introduced a unified API layer based on Llama Stack, aligning development with industry standards such as OpenAI-compatible LLM interface protocols. To promote a more open and interoperable ecosystem, Red Hat is also an early adopter of the Model Context Protocol (MCP), a powerful emerging standard that streamlines how AI models interact with external tools, a core capability for modern AI agents (a minimal client sketch follows below).

Red Hat AI 3 also introduces a modular, extensible toolkit for model customization, built on existing InstructLab functionality. It provides specialized Python libraries that give developers greater flexibility and control. The toolkit draws on open source projects such as Docling for document ingestion, which streamlines converting unstructured documents into AI-readable formats (see the second sketch below). It also includes flexible frameworks for synthetic data generation and a training hub for fine-tuning LLMs. An integrated evaluation hub helps AI engineers monitor and validate results, so they can confidently use proprietary data to obtain more accurate and relevant AI outcomes.
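MCP standardizes how a model-driven agent discovers and invokes external tools. The sketch below uses the upstream MCP Python SDK to connect to a local server over stdio and enumerate its tools; the server command is hypothetical, and the session API reflects the open source SDK rather than anything specific to Red Hat AI 3.

```python
# Sketch: an MCP client listing the tools of a local server over stdio.
# Uses the upstream MCP Python SDK; the server command is hypothetical.
import asyncio

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

server = StdioServerParameters(command="python", args=["my_mcp_server.py"])

async def main() -> None:
    async with stdio_client(server) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            tools = await session.list_tools()
            for tool in tools.tools:
                print(f"{tool.name}: {tool.description}")

asyncio.run(main())
```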
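For the ingestion step, Docling's documented converter API turns office and PDF files into structured output that is straightforward to chunk and embed. A minimal sketch; the input file path is a placeholder.

```python
# Sketch: converting an unstructured document into Markdown with Docling.
# The input file path is a placeholder.
from docling.document_converter import DocumentConverter

converter = DocumentConverter()
result = converter.convert("quarterly_report.pdf")  # hypothetical input file

# Markdown output is easy to chunk and embed in downstream RAG pipelines.
print(result.document.export_to_markdown())
```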

