Red Hat continues to expand the options available to its enterprise AI customers with the introduction of the Red Hat AI Inference Server, third-party models validated by Red Hat AI, and the integration of the Llama Stack APIs and the Model Context Protocol (MCP). Through these advancements, Red Hat aims to deliver the capabilities companies need to accelerate AI adoption.
According to Forrester, open-source software will be the catalyst for accelerating enterprise AI efforts. As the landscape of AI becomes increasingly complex and dynamic, the Red Hat AI Inference Server and validated third-party models offer efficient model inference and a proven collection of AI models optimized for performance on the Red Hat AI platform. Together with the integration of new APIs for developing generative AI agents, including Llama Stack and MCP, Red Hat is working to address the complexity of deployment, empowering IT leaders, data scientists, and developers to accelerate AI initiatives with greater control and efficiency.
Efficient Inference in the Hybrid Cloud with Red Hat AI Inference Server
The Red Hat AI portfolio now includes the new Red Hat AI Inference Server, which provides faster, more consistent, and more cost-effective inference at scale across hybrid cloud environments. This key addition is integrated into the latest versions of Red Hat OpenShift AI and Red Hat Enterprise Linux AI, and is also available as a standalone offering, allowing organizations to deploy intelligent applications with greater efficiency, flexibility, and performance.
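As a concrete illustration (not from the announcement itself): the Red Hat AI Inference Server is built on the vLLM project, whose server exposes an OpenAI-compatible API, so applications can call it with standard OpenAI client code. The endpoint URL and model name below are assumptions for the sake of the sketch.

```python
# Minimal sketch of querying a vLLM-based inference server through its
# OpenAI-compatible endpoint. The URL and model name are placeholders.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",  # whichever model the server hosts
    messages=[{"role": "user", "content": "What is hybrid cloud inference?"}],
)
print(response.choices[0].message.content)
```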
Proven, Optimized Third-Party Models Validated by Red Hat AI
The third-party models validated by Red Hat AI, available on Hugging Face, make it easier for companies to find the right models for their specific needs. Red Hat AI offers a collection of validated models, as well as deployment guides to increase customer confidence in model performance and result reproducibility. Some models are also optimized by Red Hat, leveraging model compression techniques to reduce size and increase inference speed, helping to minimize resource consumption and operational costs. Additionally, the ongoing model validation process helps Red Hat AI customers stay at the forefront of optimized generative AI innovation.
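For illustration, a compressed checkpoint from the validated collection can be loaded for local inference with vLLM. The Hugging Face repository id below is hypothetical, standing in for a real validated model from the collection.

```python
# Minimal sketch of running a quantized model locally with vLLM.
# The repo id is hypothetical; the validated-model collection on
# Hugging Face lists the real identifiers.
from vllm import LLM, SamplingParams

llm = LLM(model="RedHatAI/example-llm-quantized-w4a16")  # hypothetical repo id
params = SamplingParams(temperature=0.2, max_tokens=64)

outputs = llm.generate(["Briefly explain model quantization."], params)
print(outputs[0].outputs[0].text)
```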
Standardized APIs for Application Development and AI Agents with Llama Stack and MCP
Red Hat AI is integrating Llama Stack, initially developed by Meta, along with Anthropic's MCP, to provide users with standardized APIs for building and deploying AI applications and agents. Llama Stack, currently available as a developer preview on Red Hat AI, offers a unified API for vLLM inference, retrieval-augmented generation (RAG), model evaluation, guardrails, and agents across any generative AI model. MCP enables models to integrate with external tools by providing a standardized interface for connecting APIs, plugins, and data sources in agent workflows.
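As a sketch of what the unified API looks like in practice, assuming the llama-stack-client Python package and a locally running Llama Stack distribution (method names may shift while the integration is in developer preview):

```python
# Minimal sketch of a chat call through the Llama Stack unified API.
# The endpoint and model identifier are assumptions; 8321 is the usual
# default port for a local Llama Stack distribution.
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:8321")

response = client.inference.chat_completion(
    model_id="meta-llama/Llama-3.1-8B-Instruct",  # assumed registered model
    messages=[{"role": "user", "content": "Summarize what RAG does."}],
)
print(response.completion_message.content)
```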
The latest version of Red Hat OpenShift AI (v2.20) adds enhancements for building, training, deploying, and monitoring both generative and predictive AI models at scale. These include:
- Optimized Model Catalog (Tech Preview): Provides easy access to validated models from Red Hat and third parties, enables their deployment on Red Hat OpenShift AI clusters through the web console, and manages their lifecycle using the integrated Red Hat OpenShift AI model registry.
- Distributed Training via the Kubeflow Training Operator: Enables scheduling and execution of InstructLab model fine-tuning and other PyTorch-based training and fine-tuning workloads, distributed across multiple nodes and GPUs on Red Hat OpenShift, with support for RDMA-accelerated distributed networking and optimized GPU utilization to reduce costs (see the training-script sketch after this list).
- Feature Store (Tech Preview): Based on the Kubeflow Feast community project, it provides a centralized repository for managing and serving data for both model training and inference, streamlining data workflows to improve model accuracy and reuse (see the Feast lookup sketch after this list).
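To make the distributed-training item above concrete: the Training Operator launches each pod of a PyTorchJob with the rendezvous environment variables (MASTER_ADDR, MASTER_PORT, RANK, WORLD_SIZE), so the training script itself only needs standard PyTorch distributed setup. A minimal sketch, with a placeholder model and loop:

```python
# Minimal sketch of the script side of a PyTorchJob run by the Kubeflow
# Training Operator; the operator injects the rendezvous environment
# variables, and NCCL uses the GPU interconnect (RDMA where available).
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    dist.init_process_group(backend="nccl")  # reads RANK/WORLD_SIZE from env
    local_rank = int(os.environ.get("LOCAL_RANK", 0))
    torch.cuda.set_device(local_rank)

    model = DDP(torch.nn.Linear(128, 2).cuda(local_rank),
                device_ids=[local_rank])  # placeholder model
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

    for _ in range(10):  # placeholder training loop
        x = torch.randn(32, 128, device=local_rank)
        loss = model(x).sum()
        optimizer.zero_grad()
        loss.backward()  # gradients are all-reduced across ranks here
        optimizer.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```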
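And for the feature store item: Feast's Python SDK centers on a FeatureStore object that serves features at inference time. The feature view and entity names below are illustrative, not from the announcement.

```python
# Minimal sketch of an online feature lookup with Feast. Assumes a feature
# repo (feature_store.yaml plus definitions) in the current directory;
# feature view and entity names are illustrative.
from feast import FeatureStore

store = FeatureStore(repo_path=".")

features = store.get_online_features(
    features=["driver_stats:conv_rate", "driver_stats:acc_rate"],
    entity_rows=[{"driver_id": 1001}],
).to_dict()
print(features)
```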
Red Hat Enterprise Linux AI 1.5 introduces new updates to Red Hat's foundation model platform for developing, testing, and running large language models (LLMs). Key features of version 1.5 include:
- Availability on Google Cloud Marketplace: Expands options for customers to run Red Hat Enterprise Linux AI in public cloud environments (alongside AWS and Azure), simplifying the deployment and management of AI workloads on Google Cloud.
- Enhanced Multilingual Capabilities for Spanish, German, French, and Italian through InstructLab: Allows model customization using native scripts and unlocks new possibilities for multilingual AI applications. Users can also bring their own "teacher" models for greater control over customization and testing of models for specific use cases and languages, with future support anticipated for Japanese, Hindi, and Korean.
The Red Hat AI InstructLab service on IBM Cloud is now also generally available. This new cloud service further streamlines the process of model customization, enhancing scalability and user experience, allowing businesses to leverage their unique data with greater ease and control.
Red Hat’s Vision: Any Model, Any Accelerator, Any Cloud
The future of AI should be defined by limitless opportunities, not by constraints imposed by infrastructure silos. Red Hat envisions a future where organizations can deploy any model, on any accelerator, across any cloud, delivering a consistently exceptional user experience without exorbitant costs. To unlock the true potential of investments in generative AI, companies need a universal inference platform: a standard for smoother, high-performance AI innovation, both now and in the future.