Intel and Aible, an end-to-end serverless enterprise solution for generative artificial intelligence (GenAI) and augmented analytics, have announced new offerings for their joint customers that enable advanced GenAI and retrieval-augmented generation (RAG) use cases on multiple generations of Intel® Xeon® CPUs. The collaboration, which includes engineering optimizations and a benchmarking program, enhances Aible’s ability to deliver GenAI results at low cost for enterprise customers and helps developers integrate artificial intelligence into their applications.
Innovations in GenAI Performance with Intel Xeon
Aible’s solutions demonstrate how CPUs can significantly improve performance across a variety of modern AI workloads, from running language models to RAG. Optimized for Intel processors, Aible’s technology takes a serverless approach to AI and consumes resources only while there are active user requests. For example, the vector database is activated for just a few seconds to retrieve the information relevant to a user query, and the language model is started only long enough to process the request and respond. This on-demand operation helps reduce the total cost of ownership (TCO).
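The on-demand pattern described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Aible's implementation: the toy corpus, the bag-of-words `embed` function, and the `on_demand` context manager are all hypothetical stand-ins for a real vector database, embedding model, and serverless runtime. The sketch only shows that each component is "active" for the duration of one step of one request.

```python
import math
import time
from contextlib import contextmanager

# Hypothetical toy corpus; a real deployment would query a vector database.
DOCS = [
    "Intel Xeon processors can run RAG workloads",
    "Serverless resources are billed only while active",
]

# Seconds each (hypothetical) component spent active, summed over requests.
active_seconds = {}


def embed(text):
    """Toy bag-of-words vector; a stand-in for a real embedding model."""
    counts = {}
    for word in text.lower().split():
        counts[word] = counts.get(word, 0) + 1
    return counts


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(value * b.get(word, 0) for word, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


@contextmanager
def on_demand(component):
    """Record how long a component is active for the current request."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        active_seconds[component] = active_seconds.get(component, 0.0) + elapsed


def answer(query):
    # Step 1: the retrieval component is active only for the lookup.
    with on_demand("vector_db"):
        query_vec = embed(query)
        context = max(DOCS, key=lambda doc: cosine(query_vec, embed(doc)))
    # Step 2: the language model is active only to produce the reply.
    # A template stands in for real text generation here.
    with on_demand("language_model"):
        reply = f"Based on: {context}"
    return reply


if __name__ == "__main__":
    print(answer("which processors run RAG workloads"))
    print(active_seconds)
```

Outside the two `with` blocks nothing is counted as active, which is the property a serverless billing model relies on.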
Although RAG is typically implemented on GPUs (graphics processing units) and accelerators to exploit their parallel processing capabilities, Aible’s serverless technique, combined with Intel® Xeon® Scalable processors, allows RAG use cases to run entirely on CPUs. Performance data shows that multiple generations of Intel Xeon processors can efficiently run RAG workloads.
Strategic Collaboration for Efficiency in AI
Mishali Naik, Senior Principal Engineer at Intel in the Data Center and AI Group, emphasized: “Customers are looking for efficient, enterprise-level solutions to harness the power of AI. Our collaboration with Aible demonstrates how we are working closely with the industry to deliver AI innovation and reduce the barrier to entry for many customers to run the latest GenAI workloads using Intel Xeon processors.”
Cost Reduction and Efficiency Improvement
Aible enables customers to reduce the operational costs of GenAI projects by running exclusively on serverless CPUs, which lets multiple customers securely share the same underlying computing resources. According to Aible’s benchmark analysis, customers can achieve up to 55x cost savings by running RAG models on its serverless CPU-based solution. The reduction reflects Aible’s CPU-only approach, which avoids the need for more expensive GPU-based infrastructure.
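To make the cost reasoning concrete, the comparison between always-on infrastructure and pay-per-use serverless compute can be written as simple arithmetic. The sketch below is purely illustrative: the function names, rates, and request counts are hypothetical, and the numbers are not Aible's benchmark data and are not meant to reproduce the 55x figure.

```python
def always_on_cost(hourly_rate, hours):
    """Cost of infrastructure that is billed whether or not it is used."""
    return hourly_rate * hours


def on_demand_cost(rate_per_second, requests, active_seconds_per_request):
    """Cost of serverless compute billed only for active seconds."""
    return rate_per_second * requests * active_seconds_per_request


def savings_factor(baseline_cost, serverless_cost):
    """How many times cheaper the serverless option is than the baseline."""
    if serverless_cost <= 0:
        raise ValueError("serverless_cost must be positive")
    return baseline_cost / serverless_cost


if __name__ == "__main__":
    # Hypothetical inputs chosen only to show the calculation.
    baseline = always_on_cost(hourly_rate=2.0, hours=100)
    serverless = on_demand_cost(
        rate_per_second=0.002, requests=1000, active_seconds_per_request=5
    )
    print(f"always-on: {baseline:.2f}")
    print(f"on-demand: {serverless:.2f}")
    print(f"savings factor: {savings_factor(baseline, serverless):.1f}x")
```

The factor grows as utilization falls: the fewer seconds per hour the resources are actually needed, the larger the gap between paying for idle capacity and paying only for active time.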
Results of the Intel-Aible Collaboration
Intel, including Intel Labs, has worked with Aible to optimize AI workloads on Xeon processors. Notably, after its code was optimized for Intel® Advanced Vector Extensions 512 (AVX-512), Aible saw significant performance gains and increased capacity on Xeon processors, highlighting the impact of targeted software optimizations on overall efficiency.
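Whether a given machine can benefit from AVX-512 code paths depends on the CPU exposing those instructions. As a rough check, and assuming a Linux x86 host where `/proc/cpuinfo` lists CPU feature flags, the following sketch collects the AVX-512 flags the kernel reports. The helper names are hypothetical, and this is unrelated to how Aible or Intel performed their optimizations.

```python
def avx512_flags(cpuinfo_text):
    """Return the sorted AVX-512 feature flags found in cpuinfo-style text."""
    found = set()
    for line in cpuinfo_text.splitlines():
        # On Linux x86, feature flags appear on lines starting with "flags".
        if line.startswith("flags"):
            _, _, value = line.partition(":")
            found.update(f for f in value.split() if f.startswith("avx512"))
    return sorted(found)


def read_cpuinfo(path="/proc/cpuinfo"):
    """Read the cpuinfo file, returning an empty string if it is unavailable."""
    try:
        with open(path, encoding="utf-8") as handle:
            return handle.read()
    except OSError:
        return ""


if __name__ == "__main__":
    flags = avx512_flags(read_cpuinfo())
    if flags:
        print("AVX-512 flags reported:", ", ".join(flags))
    else:
        print("No AVX-512 flags reported (or not a Linux x86 host).")
```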
The combination of RAG models with Intel Xeon processors, facilitated by platforms like Aible, can enable applications such as:
– Natural Language Processing (NLP)
– Recommendation Systems
– Decision Support Systems
– Content Generation