Snowflake is partnering with Meta to host and optimize a new flagship family of models in Snowflake Cortex AI.

Snowflake, the AI-driven data cloud company, has announced the integration of Llama 3.1, Meta’s open-source collection of multilingual language models, into its Snowflake Cortex AI platform. The addition lets businesses build and run powerful AI applications at scale more easily, and it includes Meta’s largest and most capable open-source model, Llama 3.1 405B. Snowflake has developed and open-sourced an inference stack that delivers real-time performance and broadens access to advanced natural-language processing and generation. Snowflake’s optimizations for Llama 3.1 405B support a 128K-token context window from day one, with up to three times lower latency and 1.4 times higher throughput than existing open-source solutions. In addition, the massive model can be fine-tuned on a single GPU node, cutting cost and complexity for developers and users within Cortex AI.

Through its partnership with Meta, Snowflake provides customers with an efficient and secure way to access, fine-tune, and deploy the latest Meta models on its AI data cloud platform, with a focus on trust and security from the start.

“We are enabling businesses and the open-source community to use cutting-edge models like Llama 3.1 405B for inference and fine-tuning, maximizing efficiency,” says Vivek Raghunathan, VP of AI Engineering at Snowflake. “Not only are we providing direct access to Meta models through Snowflake Cortex AI, but we are also offering new research and open source that supports 128K context windows, multi-node inference, pipeline parallelism, 8-bit floating point quantization, and more to advance the AI ecosystem.”
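One of the techniques Raghunathan names, 8-bit floating-point quantization, can be illustrated with a small, self-contained sketch. The snippet below simulates per-tensor FP8 (E4M3) quantization in NumPy: the E4M3 maximum of 448 comes from the OCP FP8 specification, while the rounding is a simplified simulation, not Snowflake’s actual kernels.

```python
import numpy as np

# Illustrative sketch of per-tensor 8-bit floating-point (FP8 E4M3) quantization.
# The E4M3 max finite value (448) is from the OCP FP8 spec; the round-trip below
# only simulates the precision loss of 3 mantissa bits.

FP8_E4M3_MAX = 448.0  # largest finite value representable in E4M3

def quantize_fp8(x: np.ndarray) -> tuple[np.ndarray, float]:
    """Scale a tensor so its max magnitude maps to the FP8 range,
    then round to the nearest representable value (simulated)."""
    scale = FP8_E4M3_MAX / np.abs(x).max()
    x_scaled = np.clip(x * scale, -FP8_E4M3_MAX, FP8_E4M3_MAX)
    # Simulate E4M3's 3 mantissa bits: 2**3 = 8 steps per power-of-two interval.
    mantissa_bits = 3
    exp = np.floor(np.log2(np.abs(x_scaled) + 1e-30))
    step = 2.0 ** (exp - mantissa_bits)
    return np.round(x_scaled / step) * step, scale

def dequantize_fp8(x_q: np.ndarray, scale: float) -> np.ndarray:
    return x_q / scale

rng = np.random.default_rng(0)
w = rng.standard_normal(1024).astype(np.float32)
w_q, s = quantize_fp8(w)
w_hat = dequantize_fp8(w_q, s)
rel_err = np.abs(w - w_hat).max() / np.abs(w).max()
print(f"max relative error after FP8 round-trip: {rel_err:.4f}")
```

Roughly 6% worst-case relative error per value is the price of storing weights or activations in one byte instead of two, which halves memory traffic, one reason FP8 helps latency and throughput at this scale.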

Snowflake’s AI research team continues to expand open-source innovation through contributions to the AI community and transparency about its LLM technologies. Alongside the launch of Llama 3.1 405B, it is releasing its massive-model inference and fine-tuning optimization stack in collaboration with DeepSpeed, Hugging Face, vLLM, and the broader AI community. This represents a significant advance in inference and fine-tuning for models with hundreds of billions of parameters.

Large model sizes and their memory requirements pose significant challenges for low-latency inference in real-time applications, high throughput for cost-effectiveness, and long-context support in enterprise generative AI. Snowflake’s massive-model inference and fine-tuning optimization stack addresses these issues with advanced parallelism and memory-optimization techniques, enabling efficient processing without expensive infrastructure. For Llama 3.1 405B, the Snowflake platform delivers high real-time throughput on a single GPU node and supports a 128,000-token context window in multi-node configurations. This flexibility extends to both current and older-generation hardware, making the model accessible to a broader range of businesses. In addition, data scientists can fine-tune Llama 3.1 405B with mixed-precision techniques on fewer GPUs, eliminating the need for large GPU clusters. As a result, organizations can tailor and deploy enterprise-grade generative AI applications easily, efficiently, and securely.
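To make the memory challenge concrete, here is a back-of-the-envelope calculation of the attention KV cache that a 128K-token context implies. The architecture figures (126 transformer layers, 8 grouped-query KV heads, head dimension 128) follow Meta’s published Llama 3.1 405B configuration and are used only for illustration; Snowflake’s actual system internals are not public.

```python
# Back-of-the-envelope KV-cache sizing for long-context inference, to show why
# memory optimization matters at a 128K context window. Architecture numbers
# follow Meta's published Llama 3.1 405B configuration (illustrative only).

layers = 126
kv_heads = 8         # grouped-query attention: far fewer KV heads than query heads
head_dim = 128
bytes_per_value = 2  # fp16/bf16 cache; an fp8 cache would halve this

# Both keys and values are cached, hence the leading factor of 2.
kv_bytes_per_token = 2 * layers * kv_heads * head_dim * bytes_per_value

context_tokens = 128_000
kv_cache_gib = kv_bytes_per_token * context_tokens / 2**30

print(f"KV cache per token: {kv_bytes_per_token / 1024:.0f} KiB")
print(f"KV cache at 128K context: {kv_cache_gib:.1f} GiB")
```

Even before counting the roughly 800 GB of fp16 weights, a single 128K-token sequence needs tens of gigabytes of cache, which is why long-context serving pushes past one GPU node and why parallelism and memory optimizations are central to the stack described above.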

Snowflake has also built an optimized fine-tuning infrastructure that supports techniques such as model distillation, safety guardrails, Retrieval-Augmented Generation (RAG), and synthetic data generation, making it easier to get these use cases started within Cortex AI.
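The RAG pattern mentioned above can be sketched in a few lines. This toy version ranks documents by word overlap and builds a grounded prompt; a production pipeline such as the one in Cortex AI would use vector embeddings and a real LLM call, and all names below are hypothetical.

```python
# Toy illustration of Retrieval-Augmented Generation: retrieve the documents
# most relevant to a query, then prepend them to the prompt handed to the LLM.
# Scoring is plain word overlap; real systems use vector embeddings.

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Rank documents by how many query words they share; keep the top k."""
    q_words = set(query.lower().split())
    scored = sorted(docs,
                    key=lambda d: len(q_words & set(d.lower().split())),
                    reverse=True)
    return scored[:k]

def build_prompt(query: str, docs: list[str]) -> str:
    """Assemble a prompt that grounds the model's answer in retrieved context."""
    context = "\n".join(f"- {d}" for d in retrieve(query, docs))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

docs = [
    "Llama 3.1 405B supports a 128K token context window.",
    "Snowflake Cortex Guard filters harmful LLM output.",
    "Fine-tuning can run on a single GPU node with mixed precision.",
]
prompt = build_prompt("What context window does Llama 3.1 405B support?", docs)
print(prompt)
```

The point of the pattern is that the model answers from retrieved enterprise data rather than from its frozen training set, which is what makes RAG a staple of enterprise generative AI.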

AI safety is crucial for Snowflake and its customers. Snowflake has therefore launched Snowflake Cortex Guard to protect against harmful content in any LLM application or asset built in Cortex AI, whether it uses Meta’s models or LLMs from other major providers such as AI21 Labs, Google, Mistral AI, Reka, and Snowflake itself. Cortex Guard uses Meta’s Llama Guard 2 to filter and flag unsafe model responses.

Customer and partner feedback on this news from Snowflake includes:

– Dave Lindley, Sr. Director of Data Products at E15 Group: “We rely on generative AI to analyze and better understand our Voice of the Customer platform. Accessing Meta’s Llama models within Snowflake Cortex AI helps us gain insights needed to improve our business.”
– Ryan Klapper, AI lead at Hakkoda: “Security and trust in generative AI are essential. Snowflake provides us with the necessary assurances to use advanced language models safely, allowing us to enhance our internal applications.”
– Matthew Scullion, CEO and Co-founder of Matillion: “Integrating Meta’s Llama models into Snowflake Cortex AI offers our customers access to the most advanced language models and flexibility to adapt to their AI needs.”
– Kevin Niparko, VP of Product Strategy and Technology at Twilio Segment: “The ability to choose the right model in Snowflake Cortex AI allows our customers to generate smart AI-based insights and apply them to their tools, helping achieve optimal results.”

In conclusion, Snowflake’s integration of the Llama 3.1 language models from Meta into its Cortex AI platform represents a significant step forward in enabling businesses to leverage cutting-edge AI technologies efficiently, securely, and cost-effectively. The advancements in inference, fine-tuning, and security measures provided by Snowflake will empower organizations to develop and deploy high-quality generative AI applications with confidence and ease.
