Gradient AI, a company specializing in artificial intelligence, has achieved a significant breakthrough in natural language processing by extending the context window of Meta's Llama 3 models to over 1 million tokens. This gives these models the longest context of any model in the open-source domain.
A quantum leap in processing capacity
The Llama 3 language models, recently released by Meta, have generated great excitement in the open-source community thanks to their exceptional performance. However, a notable limitation was their relatively short context length of 8,192 tokens. Gradient AI saw this as an opportunity to enhance these models.
Context length determines how much text a model can consider at once, counting both the input it is given and the output it generates. While more advanced models offer context windows of up to 128,000 tokens (approximately 90,000 words), Gradient AI has managed to increase this capacity to over 1 million tokens for the 8B and 70B parameter Llama 3 models.
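To make the idea of a shared token budget concrete, here is a minimal, illustrative Python sketch (not Gradient AI's code): the prompt and the generated continuation must fit together inside a single context window, and the words-per-token ratio used to relate 128,000 tokens to roughly 90,000 words is only a rough rule of thumb for English text.

```python
# Illustrative only: a context window is a shared budget for input + output tokens.

MAX_CONTEXT_TOKENS = 128_000   # e.g. a 128K-context model
WORDS_PER_TOKEN = 0.75         # rough English-text approximation, not exact

def fits_in_context(prompt_tokens: int, max_new_tokens: int,
                    window: int = MAX_CONTEXT_TOKENS) -> bool:
    """Prompt and generated tokens share one window."""
    return prompt_tokens + max_new_tokens <= window

# ~128,000 tokens * 0.75 words/token ≈ 96,000 words, the same ballpark as the
# ~90,000-word figure quoted above.
print(int(MAX_CONTEXT_TOKENS * WORDS_PER_TOKEN))                      # ~96000
print(fits_in_context(prompt_tokens=120_000, max_new_tokens=4_000))   # True
print(fits_in_context(prompt_tokens=120_000, max_new_tokens=16_000))  # False
```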
Infrastructure and technology behind the achievement
To carry out this project, Gradient AI partnered with Crusoe, a computing infrastructure provider. The choice of hardware was crucial: the team opted for NVIDIA L40S GPUs because of their ready availability and strong performance in 8-bit floating-point (FP8) operations.
Gradient AI's team implemented advanced optimization techniques, such as RingAttention, to overcome memory limitations and enable effectively unbounded context lengths. They also developed proprietary strategies to balance the computational workload across devices and improve overall training performance.
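Neither company publishes its training code in this announcement, but the memory idea behind RingAttention can be sketched. The NumPy snippet below is a single-machine illustration of the blockwise attention with an online softmax that RingAttention builds on: keys and values are processed one block at a time, so the full quadratic score matrix is never materialized. In the actual distributed technique those blocks live on different GPUs and are rotated around a ring while computation overlaps with communication; that part is omitted here.

```python
import numpy as np

def blockwise_attention(q, k, v, block_size=256):
    """Single-head attention computed over key/value blocks with an online
    softmax, so only one block of scores exists in memory at a time.
    Conceptual sketch only; RingAttention additionally shards the blocks
    across devices arranged in a ring."""
    d = q.shape[-1]
    scale = 1.0 / np.sqrt(d)
    n_q = q.shape[0]

    out = np.zeros_like(q)
    running_max = np.full((n_q, 1), -np.inf)
    running_den = np.zeros((n_q, 1))

    for start in range(0, k.shape[0], block_size):
        kb = k[start:start + block_size]           # key block   (b, d)
        vb = v[start:start + block_size]           # value block (b, d)
        scores = q @ kb.T * scale                  # (n_q, b)

        block_max = scores.max(axis=-1, keepdims=True)
        new_max = np.maximum(running_max, block_max)

        # Rescale previously accumulated results to the new max, then add
        # this block's contribution (standard online-softmax update).
        correction = np.exp(running_max - new_max)
        probs = np.exp(scores - new_max)

        out = out * correction + probs @ vb
        running_den = running_den * correction + probs.sum(axis=-1, keepdims=True)
        running_max = new_max

    return out / running_den

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    q = rng.standard_normal((4, 8))
    k = rng.standard_normal((1024, 8))
    v = rng.standard_normal((1024, 8))

    # Reference: full (quadratic-memory) softmax attention.
    s = q @ k.T / np.sqrt(8)
    w = np.exp(s - s.max(-1, keepdims=True))
    full = (w / w.sum(-1, keepdims=True)) @ v

    assert np.allclose(blockwise_attention(q, k, v), full)
```

The blockwise result matches full attention exactly (up to floating-point error); the savings come from never holding more than one block of scores at once, which is what lets context scale with the number of devices in the ring.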
Impact and efficiency
The resulting models have shown outstanding results in information retrieval tests and rank among the best on the Open LLM Leaderboard. Furthermore, the estimated training cost for these extended models is competitive compared to fine-tuning options available through commercial APIs.
Environmental considerations
At a time when the demand for more powerful AI models is increasing exponentially, Gradient AI and Crusoe have also addressed sustainability. Crusoe powers its data centers with a combination of wasted, stranded, and clean energy, making it possible to run large-scale AI workloads while staying aligned with climate goals.
This breakthrough in extending the context of Llama 3 models represents a significant step towards more capable and versatile language models, with potential applications across a wide range of industries and use cases.
Source: Crusoe