A team of Japanese researchers has announced the launch of “Fugaku-LLM,” a large-scale language model with enhanced capabilities in the Japanese language, developed using the Fugaku supercomputer. This advancement promises to revolutionize research and business applications in Japan and beyond.
The team, led by Professor Rio Yokota of the Tokyo Institute of Technology, includes Associate Professor Keisuke Sakaguchi of Tohoku University, Koichi Shirahata of Fujitsu Limited, Team Leader Mohamed Wahib of RIKEN, Associate Professor Koji Nishiguchi of Nagoya University, Shota Sasaki of CyberAgent, Inc., and Noriyuki Kojima of Kotoba Technologies Inc. Together, they successfully trained a language model with 13 billion parameters, notably larger than the 7-billion-parameter models prevalent in Japan.
Innovation in Language Model Training
To train this model on Fugaku, the researchers developed distributed training methods, including porting the Megatron-DeepSpeed deep learning framework to Fugaku and optimizing the performance of Transformers on it. They also accelerated the dense matrix multiplication library used by Transformers and optimized communication by combining three types of parallelization: data, tensor, and pipeline parallelism.
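To make the combination of these three techniques concrete, the sketch below shows how a fixed pool of workers is partitioned under Megatron-DeepSpeed-style 3D parallelism. It is an illustrative helper, not the project's actual code, and the node count and parallelism degrees are hypothetical examples.

```python
# Illustrative sketch of 3D parallelism bookkeeping (not the Fugaku-LLM
# team's actual configuration). In Megatron-DeepSpeed, the total worker
# count factors into tensor-, pipeline-, and data-parallel degrees.

def parallel_layout(world_size: int, tensor_parallel: int, pipeline_parallel: int) -> dict:
    """Derive the data-parallel degree from the total number of workers."""
    model_parallel = tensor_parallel * pipeline_parallel
    if world_size % model_parallel != 0:
        raise ValueError("world_size must be divisible by tensor_parallel * pipeline_parallel")
    return {
        "tensor_parallel": tensor_parallel,      # splits each layer's weight matrices
        "pipeline_parallel": pipeline_parallel,  # splits the stack of Transformer layers
        "data_parallel": world_size // model_parallel,  # replicas fed different batches
    }

if __name__ == "__main__":
    # Hypothetical decomposition of a 13,824-node allocation.
    print(parallel_layout(world_size=13824, tensor_parallel=4, pipeline_parallel=24))
    # -> {'tensor_parallel': 4, 'pipeline_parallel': 24, 'data_parallel': 144}
```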
Performance and Applications
Fugaku-LLM, trained on proprietary Japanese data collected by CyberAgent along with additional English and mathematical data, performs especially well on humanities and social sciences tasks, scoring 9.18 on that portion of the Japanese MT-Bench, the highest among open models trained with original data produced in Japan.
The source code for Fugaku-LLM is available on GitHub, and the model can be found on Hugging Face, allowing for its use in both research and commercial applications, provided the license is adhered to.
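As a pointer for readers who want to try the model, here is a minimal inference sketch using the Hugging Face transformers library. The repository ID shown is an assumption based on the project's name; consult the model card on Hugging Face for the exact ID, prompt format, and license terms.

```python
# Minimal inference sketch using Hugging Face transformers.
# The repo ID below is assumed from the project name; verify it on the
# model card before use.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Fugaku-LLM/Fugaku-LLM-13B"  # assumed repository ID

tokenizer = AutoTokenizer.from_pretrained(model_id)
# device_map="auto" spreads the 13B weights across available devices
# (requires the `accelerate` package).
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

prompt = "スーパーコンピュータ「富岳」とは何ですか？"  # "What is the supercomputer Fugaku?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```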
Collaboration and Contributions
Each institution has played a crucial role in this project:
- Tokyo Institute of Technology: Overall supervision and communication optimization.
- Tohoku University: Data collection and model selection.
- Fujitsu: Acceleration of computation and communication.
- RIKEN: Distributed parallelization and communication acceleration.
- Nagoya University: Study of applications for generative 3D AI.
- CyberAgent: Provision of training data.
- Kotoba Technologies: Porting of the deep learning framework to Fugaku.
Future Impact
With Fugaku-LLM, Japan strengthens its position in artificial intelligence development, demonstrating that large-scale language models can be trained efficiently on CPUs (Fugaku's Arm-based A64FX processors) rather than GPUs, a significant option in light of the global GPU shortage.
This model is not only a powerful tool for academic research, but it also has the potential to drive innovative commercial applications, such as scientific simulation and the creation of virtual communities with thousands of AIs.
Conclusion
The launch of Fugaku-LLM marks a significant milestone in the realm of artificial intelligence in Japan, showcasing the power of the Fugaku supercomputer and the advanced capabilities of Japanese researchers. This model not only enhances understanding of the Japanese language but also lays the groundwork for future innovations in various scientific and commercial fields.