As large language models (LLMs) continue to grow in size and complexity, efficient checkpoint saving and loading has become crucial for managing storage, memory usage, and fault tolerance in LLM training. Existing work does not comprehensively address the joint optimization of these aspects. This paper proposes a novel checkpoint sparsification and quantization method that adapts dynamically to different training stages and model architectures. We present a comprehensive analysis of existing lossy and lossless compression techniques, identify their limitations, and introduce an adaptive approach that balances compression ratio, speed, and precision impact throughout the training process. Experiments on LLMs of different sizes demonstrate that our bitmask-based sparsification method achieves a 16x compression ratio without compromising model accuracy, while our cluster-based quantization method achieves a 2x compression ratio with negligible precision loss.
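To make the two techniques named above concrete, the following is a minimal NumPy sketch of what bitmask-based sparsification and cluster-based quantization of a checkpoint tensor can look like. The function names, the 1-bit-per-element mask layout, and the simple 1-D k-means clustering are illustrative assumptions for exposition only, not the implementation evaluated in this paper.

```python
import numpy as np

def sparsify_with_bitmask(tensor: np.ndarray):
    """Store only nonzero entries plus a packed 1-bit-per-element mask
    marking their positions (illustrative sketch, not the paper's code)."""
    flat = tensor.ravel()
    mask = flat != 0                    # boolean position mask
    packed_mask = np.packbits(mask)     # 8 mask bits per stored byte
    values = flat[mask]                 # dense array of surviving values
    return packed_mask, values, tensor.shape

def desparsify(packed_mask, values, shape):
    """Reverse of sparsify_with_bitmask: scatter values back by the mask."""
    n = int(np.prod(shape))
    mask = np.unpackbits(packed_mask)[:n].astype(bool)
    flat = np.zeros(n, dtype=values.dtype)
    flat[mask] = values
    return flat.reshape(shape)

def cluster_quantize(tensor: np.ndarray, n_clusters: int = 256, iters: int = 10):
    """Map each weight to the nearest of n_clusters centroids (simple 1-D
    k-means) and store 8-bit indices plus the centroid table (sketch only)."""
    flat = tensor.ravel().astype(np.float32)
    # initialize centroids on evenly spaced quantiles of the weight distribution
    centroids = np.quantile(flat, np.linspace(0.0, 1.0, n_clusters))
    for _ in range(iters):
        idx = np.abs(flat[:, None] - centroids[None, :]).argmin(axis=1)
        for k in range(n_clusters):
            members = flat[idx == k]
            if members.size:
                centroids[k] = members.mean()
    return idx.astype(np.uint8), centroids.astype(np.float32), tensor.shape

# Usage: a mostly-zero fp32 weight matrix reduces to a bitmask plus its nonzeros,
# and a small dense block quantizes to 8-bit cluster indices.
w = np.random.randn(1024, 1024).astype(np.float32)
w[np.abs(w) < 1.5] = 0.0                          # make the tensor sparse
mask, vals, shape = sparsify_with_bitmask(w)
assert np.array_equal(desparsify(mask, vals, shape), w)

idx, centroids, qshape = cluster_quantize(np.random.randn(64, 256).astype(np.float32))
w_hat = centroids[idx].reshape(qshape)            # lossy reconstruction
```

In this toy layout, a tensor that is largely zero costs one bit per element for the mask plus the raw nonzero values, and cluster quantization replaces 32-bit floats with 8-bit indices and a small centroid table; the actual compression ratios and accuracy figures reported in the paper come from the method described in the following sections.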