As large language models (LLMs) continue to grow in size and complexity, efficient checkpoint saving and loading has become crucial for managing storage, memory usage, and fault tolerance in LLM training. Existing work does not comprehensively address the joint optimization of these aspects. This paper proposes a novel checkpoint sparsification and quantization method that adapts dynamically to different training stages and model architectures. We present a comprehensive analysis of existing lossy and lossless compression techniques, identify their limitations, and introduce an adaptive approach that balances compression ratio, speed, and precision impact throughout the training process. Experiments on LLMs of different sizes demonstrate that our bitmask-based sparsification method achieves a 16x compression ratio without compromising model accuracy, while our cluster-based quantization method achieves a 2x compression ratio with negligible precision loss.
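To make the two techniques named above concrete, the following is a minimal NumPy sketch of what bitmask-based sparsification and cluster-based quantization of a checkpoint tensor can look like. The function names, the 1-bit-per-element mask layout, and the simple 1-D k-means clustering are illustrative assumptions for exposition only, not the implementation evaluated in this paper.

```python
import numpy as np

def sparsify_with_bitmask(tensor: np.ndarray):
    """Store only nonzero entries plus a packed 1-bit-per-element mask
    marking their positions (illustrative sketch, not the paper's code)."""
    flat = tensor.ravel()
    mask = flat != 0                    # boolean position mask
    packed_mask = np.packbits(mask)     # 8 mask bits per stored byte
    values = flat[mask]                 # dense array of surviving values
    return packed_mask, values, tensor.shape

def desparsify(packed_mask, values, shape):
    """Reverse of sparsify_with_bitmask: scatter values back by the mask."""
    n = int(np.prod(shape))
    mask = np.unpackbits(packed_mask)[:n].astype(bool)
    flat = np.zeros(n, dtype=values.dtype)
    flat[mask] = values
    return flat.reshape(shape)

def cluster_quantize(tensor: np.ndarray, n_clusters: int = 256, iters: int = 10):
    """Map each weight to the nearest of n_clusters centroids (simple 1-D
    k-means) and store 8-bit indices plus the centroid table (sketch only)."""
    flat = tensor.ravel().astype(np.float32)
    # initialize centroids on evenly spaced quantiles of the weight distribution
    centroids = np.quantile(flat, np.linspace(0.0, 1.0, n_clusters))
    for _ in range(iters):
        idx = np.abs(flat[:, None] - centroids[None, :]).argmin(axis=1)
        for k in range(n_clusters):
            members = flat[idx == k]
            if members.size:
                centroids[k] = members.mean()
    return idx.astype(np.uint8), centroids.astype(np.float32), tensor.shape

# Usage: a mostly-zero fp32 weight matrix reduces to a bitmask plus its nonzeros,
# and a small dense block quantizes to 8-bit cluster indices.
w = np.random.randn(1024, 1024).astype(np.float32)
w[np.abs(w) < 1.5] = 0.0                          # make the tensor sparse
mask, vals, shape = sparsify_with_bitmask(w)
assert np.array_equal(desparsify(mask, vals, shape), w)

idx, centroids, qshape = cluster_quantize(np.random.randn(64, 256).astype(np.float32))
w_hat = centroids[idx].reshape(qshape)            # lossy reconstruction
```

In this toy layout, a tensor that is largely zero costs one bit per element for the mask plus the raw nonzero values, and cluster quantization replaces 32-bit floats with 8-bit indices and a small centroid table; the actual compression ratios and accuracy figures reported in the paper come from the method described in the following sections.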