Make Your Pre-trained Model Reversible: From Parameter to Memory Efficient Fine-Tuning

June 1, 2023

Baohao Liao, Shaomu Tan, Christof Monz

From arXiv. Code at https://github.com/BaohaoLiao/mefts

Parameter-efficient fine-tuning (PEFT) of pre-trained language models (PLMs) has emerged as a highly successful approach: it trains only a small number of parameters without sacrificing performance, and it has become the de facto learning paradigm as PLMs grow in size. However, existing PEFT methods are not memory-efficient, because they still require caching most of the intermediate activations for the gradient calculation, much like full fine-tuning. One effective way to reduce the activation memory is to use a reversible model, whose intermediate activations need not be cached because they can be recomputed. Nevertheless, modifying a PLM into its reversible variant with PEFT is not straightforward, since a reversible model has a different architecture from currently released PLMs. In this paper, we first investigate what makes existing PEFT methods successful, and find that it is essential to preserve the PLM's starting point when initializing a PEFT method. Building on this finding, we propose memory-efficient fine-tuning (MEFT), which inserts adapters into a PLM, preserving the PLM's starting point and making it reversible without additional pre-training. We evaluate MEFT on the GLUE benchmark and five question-answering tasks with various backbones: BERT, RoBERTa, BART, and OPT. MEFT reduces the activation memory by up to 84% relative to full fine-tuning while training a negligible number of parameters. Moreover, MEFT achieves the same score on GLUE and a comparable score on the question-answering tasks as full fine-tuning.
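To make the idea of recomputing activations concrete, here is a minimal sketch of a generic two-stream reversible coupling in plain Python. It is an illustration of the general technique under stated assumptions, not the MEFT architecture itself: the sub-functions `f`, `g`, and `zero`, and the helper names, are hypothetical stand-ins introduced here. The inputs of the layer are reconstructed exactly (up to floating-point rounding) from its outputs, which is why a reversible layer does not need to cache them for the backward pass.

```python
import math


def f(v):
    # Hypothetical sub-function F; stands in for any layer acting on one stream.
    return [math.tanh(0.5 * a) for a in v]


def g(v):
    # Hypothetical sub-function G; it does not need to be invertible itself.
    return [0.1 * a * a for a in v]


def zero(v):
    # A sub-function that outputs zeros, e.g. a branch that is zero at initialization.
    return [0.0 for _ in v]


def add(u, v):
    return [a + b for a, b in zip(u, v)]


def sub(u, v):
    return [a - b for a, b in zip(u, v)]


def forward(x1, x2, f_fn=f, g_fn=g):
    # Reversible coupling: each stream is updated using only the other stream.
    y1 = add(x1, f_fn(x2))
    y2 = add(x2, g_fn(y1))
    return y1, y2


def inverse(y1, y2, f_fn=f, g_fn=g):
    # Undo the two updates in reverse order to recover the inputs.
    x2 = sub(y2, g_fn(y1))
    x1 = sub(y1, f_fn(x2))
    return x1, x2


def max_abs_diff(u, v):
    return max(abs(a - b) for a, b in zip(u, v))


if __name__ == "__main__":
    x1, x2 = [0.3, -1.2, 2.0], [1.5, 0.0, -0.7]
    y1, y2 = forward(x1, x2)
    r1, r2 = inverse(y1, y2)
    # Both differences should be at the level of floating-point rounding.
    print(max_abs_diff(x1, r1), max_abs_diff(x2, r2))
```

Passing `g_fn=zero` to `forward` leaves the second stream unchanged. This is only a loose illustration of why a branch that outputs zero at initialization does not disturb the function a model computed before the branch was added; how MEFT actually preserves the PLM's starting point is described in the paper and is not reproduced here.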
