Weak-to-Strong Generalization Even in Random Feature Networks, Provably - Zhuanzhi Papers

Weak-to-Strong Generalization Even in Random Feature Networks, Provably

Marko Medvedev, Kaifeng Lyu, Dingli Yu, Sanjeev Arora, Zhiyuan Li, Nathan Srebro

From arXiv. Edits: fixed typesetting errors from v2.

Weak-to-Strong Generalization (Burns et al., 2024) is the phenomenon whereby a strong student, say GPT-4, learns a task from a weak teacher, say GPT-2, and ends up significantly outperforming the teacher. We show that this phenomenon does not require a strong learner like GPT-4. We consider student and teacher that are random feature models, described by two-layer networks with a random and fixed bottom layer and a trained top layer. A "weak" teacher, with a small number of units (i.e. random features), is trained on the population, and a "strong" student, with a much larger number of units (i.e. random features), is trained only on labels generated by the weak teacher. We demonstrate, prove, and understand how the student can outperform the teacher, even though trained only on data labeled by the teacher. We also explain how such weak-to-strong generalization is enabled by early stopping. Importantly, we also show the quantitative limits of weak-to-strong generalization in this model.
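
The setup described in the abstract can be illustrated with a small NumPy sketch. It is not the authors' code, and several choices are assumptions made only for illustration: ReLU features with Gaussian bottom-layer weights, inputs drawn uniformly from the unit sphere, a linear target function, a large finite sample standing in for the population, and the helper names `relu_features`, `sample_sphere`, `fit_top_layer_gd`, and `run_experiment`. The teacher's top layer is fit by least squares; the student's top layer is trained by gradient descent on the teacher's labels only and stopped after a fixed number of steps.

```python
import numpy as np


def relu_features(X, W):
    """Fixed random bottom layer followed by ReLU, scaled by 1/sqrt(#units)."""
    return np.maximum(X @ W.T, 0.0) / np.sqrt(W.shape[0])


def sample_sphere(rng, n, d):
    """Draw n points uniformly from the unit sphere in R^d."""
    X = rng.standard_normal((n, d))
    return X / np.linalg.norm(X, axis=1, keepdims=True)


def fit_top_layer_gd(Phi, y, steps):
    """Gradient descent on the squared loss over the top layer only, from zero.

    Stopping after `steps` iterations plays the role of early stopping.
    The step size is 1/L, with L the largest eigenvalue of Phi^T Phi / n,
    so the training loss never increases.
    """
    n = Phi.shape[0]
    L = np.linalg.norm(Phi, 2) ** 2 / n
    a = np.zeros(Phi.shape[1])
    for _ in range(steps):
        a -= (Phi.T @ (Phi @ a - y) / n) / L
    return a


def run_experiment(d=8, m_teacher=16, m_student=512, n=4000, n_test=4000,
                   steps=200, seed=0):
    rng = np.random.default_rng(seed)
    X, X_test = sample_sphere(rng, n, d), sample_sphere(rng, n_test, d)

    # Hypothetical target: a unit-norm linear function of the input.
    beta = rng.standard_normal(d)
    beta /= np.linalg.norm(beta)
    y, y_test = X @ beta, X_test @ beta

    # Weak teacher: few random units, top layer fit on the true labels
    # (the large sample is a stand-in for training on the population).
    W_t = rng.standard_normal((m_teacher, d))
    a_t, *_ = np.linalg.lstsq(relu_features(X, W_t), y, rcond=None)
    teacher_labels = relu_features(X, W_t) @ a_t
    teacher_test = relu_features(X_test, W_t) @ a_t

    # Strong student: many random units, trained only on the teacher's labels.
    W_s = rng.standard_normal((m_student, d))
    Phi_s = relu_features(X, W_s)
    a_s = fit_top_layer_gd(Phi_s, teacher_labels, steps)
    student_test = relu_features(X_test, W_s) @ a_s

    return {
        "teacher_test_mse": float(np.mean((teacher_test - y_test) ** 2)),
        "student_test_mse": float(np.mean((student_test - y_test) ** 2)),
        "student_fit_mse": float(np.mean((Phi_s @ a_s - teacher_labels) ** 2)),
        "label_power": float(np.mean(teacher_labels ** 2)),
    }


if __name__ == "__main__":
    for name, value in run_experiment().items():
        print(f"{name}: {value:.5f}")
```

Comparing `student_test_mse` with `teacher_test_mse` shows whether the student ends up closer to the true target than the teacher that labeled its data. Whether it does, and by how much, depends on the dimension, the two widths, and the stopping time `steps`, so the defaults here should be treated as a starting point for experimentation rather than a reproduction of the paper's results.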
