The scaling of large language models (LLMs) emphasizes increasing depth, yet performance gains diminish as layers are added. Prior work introduced the concept of "effective depth", arguing that deeper models fail to fully utilize their layers for meaningful computation. Building on this, we systematically study how effective depth varies with model scale, training type, and task difficulty. First, we analyze the behavior of the Qwen-2.5 family (1.5B-32B) and find that although the number of effective layers grows with model size, the effective-depth ratio remains stable. Second, comparisons between base models and their long-CoT counterparts show no increase in effective depth, suggesting that improved reasoning stems from longer context rather than deeper per-token computation. Third, evaluations across tasks of varying difficulty indicate that models do not dynamically recruit more layers for harder problems. Our results suggest that current LLMs underuse their available depth across scales, training paradigms, and task difficulties, pointing to research opportunities in raising layer utilization, model pruning, and early exiting. Our code is released at https://github.com/AheadOFpotato/what_affects_effective_depth.
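To make the notion of per-token layer utilization concrete, the sketch below estimates a logit-lens-style "effective depth": the earliest layer whose intermediate hidden state, decoded through the model's final norm and LM head, already yields the model's final top-1 prediction. This is a minimal illustrative proxy under our own assumptions, not necessarily the paper's exact measurement; the model checkpoint, the agreement criterion, and the `effective_depth` helper are placeholders.

```python
# Illustrative sketch (not necessarily the paper's exact metric): estimate a
# logit-lens-style per-token "effective depth" as the earliest layer whose
# hidden state, decoded through the final norm and LM head, already produces
# the model's final top-1 prediction.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen2.5-1.5B"  # any Qwen-2.5 size; smaller sizes are cheaper to probe
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)
model.eval()

@torch.no_grad()
def effective_depth(text: str) -> list[int]:
    inputs = tokenizer(text, return_tensors="pt")
    out = model(**inputs, output_hidden_states=True)
    final_pred = out.logits.argmax(-1)[0]    # (seq_len,) final top-1 token ids
    num_layers = len(out.hidden_states) - 1  # hidden_states[0] is the embedding output
    seq_len = final_pred.shape[0]
    # Default to full depth; hidden_states[-1] is already post-norm, so the top
    # layer reproduces the final prediction by construction.
    depth = torch.full((seq_len,), num_layers, dtype=torch.long)
    for layer in range(1, num_layers):
        # Decode the intermediate state through the final RMSNorm and LM head.
        logits = model.lm_head(model.model.norm(out.hidden_states[layer]))
        # Record the earliest layer that agrees with the final prediction
        # (a crude proxy: agreement is not required to persist upward).
        agree = (logits.argmax(-1)[0] == final_pred) & (depth == num_layers)
        depth[agree] = layer
    return depth.tolist()

depths = effective_depth("The capital of France is")
ratio = sum(depths) / (len(depths) * model.config.num_hidden_layers)
print(f"mean effective-depth ratio: {ratio:.2f}")
```

Averaging the per-token ratio over a corpus gives one way to compare layer utilization across model sizes, base vs. long-CoT checkpoints, or task difficulty buckets; other definitions (e.g. requiring agreement to persist through all subsequent layers) are equally plausible.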