Recent advances in artificial neural networks for machine learning, and language modeling in particular, have established a family of recurrent neural network (RNN) architectures that, unlike conventional RNNs with vector-valued hidden states, use two-dimensional (2D) matrix-valued hidden states. Such a 2D-state RNN, known as a Fast Weight Programmer (FWP), can be interpreted as a neural network whose synaptic weights (called fast weights) change dynamically over time as a function of the input observations and serve as short-term memory storage; the corresponding weight modifications are controlled, or programmed, by another network (the programmer) whose parameters are trained (e.g., by gradient descent). In this Primer, we review the technical foundations of FWPs, their computational characteristics, and their connections to transformers and state space models. We also discuss connections between FWPs and models of synaptic plasticity in the brain, suggesting a convergence of natural and artificial intelligence.
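To make the mechanism concrete, the sketch below illustrates the classic additive outer-product fast-weight update closely related to linear attention: a slow "programmer" network maps each input to a key, value and query, the key-value outer product writes to a matrix-valued fast-weight state, and the query reads from it. This is a minimal illustrative example, not an implementation from the Primer; the dimensions and names (`W_k`, `W_v`, `W_q`, `elu_plus_one`, `fwp_forward`) are assumptions.

```python
# Minimal sketch (assumed, not from the Primer) of an additive outer-product
# Fast Weight Programmer. The fast-weight matrix W_fast is the 2D hidden state.
import numpy as np

rng = np.random.default_rng(0)
d_in, d_key, d_val = 8, 16, 16      # illustrative sizes

# Slow (trainable) parameters of the programmer network.
W_k = rng.standard_normal((d_key, d_in)) / np.sqrt(d_in)
W_v = rng.standard_normal((d_val, d_in)) / np.sqrt(d_in)
W_q = rng.standard_normal((d_key, d_in)) / np.sqrt(d_in)

def elu_plus_one(x):
    # A common positive feature map used in linear-attention-style FWPs.
    return np.where(x > 0, x + 1.0, np.exp(x))

def fwp_forward(xs):
    """Run the fast-weight recurrence over a sequence xs of shape (T, d_in)."""
    W_fast = np.zeros((d_val, d_key))      # matrix-form short-term memory
    ys = []
    for x in xs:
        k = elu_plus_one(W_k @ x)
        v = W_v @ x
        q = elu_plus_one(W_q @ x)
        W_fast = W_fast + np.outer(v, k)   # Hebbian outer-product "write"
        ys.append(W_fast @ q)              # "read" with the query
    return np.stack(ys)

ys = fwp_forward(rng.standard_normal((5, d_in)))
print(ys.shape)  # (5, 16)
```

In this additive form the fast weights only accumulate; the variants reviewed in the Primer (for example, delta-rule or gated updates) replace the simple `W_fast + np.outer(v, k)` step with learned, input-dependent modification rules.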