Learning Low Dimensional State Spaces with Overparameterized Recurrent Neural Network

Overparameterization in deep learning typically refers to settings where a trained Neural Network (NN) has the representational capacity to fit the training data in many ways, some of which generalize well, while others do not. In the case of Recurrent Neural Networks (RNNs), there exists an additional layer of overparameterization, in the sense that a model may exhibit many solutions that generalize well for sequence lengths seen in training, some of which extrapolate to longer sequences, while others do not. Numerous works have studied the tendency of Gradient Descent (GD) to fit overparameterized NNs with solutions that generalize well. On the other hand, its tendency to fit overparameterized RNNs with solutions that extrapolate has been discovered only recently, and is far less understood. In this paper, we analyze the extrapolation properties of GD when applied to overparameterized linear RNNs. In contrast to recent arguments suggesting an implicit bias towards short-term memory, we provide theoretical evidence for learning low dimensional state spaces, which can also model long-term memory. Our result relies on a dynamical characterization which shows that GD (with small step size and near-zero initialization) strives to maintain a certain form of balancedness, as well as on tools developed in the context of the moment problem from statistics (recovery of a probability distribution from its moments). Experiments corroborate our theory, demonstrating extrapolation via learning low dimensional state spaces with both linear and non-linear RNNs.
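To make the notion of a "low dimensional state space" concrete, the following is a minimal NumPy sketch, not the paper's experimental setup. It assumes a single-input, single-output linear RNN of the form s_{t+1} = A s_t + B x_t, y_t = C s_{t+1}; the function names and the specific matrices are hypothetical choices for illustration. It shows that the output of such a model is a convolution of the input with the impulse response (C B, C A B, C A^2 B, ...), and that a 10-dimensional model can realize exactly the same input-output map as a 2-dimensional one. The effective state dimension is read off as the rank of the Hankel matrix of the impulse response. No training is performed here; the sketch only illustrates the kind of solution the abstract says GD tends to find.

```python
import numpy as np

def linear_rnn(A, B, C, x):
    """Run a SISO linear RNN: s <- A s + B x_t, y_t = C s (zero initial state)."""
    s = np.zeros(A.shape[0])
    ys = []
    for x_t in x:
        s = A @ s + B * x_t
        ys.append(C @ s)
    return np.array(ys)

def impulse_response(A, B, C, length):
    """Return the first `length` terms [C B, C A B, C A^2 B, ...]."""
    out = []
    v = B.copy()
    for _ in range(length):
        out.append(C @ v)
        v = A @ v
    return np.array(out)

def hankel_rank(h, tol=1e-8):
    """Rank of the square Hankel matrix H[i, j] = h[i + j] built from h."""
    n = len(h) // 2
    H = np.array([[h[i + j] for j in range(n)] for i in range(n)])
    return np.linalg.matrix_rank(H, tol=tol)

# Hypothetical 2-dimensional "teacher" (illustrative values only).
A_small = np.diag([0.9, -0.5])
B_small = np.array([1.0, 1.0])
C_small = np.array([1.0, 1.0])

# Hypothetical 10-dimensional model that embeds the teacher: the extra
# 8 state dimensions have arbitrary dynamics but are never driven by the
# input (B is zero there) and never read out (C is zero there).
rng = np.random.default_rng(0)
d = 10
A_big = np.zeros((d, d))
A_big[:2, :2] = A_small
A_big[2:, 2:] = 0.1 * rng.standard_normal((d - 2, d - 2))
B_big = np.concatenate([B_small, np.zeros(d - 2)])
C_big = np.concatenate([C_small, np.zeros(d - 2)])

h_small = impulse_response(A_small, B_small, C_small, 12)
h_big = impulse_response(A_big, B_big, C_big, 12)

# A long input sequence: both models agree at every length.
x = rng.standard_normal(50)
y_small = linear_rnn(A_small, B_small, C_small, x)
y_big = linear_rnn(A_big, B_big, C_big, x)

# The output equals the input convolved with the impulse response.
h_long = impulse_response(A_small, B_small, C_small, len(x))
y_conv = np.convolve(x, h_long)[: len(x)]

rank_small = hankel_rank(h_small)
rank_big = hankel_rank(h_big)
```

In this construction both models yield the same Hankel rank, so the 10-dimensional model has an effective state dimension of 2 and matches the teacher at every sequence length. An overparameterized model that merely matched the first few impulse-response terms, without this low-rank structure, would not be guaranteed to agree on longer sequences.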
