We first study the generalization error of models that use a fixed feature representation (frozen intermediate layers) followed by a trainable readout layer. This setting encompasses a range of architectures, from deep random-feature models to echo-state networks (ESNs) with recurrent dynamics. Working in the high-dimensional regime, we apply Random Matrix Theory to derive a closed-form expression for the asymptotic generalization error. We then specialize this analysis to recurrent representations and obtain concise formulas that characterize their performance. Surprisingly, we show that a linear ESN is equivalent to ridge regression with an exponentially time-weighted ("memory") input covariance, revealing a clear inductive bias toward recent inputs. Experiments confirm the theoretical predictions: ESNs outperform in low-sample, short-memory regimes, while standard ridge regression prevails with more data or long-range dependencies. Our methodology provides a general framework for analyzing overparameterized models and offers insights into the behavior of deep neural networks.
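To make the stated equivalence more concrete, the sketch below is a minimal numerical illustration, not the derivation from the analysis above: it simulates a linear reservoir h_t = W h_{t-1} + V x_t and shows that the reservoir state is a linear transform of the input history whose lag-k weights W^k V decay exponentially at a rate set by the spectral radius of W, so a ridge readout on h_t acts on an exponentially time-weighted view of past inputs. The dimensions, the random matrices W and V, and the spectral radius rho are illustrative assumptions.

```python
# Minimal sketch (assumed toy setup): effective lag weights of a linear echo-state network.
import numpy as np

rng = np.random.default_rng(0)
n_hidden, n_input, max_lag = 200, 5, 30

# Random reservoir W rescaled to spectral radius rho < 1, plus an input map V.
rho = 0.7
W = rng.standard_normal((n_hidden, n_hidden))
W *= rho / np.max(np.abs(np.linalg.eigvals(W)))
V = rng.standard_normal((n_hidden, n_input)) / np.sqrt(n_input)

# Unrolling h_t = W h_{t-1} + V x_t gives h_t = sum_k W^k V x_{t-k}, so the
# effective weight applied to the input k steps in the past is the matrix W^k V.
Wk_V = V.copy()
lag_weight_norms = []
for k in range(max_lag):
    lag_weight_norms.append(np.linalg.norm(Wk_V))
    Wk_V = W @ Wk_V

# The lag-weight norms decay exponentially in k (rate governed by rho): a ridge
# readout on h_t therefore amounts to ridge regression on the lagged inputs seen
# through an exponentially time-weighted linear map, i.e. a bias toward recent inputs.
for k in (0, 5, 10, 20):
    print(f"lag {k:2d}: ||W^k V|| = {lag_weight_norms[k]:.3e}   (rho^k = {rho**k:.3e})")
```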