Nonlinear ISA with Auxiliary Variables for Learning Speech Representations

This paper extends recent work on nonlinear Independent Component Analysis (ICA) by introducing a theoretical framework for nonlinear Independent Subspace Analysis (ISA) in the presence of auxiliary variables. Observed high-dimensional acoustic features, such as log Mel spectrograms, can be viewed as surface-level manifestations of nonlinear transformations of individual multivariate sources of information, such as speaker characteristics and phonological content. Under energy-based model assumptions, we use the theory of nonlinear ISA to propose an algorithm that learns unsupervised speech representations whose subspaces are independent and potentially highly correlated with the original non-stationary multivariate sources. We show how nonlinear ICA with auxiliary variables can be extended to a generic identifiable model for subspaces, and we provide sufficient conditions for the identifiability of these high-dimensional subspaces. Our proposed methodology is generic and can be integrated with standard unsupervised approaches to learn speech representations with subspaces that can theoretically capture independent higher-order speech signals. We evaluate the gains of our algorithm when integrated with the Autoregressive Predictive Coding (APC) model by showing empirical results on speaker verification and phoneme recognition tasks.
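
As a rough illustration of the kind of model the abstract describes, the generative setup might be written as follows. This is a sketch only, not the paper's own formulation. The notation is assumed throughout: the mixing function $f$, the subspace sources $s_j$ with dimensions $d_j$, the auxiliary variable $u$, the energy functions $E_j$, and the normalizers $Z_j$ are all introduced here for illustration.

```latex
% Assumed notation (not taken from the paper):
%   x   : observed acoustic feature vector, e.g. a log Mel frame, x \in \mathbb{R}^n
%   s_j : j-th multivariate source (subspace), s_j \in \mathbb{R}^{d_j}, \sum_j d_j = n
%   u   : auxiliary variable, e.g. a time or segment index
\begin{align}
  x &= f(s), \qquad s = (s_1, \dots, s_K), \quad f \text{ smooth and invertible}, \\
  p(s \mid u) &= \prod_{j=1}^{K} p_j(s_j \mid u), \\
  p_j(s_j \mid u) &= \frac{\exp\bigl(-E_j(s_j, u)\bigr)}{Z_j(u)} .
\end{align}
```

In this reading, the subspaces $s_j$ are conditionally independent of one another given $u$, while the components inside each subspace may remain dependent. That is the feature that separates ISA from ICA, which corresponds to the special case where every $d_j = 1$.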
