We study the problem of matching correlated VAR time series databases, where a multivariate time series is observed along with a perturbed and permuted version, and the goal is to recover the unknown matching between them. To model this, we introduce a probabilistic framework in which two time series $(x_t)_{t\in[T]},(x^\#_t)_{t\in[T]}$ are jointly generated, such that $x^\#_t=x_{π^*(t)}+σ\tilde{x}_{π^*(t)}$, where $(x_t)_{t\in[T]},(\tilde{x}_t)_{t\in[T]}$ are independent and identically distributed vector autoregressive (VAR) time series of order $1$ with Gaussian increments, for a hidden $π^*$. The objective is to recover $π^*$, from the observation of $(x_t)_{t\in[T]},(x^\#_t)_{t\in[T]}$. This generalizes the classical problem of matching independent point clouds to the time series setting. We derive the maximum likelihood estimator (MLE), leading to a quadratic optimization over permutations, and theoretically analyze an estimator based on linear assignment. For the latter approach, we establish recovery guarantees, identifying thresholds for $σ$ that allow for perfect or partial recovery. Additionally, we propose solving the MLE by considering convex relaxations of the set of permutation matrices (e.g., over the Birkhoff polytope). This allows for efficient estimation of $π^*$ and the VAR parameters via alternating minimization. Empirically, we find that linear assignment often matches or outperforms MLE relaxation based approaches.
翻译:我们研究相关VAR时间序列数据库的匹配问题,其中观测到一个多元时间序列及其经过扰动和置换的版本,目标是恢复两者之间的未知匹配关系。为此,我们引入一个概率框架:两个时间序列$(x_t)_{t\\in[T]},(x^\\#_t)_{t\\in[T]}$联合生成,满足$x^\\#_t=x_{\\pi^*(t)}+\\sigma\\tilde{x}_{\\pi^*(t)}$,其中$(x_t)_{t\\in[T]},(\\tilde{x}_t)_{t\\in[T]}$为一阶向量自回归(VAR)时间序列,具有独立同分布的高斯增量,且存在隐藏置换$\\pi^*$。目标是从观测数据$(x_t)_{t\\in[T]},(x^\\#_t)_{t\\in[T]}$中恢复$\\pi^*$。该问题将经典独立点云匹配推广至时间序列场景。我们推导了最大似然估计量(MLE),其可转化为置换空间上的二次优化问题,并从理论上分析了基于线性分配的估计方法。针对线性分配方法,我们建立了恢复保证,确定了允许完全或部分恢复的$\\sigma$阈值。此外,我们提出通过考虑置换矩阵集合的凸松弛(例如在Birkhoff多胞体上)来求解MLE,从而通过交替最小化高效估计$\\pi^*$和VAR参数。实证结果表明,线性分配方法在性能上常与基于MLE松弛的方法相当或更优。