Revisiting Multivariate Time Series Forecasting with Missing Values

Missing values are common in real-world time series, and multivariate time series forecasting with missing values (MTSF-M) has become a crucial research area for ensuring reliable predictions. Current approaches follow an imputation-then-prediction framework: an imputation module fills in the missing values, and a forecasting model is then applied to the imputed data. However, this framework overlooks a critical issue: the missing values have no ground truth, so the imputation process is susceptible to errors that carry over into the forecasts. In this paper, we conduct a systematic empirical study and reveal that imputation without direct supervision can corrupt the underlying data distribution and actively degrade prediction accuracy. To address this, we propose a paradigm shift that moves away from imputation and predicts directly from the partially observed time series. We introduce Consistency-Regularized Information Bottleneck (CRIB), a novel framework built on the Information Bottleneck principle. CRIB combines a unified-variate attention mechanism with a consistency regularization scheme to learn robust representations that filter out the noise introduced by missing values while preserving essential predictive signals. Comprehensive experiments on four real-world datasets demonstrate the effectiveness of CRIB, which predicts accurately even under high missing rates. Our code is available at https://github.com/Muyiiiii/CRIB.
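The point that missing values have no ground truth can be made concrete with a minimal sketch. It is not the CRIB implementation. It only illustrates, under our own assumptions, how a model can consume a partially observed series and be supervised on observed entries alone. The function names `with_mask_channel` and `masked_mse`, and the choice to encode missing entries as NaN, are hypothetical and used here for illustration.

```python
import numpy as np


def with_mask_channel(x):
    """Zero-fill NaNs and append the observation mask as extra features.

    Assumption: missing entries are encoded as NaN. The zero is a
    placeholder, not an imputed estimate; the mask tells a downstream
    model which entries were actually observed.
    """
    mask = ~np.isnan(x)
    filled = np.nan_to_num(x)  # NaN -> 0.0
    return np.concatenate([filled, mask.astype(x.dtype)], axis=-1), mask


def masked_mse(pred, target, mask):
    """Mean squared error over observed entries only (mask is True)."""
    weights = mask.astype(float)
    observed = weights.sum()
    if observed == 0:
        return 0.0  # nothing observed, so nothing to supervise
    sq_err = (pred - np.nan_to_num(target)) ** 2
    return float((sq_err * weights).sum() / observed)


# Toy example: one target value is missing, so it contributes no loss.
target = np.array([[1.0, np.nan], [3.0, 4.0]])
pred = np.array([[1.0, 100.0], [2.0, 4.0]])
inputs, mask = with_mask_channel(target)
loss = masked_mse(pred, target, mask)
```

In this toy case the prediction of 100.0 at the missing position is never penalized, because there is nothing to compare it against. The same lack of supervision applies to any value an imputation module writes into that position.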
