Time series forecasting is fundamental to diverse applications, and recent approaches leverage large vision models (LVMs) to capture temporal patterns through visual representations. We reveal that while vision models enhance forecasting performance, 99% of their parameters are unnecessary for time series tasks. Through cross-modal analysis, we find that time series align with low-level textural features but not with high-level semantics, which can impair forecasting accuracy. We propose OccamVTS, a knowledge distillation framework that extracts only the essential 1% of predictive information from LVMs into lightweight networks. Using pre-trained LVMs as privileged teachers, OccamVTS employs pyramid-style feature alignment combined with correlation and feature distillation to transfer beneficial patterns while filtering out semantic noise. Counterintuitively, this aggressive parameter reduction improves accuracy by eliminating overfitting to irrelevant visual features while preserving essential temporal patterns. Extensive experiments across multiple benchmark datasets demonstrate that OccamVTS consistently achieves state-of-the-art performance with only 1% of the original parameters, particularly excelling in few-shot and zero-shot scenarios.
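To make the distillation objective concrete, the following is a minimal sketch of how pyramid-style feature alignment with combined feature and correlation distillation might look in PyTorch. The class name `PyramidDistillationLoss`, the per-level linear projections, and the mixing weight `alpha` are illustrative assumptions, not the paper's actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PyramidDistillationLoss(nn.Module):
    """Hypothetical sketch of an OccamVTS-style objective: align student
    features to frozen-teacher (LVM) features at several pyramid levels,
    combining direct feature distillation with a correlation term.
    Layer choices, projections, and `alpha` are assumptions."""

    def __init__(self, student_dims, teacher_dims, alpha=0.5):
        super().__init__()
        # Linear projections map student features into the teacher's
        # channel dimension at each pyramid level.
        self.projs = nn.ModuleList(
            nn.Linear(s, t) for s, t in zip(student_dims, teacher_dims)
        )
        self.alpha = alpha

    @staticmethod
    def correlation(feat):
        # Normalized correlation (Gram) matrix: captures how features
        # co-vary, independent of absolute activation scale.
        feat = F.normalize(feat, dim=-1)
        return feat @ feat.transpose(-1, -2)

    def forward(self, student_feats, teacher_feats):
        loss = 0.0
        for proj, fs, ft in zip(self.projs, student_feats, teacher_feats):
            fs = proj(fs)                   # (B, N, C_t)
            ft = ft.detach()                # teacher is a frozen, privileged model
            feat_loss = F.mse_loss(fs, ft)  # direct feature distillation
            corr_loss = F.mse_loss(         # correlation distillation
                self.correlation(fs), self.correlation(ft)
            )
            loss = loss + self.alpha * feat_loss + (1 - self.alpha) * corr_loss
        return loss
```

In training, the teacher features would come from intermediate layers of the frozen LVM applied to the visualized series, while the student features come from the lightweight forecaster; only the student and the projection heads receive gradients, so the LVM is used purely as a privileged source of supervision.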