Environmental modeling faces a critical challenge in predicting ecosystem dynamics across unmonitored regions because observation data are limited and geographically imbalanced. Spatial heterogeneity compounds this challenge, causing models to learn spurious patterns that fit only local data. Unlike conventional domain generalization, environmental modeling must preserve invariant physical relationships and temporal coherence during augmentation. In this paper, we introduce Generalizable Representation Enhancement via Auxiliary Transformations (GREAT), a framework that augments available datasets to improve predictions in completely unseen regions. GREAT guides the augmentation process so that the original governing processes can be recovered from the augmented data and so that including the augmented data improves model generalization. Specifically, GREAT learns transformation functions at multiple layers of a neural network to augment both raw environmental features and temporal influence. These transformation functions are refined through a novel bi-level training process that constrains the augmented data to preserve key patterns of the original source data. We demonstrate GREAT's effectiveness on stream temperature prediction across six ecologically diverse watersheds in the eastern U.S., each containing multiple stream segments. Experimental results show that GREAT significantly outperforms existing methods in zero-shot scenarios. This work provides a practical solution for environmental applications where comprehensive monitoring is infeasible.