Distributionally Robust Optimization (DRO) provides a framework for decision-making under distributional uncertainty, yet its effectiveness can be compromised by outliers in the training data. This paper introduces a principled approach to simultaneously address both challenges. We focus on optimizing Wasserstein-1 DRO objectives for generalized linear models with convex Lipschitz loss functions, where an $\epsilon$-fraction of the training data is adversarially corrupted. Our primary contribution lies in a novel modeling framework that integrates robustness against training data contamination with robustness against distributional shifts, alongside an efficient algorithm inspired by robust statistics to solve the resulting optimization problem. We prove that our method achieves an estimation error of $O(\sqrt{\epsilon})$ for the true DRO objective value using only the contaminated data under the bounded covariance assumption. This work establishes the first rigorous guarantees, supported by efficient computation, for learning under the dual challenges of data contamination and distributional shifts.
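The setting described above can be illustrated with a minimal sketch. For Lipschitz losses, a standard duality result reformulates the Wasserstein-1 DRO objective as the empirical risk plus a dual-norm penalty scaled by the radius and the Lipschitz constant; the sketch below combines this with a trimmed mean, a simple robust-statistics surrogate for the paper's (unspecified) estimator, to limit the influence of an $\epsilon$-fraction of corrupted points. The function name, the choice of hinge loss, and the trimming rule are illustrative assumptions, not the authors' actual algorithm.

```python
import numpy as np

def w1_dro_objective(theta, X, y, radius, eps, lipschitz=1.0):
    """Hypothetical sketch of a Wasserstein-1 DRO objective for a linear model
    with a Lipschitz loss (here: hinge loss), estimated from eps-contaminated
    data via a trimmed mean. Not the paper's algorithm; an illustration only.

    Uses the known duality for Lipschitz losses: the W1-DRO value equals the
    empirical risk plus radius * Lipschitz * ||theta|| (Euclidean ground metric).
    """
    margins = y * (X @ theta)
    losses = np.maximum(0.0, 1.0 - margins)  # hinge loss, 1-Lipschitz in the margin
    # Trim the largest eps-fraction of losses to limit the influence of outliers.
    k = int(np.ceil(eps * len(losses)))
    trimmed = np.sort(losses)[: len(losses) - k] if k > 0 else losses
    robust_risk = trimmed.mean()
    # Dual-norm regularizer arising from the Wasserstein-1 duality.
    return robust_risk + radius * lipschitz * np.linalg.norm(theta)
```

Minimizing such an objective over $\theta$ (e.g., by subgradient descent) would yield a decision rule that hedges against both distributional shift (via the radius term) and data contamination (via the trimming).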