The predict-then-optimize paradigm bridges online learning and contextual optimization in dynamic environments. Prior work has investigated sequentially updating predictors with feedback from downstream decisions to minimize regret in the full-information setting. However, existing approaches are predominantly frequentist, rely heavily on gradient-based strategies, and employ deterministic predictors that, despite their asymptotic guarantees, can exhibit high variance in practice. This work introduces, to the best of our knowledge, the first Bayesian online contextual optimization framework. Grounded in PAC-Bayes theory and general Bayesian updating principles, our framework achieves $\mathcal{O}(\sqrt{T})$ regret for bounded and mixable losses via a Gibbs posterior, and it eliminates the dependence on gradients through sequential Monte Carlo samplers, thereby accommodating nondifferentiable problems. Theoretical developments and numerical experiments substantiate our claims.
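For intuition, a minimal sketch of the Gibbs-posterior update that such a framework builds on; the notation ($\pi_t$, $\eta$, $\ell_t$) here is generic rather than the paper's own:
\[
\pi_{t+1}(\theta) \;\propto\; \pi_t(\theta)\,\exp\bigl(-\eta\,\ell_t(\theta)\bigr),
\]
where $\pi_t$ is the current posterior over predictor parameters $\theta$, $\ell_t(\theta)$ is the downstream decision loss incurred at round $t$, and $\eta>0$ is a learning rate. This is the general Bayesian update with the decision loss playing the role of a negative log-likelihood.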
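And a hedged sketch, in Python, of how a sequential Monte Carlo sampler can realize this update without gradients. The function `smc_gibbs_update`, its signature, and the multinomial resampling rule are illustrative assumptions, not the paper's algorithm; a full SMC sampler would also interleave a move (rejuvenation) kernel, omitted here for brevity.

```python
import numpy as np

def smc_gibbs_update(particles, log_weights, loss_fn, eta=1.0, ess_frac=0.5):
    """One gradient-free Gibbs-posterior update over a particle cloud.

    particles:   (N, d) array, one predictor parameter vector per row
    log_weights: (N,) unnormalized log-weights of the particles
    loss_fn:     maps a parameter vector to its downstream decision loss
    eta:         Gibbs-posterior learning rate
    ess_frac:    resample when ESS drops below this fraction of N
    """
    n = particles.shape[0]
    # Reweight: w_i <- w_i * exp(-eta * loss(theta_i)). Only pointwise
    # loss evaluations are needed, so loss_fn may be nondifferentiable.
    losses = np.apply_along_axis(loss_fn, 1, particles)
    log_weights = log_weights - eta * losses
    # Normalize in log space for numerical stability.
    log_weights -= np.logaddexp.reduce(log_weights)
    # Multinomial resampling when the effective sample size degenerates.
    weights = np.exp(log_weights)
    ess = 1.0 / np.sum(weights ** 2)
    if ess < ess_frac * n:
        idx = np.random.choice(n, size=n, p=weights)
        particles = particles[idx]
        log_weights = np.full(n, -np.log(n))
    return particles, log_weights

# Illustrative use with a pinball (newsvendor-style) loss, whose kink at
# zero makes it nondifferentiable; context x, label y, quantile tau are
# hypothetical placeholders for a single observed round.
def pinball_loss(theta, x=np.array([1.0, 0.5, -0.3]), y=0.7, tau=0.9):
    residual = y - float(x @ theta)
    return max(tau * residual, (tau - 1.0) * residual)

rng = np.random.default_rng(0)
particles = rng.normal(size=(256, 3))          # prior sample over theta
log_w = np.full(256, -np.log(256))             # uniform initial weights
particles, log_w = smc_gibbs_update(particles, log_w, pinball_loss)
```

Because the update touches `loss_fn` only through pointwise evaluations, the pinball loss's kink poses no difficulty, which is the sense in which such a sampler accommodates nondifferentiable problems.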