面向在线决策的无等待离线调优与重解算法 (Wait-Less Offline Tuning and Re-solving for Online Decision Making)

from arxiv, In this version, we achieve a tighter regret bound with the warm start for the first batch. We also make the proof more elegant by manually accepting all subsequent orders once the constraint is violated. In this way, we do not need to introduce the concept of stopping time for the analysis of the LP-based method

Online linear programming (OLP) has found broad applications in revenue management and resource allocation. State-of-the-art OLP algorithms achieve low regret by repeatedly solving linear programming (LP) subproblems that incorporate updated resource information. However, LP-based methods are computationally expensive and often inefficient for large-scale applications. In contrast, recent first-order OLP algorithms are more computationally efficient but typically suffer from worse regret guarantees. To address these shortcomings, we propose a new algorithm that combines the strengths of LP-based and first-order OLP methods. The algorithm re-solves the LP subproblems periodically at a predefined frequency $f$ and uses the latest dual prices to guide online decision-making. In addition, a first-order method runs in parallel during each interval between LP re-solves, smoothing resource consumption. Our algorithm achieves $\mathscr{O}(\log (T/f) + \sqrt{f})$ regret, delivering a "wait-less" online decision-making process that balances the computational efficiency of first-order methods and the superior regret guarantee of LP-based methods.

翻译：在线线性规划（OLP）在收益管理与资源分配领域具有广泛应用。当前最先进的OLP算法通过反复求解包含更新后资源信息的线性规划（LP）子问题来实现低遗憾度。然而，基于LP的方法计算成本高昂，在大规模应用中往往效率低下。相比之下，近期的一阶OLP算法计算效率更高，但通常遗憾度保证较差。为克服这些缺陷，我们提出一种新算法，融合了基于LP方法与一阶OLP方法的优势。该算法以预设频率$f$周期性地重解LP子问题，并利用最新的对偶价格指导在线决策。此外，在每次LP重解间隔期间，并行运行的一阶方法可平滑资源消耗。我们的算法实现了$\mathscr{O}(\log (T/f) + \sqrt{f})$的遗憾度，提供了一种“无等待”的在线决策流程，平衡了一阶方法的计算效率与基于LP方法的优越遗憾度保证。