Learning effective pricing strategies is crucial in digital marketplaces, especially when buyers' valuations are unknown and must be inferred through interaction. We study the online contextual pricing problem, in which a seller observes a stream of context-valuation pairs and dynamically sets prices. Departing from traditional online learning frameworks, we consider a strategic setting in which buyers may misreport valuations to influence future prices, a challenge known as strategic overfitting (Amin et al., 2013). We introduce a strategy-robust notion of regret for multi-buyer online environments, capturing worst-case strategic behavior in the spirit of the Price of Anarchy. Our first contribution is a polynomial-time approximation scheme (PTAS) for learning linear pricing policies in adversarial, adaptive environments, enabled by a novel online sketching technique. Building on this result, we present our main construction: the Sparse Update Mechanism (SUM), a simple yet effective sequential mechanism that is robust to all Nash equilibria among buyers. Moreover, our construction yields a black-box reduction from online expert algorithms to strategy-robust learners.