We consider the problem of efficiently learning to play single-leader multi-follower Stackelberg games when the leader lacks knowledge of the lower-level game. Such games arise in hierarchical decision-making problems involving self-interested agents. For example, in electric ride-hailing markets, a central authority aims to learn optimal charging prices to shape the fleet distributions and charging patterns of ride-hailing companies. Existing work typically applies gradient-based methods to find the leader's optimal strategy. Such methods are impractical because they require the followers to share private utility information with the leader. Instead, we treat the lower-level game as a black box, assuming only that the followers' interactions approximate a Nash equilibrium and that the leader observes the realized cost at the resulting approximate equilibrium. Under kernel-based regularity assumptions on the leader's cost function, we develop a no-regret algorithm that converges to an $\epsilon$-Stackelberg equilibrium in $O(\sqrt{T})$ rounds. Finally, we validate our approach through a numerical case study on optimal pricing in electric ride-hailing markets.
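To make the black-box setting concrete, the following is a minimal sketch of a kernelized bandit loop in which the leader only observes a noisy realized cost for each posted price. It uses a Gaussian-process lower-confidence-bound (GP-LCB) rule, a standard kernel-based no-regret method, and should not be read as the algorithm developed in this work. The function `observed_leader_cost`, the quadratic cost with minimizer 0.6, the RBF kernel, and all parameter values (`lengthscale`, `reg`, `beta`, the price grid) are assumptions chosen purely for illustration.

```python
import numpy as np


def rbf_kernel(a, b, lengthscale=0.2):
    """Squared-exponential kernel between two 1-D arrays of prices."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)


def gp_posterior(x_obs, y_obs, x_query, reg=1e-4):
    """Zero-mean GP posterior mean and standard deviation at x_query."""
    K = rbf_kernel(x_obs, x_obs) + reg * np.eye(len(x_obs))
    k_star = rbf_kernel(x_obs, x_query)  # shape (n_obs, n_query)
    mean = k_star.T @ np.linalg.solve(K, y_obs)
    var = 1.0 - np.sum(k_star * np.linalg.solve(K, k_star), axis=0)
    return mean, np.sqrt(np.maximum(var, 0.0))  # clip round-off negatives


def observed_leader_cost(price, rng):
    """Hypothetical black box: stands in for the followers reaching an
    approximate Nash equilibrium and the leader observing a noisy cost."""
    return (price - 0.6) ** 2 + 1e-3 * rng.standard_normal()


def run_gp_lcb(n_rounds=30, beta=2.0, seed=0):
    rng = np.random.default_rng(seed)
    grid = np.linspace(0.0, 1.0, 51)  # candidate leader actions (prices)
    prices, costs = [], []
    for _ in range(n_rounds):
        if not prices:
            price = grid[0]  # no data yet: start from an arbitrary action
        else:
            mean, std = gp_posterior(np.array(prices), np.array(costs), grid)
            # Optimism for cost minimization: pick the lowest lower bound.
            price = grid[np.argmin(mean - beta * std)]
        prices.append(float(price))
        costs.append(float(observed_leader_cost(price, rng)))
    best_price = prices[int(np.argmin(costs))]  # best realized action
    return best_price, prices, costs


if __name__ == "__main__":
    best_price, prices, costs = run_gp_lcb()
    print("best observed price:", best_price)
```

The leader never queries the followers' utilities or gradients here. Each round it commits to one action, observes one scalar cost, and updates its confidence bounds, which is the information structure described above.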