We study a two-player dynamic Stackelberg game between a leader and a follower whose intention is unknown to the leader. Classical formulations of the Stackelberg equilibrium (SE) assume that the leader knows the follower's best response (BR) function, but this assumption does not always hold in practice. We consider a setting in which the leader receives updated beliefs about the follower's BR before the end of the game, and the update prompts the leader, and subsequently the follower, to re-optimize their strategies. We characterize the optimality guarantees of the SE solutions under this belief update for both open-loop and feedback information structures. Interestingly, we prove that, in general, assuming an incorrect follower BR can yield a lower leader cost over the entire game than knowing the true follower BR. We support these results with numerical examples in a linear-quadratic (LQ) Stackelberg game, and use Monte Carlo simulations to show that instances in which an incorrect BR achieves a lower leader cost are non-trivial in collision-avoidance LQ Stackelberg games.