We consider a stochastic multi-armed bandit problem with i.i.d. rewards where the expected reward function is multimodal with at most m modes. We propose the first known computationally tractable algorithm for computing the solution to the Graves-Lai optimization problem, which in turn enables the implementation of asymptotically optimal algorithms for this bandit problem. The code for the proposed algorithms is publicly available at https://github.com/wilrev/MultimodalBandits
翻译:我们考虑一个具有独立同分布奖励的随机多臂赌博机问题,其中期望奖励函数为多模态函数,至多包含m个模态。我们提出了首个已知的计算可行算法,用于求解Graves-Lai优化问题,进而实现了针对该赌博机问题的渐近最优算法。所提算法的代码公开于https://github.com/wilrev/MultimodalBandits。