This study investigates the limitations of applying Markov Chain Monte Carlo (MCMC) methods to arbitrary objective functions, focusing on a two-block MCMC framework that alternates between Metropolis-Hastings and Gibbs sampling. While such approaches are often considered advantageous for enabling data-driven regularization, we show that their performance critically depends on the sharpness of the employed likelihood form. By introducing a sharpness parameter and exploring alternative likelihood formulations proportional to the target objective function, we demonstrate how likelihood curvature governs both in-sample performance and the degree of regularization inferred from the training data. Empirical applications are conducted on two reinforcement learning tasks: a navigation problem and the game of tic-tac-toe. The study concludes with a separate analysis examining the implications of extreme likelihood sharpness for an objective function derived from the classic game of blackjack, in which the first block of the two-block MCMC framework is replaced with an iterative optimization step. The resulting hybrid approach achieves performance nearly identical to that of the original MCMC framework, indicating that excessive likelihood sharpness effectively collapses posterior mass onto a single dominant mode.
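To make the two-block structure concrete, the following is a minimal sketch (not the paper's implementation) of a sampler that alternates a random-walk Metropolis-Hastings update of the parameters with a Gibbs update of a regularization precision. The objective `J`, the Gaussian prior with Gamma hyperprior, the step size, and all names (`two_block_mcmc`, `tau`, `lam`) are illustrative assumptions; `tau` plays the role of the sharpness parameter applied to a likelihood proportional to the objective.

```python
# Minimal sketch of a two-block MCMC sampler: Metropolis-Hastings on theta,
# Gibbs on a regularization precision lam. All modeling choices here are
# assumptions made for illustration, not the paper's exact specification.
import numpy as np

rng = np.random.default_rng(0)

def J(theta):
    """Placeholder objective; in the paper this would be, e.g., an
    estimated return from a reinforcement learning task."""
    return -np.sum((theta - 1.0) ** 2)

def two_block_mcmc(n_iter=5000, dim=3, tau=5.0, step=0.2, a0=1.0, b0=1.0):
    theta = np.zeros(dim)   # current parameter draw
    lam = 1.0               # precision of the Gaussian prior (regularizer)
    samples = []
    for _ in range(n_iter):
        # Block 1: random-walk Metropolis-Hastings on theta.
        # Log-"likelihood" proportional to tau * J(theta), plus the
        # Gaussian prior term with the current precision lam.
        def log_post(t):
            return tau * J(t) - 0.5 * lam * np.sum(t ** 2)
        prop = theta + step * rng.standard_normal(dim)
        if np.log(rng.uniform()) < log_post(prop) - log_post(theta):
            theta = prop
        # Block 2: Gibbs update of lam from its conjugate Gamma conditional
        # (assumes a Gamma(a0, b0) hyperprior on the precision).
        a_n = a0 + 0.5 * dim
        b_n = b0 + 0.5 * np.sum(theta ** 2)
        lam = rng.gamma(a_n, 1.0 / b_n)   # numpy parameterizes by scale = 1/rate
        samples.append((theta.copy(), lam))
    return samples

draws = two_block_mcmc()
print("posterior mean of theta:",
      np.mean([t for t, _ in draws[1000:]], axis=0))
```

In this sketch, increasing `tau` sharpens the likelihood around the maximizer of `J`, which is the mechanism the abstract refers to: as the sharpness grows, the Metropolis-Hastings block concentrates on a single mode and the sampler behaves much like the hybrid variant in which that block is replaced by an iterative optimization step.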