Dynamic Correction of Erroneous State Estimates via Diffusion Bayesian Exploration

In emergency response and other high-stakes societal applications, early-stage state estimates critically shape downstream outcomes. Yet these initial estimates, often based on limited or biased information, can be severely misaligned with reality, constraining subsequent actions and potentially causing catastrophic delays, resource misallocation, and human harm. Under the stationary bootstrap baseline (zero transition and no rejuvenation), bootstrap particle filters exhibit Stationarity-Induced Posterior Support Invariance (S-PSI): regions excluded by the initial prior remain permanently unexplorable, so correction is impossible even when new evidence contradicts current beliefs. Classical perturbations can in principle break this lock-in, but they are always on and may be inefficient. To overcome this, we propose a diffusion-driven Bayesian exploration framework (DEPF) that enables principled, real-time correction of early state estimation errors. Our method expands posterior support via entropy-regularized sampling and covariance-scaled diffusion, and a Metropolis-Hastings check validates proposals and keeps inference adaptive to unexpected evidence. Empirical evaluations on realistic hazardous-gas localization tasks show that our approach matches reinforcement learning (RL) and planning baselines when priors are correct and substantially outperforms classical sequential Monte Carlo (SMC) perturbations and RL-based methods under misalignment. We also provide theoretical guarantees that DEPF resolves S-PSI while maintaining statistical rigor.
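
To make the lock-in concrete, the following is a minimal one-dimensional sketch; it is not the authors' implementation. It assumes a static scalar state, a uniform prior on [0, 1] that excludes the true state at 3, and Gaussian observation noise. The function names (`log_lik`, `stationary_bootstrap`, `diffusion_mh`, `demo`) and all parameter values are hypothetical. The diffusion step proposes Gaussian moves whose scale follows the ensemble standard deviation plus an assumed floor, and accepts them with a Metropolis-Hastings ratio on the likelihood alone, that is, with a flat target outside the original prior support. Because the scale is recomputed from the ensemble at every step, this is an adaptive heuristic rather than an exact MH kernel, and the entropy-regularized sampling mentioned above is omitted.

```python
import numpy as np

SIGMA = 0.5  # assumed observation noise std (illustrative value)


def log_lik(x, obs):
    """Gaussian log-likelihood (up to a constant) of obs for each particle in x."""
    resid = obs[None, :] - x[:, None]
    return -0.5 * np.sum(resid ** 2, axis=1) / SIGMA ** 2


def stationary_bootstrap(particles, obs, rng):
    """Bootstrap filter with zero transition and no rejuvenation."""
    for y in obs:
        logw = log_lik(particles, np.array([y]))
        w = np.exp(logw - logw.max())  # subtract max for numerical stability
        w /= w.sum()
        # Resampling only copies existing particles, so no new location
        # can ever appear: the support is frozen at the initial sample.
        particles = rng.choice(particles, size=particles.size, p=w)
    return particles


def diffusion_mh(particles, obs, rng, steps=300, scale=1.0, floor=0.1):
    """Spread-scaled Gaussian diffusion with a Metropolis-Hastings check.

    Assumption: the acceptance target is the likelihood only (flat prior),
    so particles may leave the original prior support.
    """
    for _ in range(steps):
        # In 1-D the "covariance scaling" reduces to the ensemble std;
        # the floor keeps a collapsed ensemble from freezing.
        step = scale * particles.std() + floor
        proposal = particles + step * rng.standard_normal(particles.size)
        log_alpha = log_lik(proposal, obs) - log_lik(particles, obs)
        # 1 - random() lies in (0, 1], which avoids log(0).
        accept = np.log(1.0 - rng.random(particles.size)) < log_alpha
        particles = np.where(accept, proposal, particles)
    return particles


def demo(seed=0, n_particles=500, n_obs=20, true_state=3.0):
    rng = np.random.default_rng(seed)
    prior = rng.uniform(0.0, 1.0, n_particles)           # misaligned prior
    obs = true_state + SIGMA * rng.standard_normal(n_obs)
    stuck = stationary_bootstrap(prior.copy(), obs, rng)  # S-PSI baseline
    moved = diffusion_mh(stuck.copy(), obs, rng)          # diffusion + MH
    return prior, stuck, moved, obs
```

Calling `demo()` returns the prior sample, the ensemble after the stationary bootstrap filter, the ensemble after the diffusion step, and the observations. By construction the stationary ensemble can never exceed the largest prior sample, whatever the data say, whereas the diffused ensemble is free to migrate toward the region the observations support.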
