Dynamic Correction of Erroneous State Estimates via Diffusion Bayesian Exploration

In emergency response and other high-stakes societal applications, early-stage state estimates critically shape downstream outcomes. Yet these initial estimates, often based on limited or biased information, can be severely misaligned with reality, constraining subsequent actions and potentially causing catastrophic delays, resource misallocation, and human harm. Under the stationary bootstrap baseline (no state transition and no rejuvenation), bootstrap particle filters exhibit Stationarity-Induced Posterior Support Invariance (S-PSI): because resampling can only duplicate existing particles, regions excluded by the initial prior remain permanently unexplorable, and corrections are impossible even when new evidence contradicts current beliefs. Classical perturbations can in principle break this lock-in, but they operate in an always-on fashion and can be inefficient. To overcome this, we propose DEPF, a diffusion-driven Bayesian exploration framework that enables principled, real-time correction of early state-estimation errors. DEPF expands posterior support through entropy-regularized sampling and covariance-scaled diffusion, while a Metropolis-Hastings check validates proposals and keeps inference adaptive to unexpected evidence. Empirical evaluations on realistic hazardous-gas localization tasks show that our approach matches reinforcement learning (RL) and planning baselines when priors are correct, and substantially outperforms classical sequential Monte Carlo (SMC) perturbations and RL-based methods under prior misalignment. We also provide theoretical guarantees that DEPF resolves S-PSI while maintaining statistical rigor.
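To make the lock-in and its repair concrete, here is a minimal 1-D Python sketch under stated assumptions: a Gaussian prior centred at -5 while the true source sits at +5, Gaussian observations, and hypothetical names (`run`, `mh_move`, `min_sd`, `scale`) that do not come from the paper. The stationary filter only reweights and resamples, so its particles always remain a subset of the initial draws (S-PSI). The second variant adds a resample-move step: a random-walk Metropolis-Hastings move whose step size is scaled by the particle spread, loosely mirroring the covariance-scaled diffusion and MH check described above. It is not the authors' DEPF; in particular, the entropy-regularized sampling is omitted, and the 2.38 scale is only the classic one-dimensional random-walk Metropolis heuristic.

```python
import math
import random
import statistics

# Hypothetical 1-D toy setup (not from the paper): the prior is centred far
# from the true source, mimicking a misaligned early estimate.
PRIOR_MU, PRIOR_SD = -5.0, 1.0
OBS_SD = 1.0
TRUE_X = 5.0


def log_lik(x, y):
    """Gaussian observation log-likelihood log p(y | x), up to a constant."""
    return -0.5 * ((y - x) / OBS_SD) ** 2


def log_post(x, n, sum_y):
    """Unnormalized log posterior after n observations summing to sum_y."""
    log_prior = -0.5 * ((x - PRIOR_MU) / PRIOR_SD) ** 2
    log_lik_all = -(n * x * x - 2.0 * x * sum_y) / (2.0 * OBS_SD ** 2)
    return log_prior + log_lik_all


def reweight_resample(particles, y, rng):
    """Bootstrap step with an identity (zero) transition: weight, then resample.

    Resampling only copies existing values, so the support cannot grow (S-PSI).
    """
    logw = [log_lik(x, y) for x in particles]
    top = max(logw)
    weights = [math.exp(lw - top) for lw in logw]  # shift for numerical stability
    return rng.choices(particles, weights=weights, k=len(particles))


def mh_move(particles, n, sum_y, rng, scale=2.38, min_sd=0.5, n_moves=5):
    """Random-walk Metropolis-Hastings moves targeting the current posterior.

    The step size is scaled by the particle spread (a 1-D stand-in for
    covariance scaling); min_sd is a hypothetical floor that keeps the walk
    moving after resampling collapses the cloud.
    """
    step = max(scale * statistics.pstdev(particles), min_sd)
    moved = []
    for x in particles:
        lp = log_post(x, n, sum_y)
        for _ in range(n_moves):
            prop = x + rng.gauss(0.0, step)
            lp_prop = log_post(prop, n, sum_y)
            # Symmetric proposal, so accept with probability min(1, pi(prop) / pi(x)).
            if rng.random() < math.exp(min(0.0, lp_prop - lp)):
                x, lp = prop, lp_prop
        moved.append(x)
    return moved


def run(use_mh, seed=0, n_particles=500, n_steps=20):
    """Filter n_steps noisy observations of TRUE_X; optionally add MH moves."""
    rng = random.Random(seed)
    init = [rng.gauss(PRIOR_MU, PRIOR_SD) for _ in range(n_particles)]
    particles = list(init)
    n, sum_y = 0, 0.0
    for _ in range(n_steps):
        y = rng.gauss(TRUE_X, OBS_SD)
        n, sum_y = n + 1, sum_y + y
        particles = reweight_resample(particles, y, rng)
        if use_mh:
            particles = mh_move(particles, n, sum_y, rng)
    # Exact conjugate Gaussian posterior mean, for reference.
    precision = 1.0 / PRIOR_SD ** 2 + n / OBS_SD ** 2
    post_mean = (PRIOR_MU / PRIOR_SD ** 2 + sum_y / OBS_SD ** 2) / precision
    return init, particles, post_mean


if __name__ == "__main__":
    init, static, post_mean = run(use_mh=False)
    print("static support within initial particles:", set(static) <= set(init))  # True
    print("static mean:", statistics.mean(static), "exact posterior mean:", post_mean)
    _, moved, post_mean = run(use_mh=True)
    print("MH mean:", statistics.mean(moved), "exact posterior mean:", post_mean)
```

Because the MH kernel leaves the current posterior invariant, the diffusion can carry particles into regions the prior excluded without biasing the distribution the filter targets. The floor on the step size matters here: degenerate resampling can collapse the cloud to a few values, which would otherwise shrink a purely spread-scaled step to zero.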
