Self-supervised stereo matching holds great promise because it removes the reliance on expensive ground-truth data. Its dominant paradigm, based on photometric consistency, is however fundamentally hindered by occlusion, an issue that persists regardless of network architecture. The essential insight is that, for any occluder, valid feedback signals can only be derived from the unoccluded area on one side of the occluder. Existing methods address this by focusing on the erroneous feedback from the other side, either identifying and removing it or introducing additional regularization to correct it. These approaches have not provided a complete solution, and this work proposes a more fundamental one. The core idea is to transform the fixed state of one-sided valid and one-sided erroneous signals into a probabilistic acquisition of valid feedback from both sides of an occluder. This is achieved through a complete framework centered on a pseudo-stereo input strategy that decouples the input from the feedback without introducing any additional constraints. Qualitative results show that the occlusion problem is resolved, manifested by fully symmetrical and identical performance on both flanks of occluding objects. Quantitative experiments validate the significant performance improvements that result from solving the occlusion challenge.
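The one-sided nature of photometric feedback can be illustrated with a toy example. The following is a minimal NumPy sketch, not the framework proposed in this work: it builds a hypothetical one-dimensional stereo pair (a textured background at disparity 0 and a foreground occluder at disparity 3) and measures the photometric error of the left view reconstructed from the right view using the true disparity. The function names `make_scene` and `photometric_error`, the scene layout, and all parameter values are assumptions made for illustration only.

```python
import numpy as np

def make_scene(width=32, fg_start=12, fg_end=20, fg_disp=3, seed=0):
    """Build a toy 1-D rectified stereo pair with one foreground occluder.

    Hypothetical setup: the background has disparity 0, the foreground has
    disparity fg_disp, so it appears shifted left by fg_disp in the right view.
    """
    rng = np.random.default_rng(seed)
    background = rng.uniform(0.0, 0.5, size=width)               # dark texture
    foreground = rng.uniform(0.6, 1.0, size=fg_end - fg_start)   # bright texture

    left = background.copy()
    left[fg_start:fg_end] = foreground

    right = background.copy()
    right[fg_start - fg_disp:fg_end - fg_disp] = foreground

    # Ground-truth disparity of the left view.
    disp = np.zeros(width, dtype=int)
    disp[fg_start:fg_end] = fg_disp
    return left, right, disp

def photometric_error(left, right, disp):
    """Absolute error between the left view and its reconstruction from the right view."""
    x = np.arange(left.size)
    src = np.clip(x - disp, 0, left.size - 1)  # left pixel x matches right pixel x - d
    return np.abs(left - right[src])

left, right, disp = make_scene()
err = photometric_error(left, right, disp)
# Indices where the loss is non-zero even though the disparity is exactly correct.
print(np.flatnonzero(err > 0))
```

In this sketch the error is non-zero only in the strip of background immediately to the left of the occluder in the left view, because those pixels are hidden behind the foreground in the right view. The other flank receives a valid signal, which is the asymmetry described above.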