Stereo matching has recently witnessed remarkable progress using Deep Neural Networks (DNNs). But how robust are they? Although it is well known that DNNs often suffer from adversarial vulnerability, with a catastrophic drop in performance, the situation is even worse in stereo matching. This paper first shows that a type of weak white-box attack can make state-of-the-art methods fail. The attack is learned by a proposed stereo-constrained projected gradient descent (PGD) method for stereo matching. This observation raises serious concerns for the deployment of DNN-based stereo matching. In parallel to the adversarial vulnerability, DNN-based stereo matching is typically trained under the so-called simulation-to-reality pipeline, so domain generalizability is an important problem. This paper proposes to rethink the learnable DNN-based feature backbone towards adversarially robust and domain-generalizable stereo matching, either by removing it entirely or by applying it only to the left reference image. The matching cost volume is computed using the classic multi-scale census transform (i.e., local binary patterns) of the raw input stereo images, followed by a stacked Hourglass head sub-network that solves the matching problem. In experiments, the proposed method is tested on the SceneFlow dataset and the KITTI2015 benchmark. It significantly improves adversarial robustness while retaining accuracy comparable to state-of-the-art methods. It also shows better generalizability from simulation (SceneFlow) to real (KITTI) data when no fine-tuning is used.
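To make the census-transform-based cost volume concrete, the following is a minimal sketch, not the authors' released code: it assumes grayscale stereo tensors of shape (B, 1, H, W), and the function names `census_transform` and `census_cost_volume`, the window size, and the disparity range are illustrative placeholders. Each pixel is encoded by a local binary pattern (comparisons against its neighbours), and the matching cost at a candidate disparity is the Hamming distance between the left and shifted right descriptors.

```python
# Minimal sketch (assumed interface, not the paper's implementation) of a
# census-transform (local binary pattern) matching cost volume in PyTorch.
import torch
import torch.nn.functional as F

def census_transform(img, window=5):
    """Binary descriptor: compare each pixel to its neighbours in a window."""
    pad = window // 2
    # Extract local patches around every pixel: (B, window*window, H, W).
    patches = F.unfold(F.pad(img, [pad] * 4, mode='replicate'), window)
    H, W = img.shape[-2:]
    patches = patches.view(img.shape[0], window * window, H, W)
    # One bit per neighbour: is the neighbour darker than the centre pixel?
    return (patches < img).float()

def census_cost_volume(left, right, max_disp=64, window=5):
    """Hamming-distance cost volume of shape (B, max_disp, H, W)."""
    cl = census_transform(left, window)
    cr = census_transform(right, window)
    B, _, H, W = cl.shape
    volume = cl.new_zeros(B, max_disp, H, W)
    for d in range(max_disp):
        if d == 0:
            volume[:, d] = (cl != cr).float().sum(1)
        else:
            # Shift the right descriptors by d pixels and count differing bits.
            volume[:, d, :, d:] = (cl[..., d:] != cr[..., :-d]).float().sum(1)
    return volume
```

A multi-scale variant, as described in the abstract, could be obtained by computing such volumes for several census window sizes (or image scales) and concatenating them; in the proposed pipeline, the resulting volume is then passed to a stacked Hourglass head sub-network that solves the matching problem.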