Document rectification in real-world scenarios poses significant challenges due to extreme variations in camera perspectives and physical distortions. Building on the insight that a complex transformation can be decomposed and resolved step by step, we introduce a novel multi-stage framework that reverses distinct distortion types in a coarse-to-fine manner. Specifically, the framework first applies a global affine transformation to correct perspective distortions caused by the camera viewpoint, then rectifies geometric deformations caused by paper curling and folding, and finally employs a content-aware iterative process to eliminate fine-grained content distortions. To address limitations of existing evaluation protocols, we also propose two enhanced metrics: layout-aligned OCR metrics (AED/ACER), which provide a stable assessment by decoupling geometric rectification quality from the layout-analysis errors of OCR engines, and masked AD/AAD (AD-M/AAD-M), which are tailored to accurately evaluate geometric distortions in documents with incomplete boundaries. Extensive experiments show that our method achieves new state-of-the-art performance on multiple challenging benchmarks, reducing the AAD metric by 14.1\%--34.7\% and demonstrating superior efficacy in real-world applications. The code will be made publicly available at https://github.com/chaoyunwang/ArbDR.
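The coarse-to-fine chain described above can be pictured as a composition of backward sampling maps. The sketch below is illustrative only and is not the ArbDR implementation. It assumes that every stage outputs a dense backward map at the input resolution, with stage 1 given as a 2x3 affine matrix and stages 2 and 3 given as per-pixel displacement fields. The iterative content-aware stage is modeled simply as a list of residual flows. All function names (`identity_map`, `affine_map`, `flow_map`, `compose`, `rectify`) are hypothetical. Composing the maps first lets the original image be resampled once instead of being interpolated again at every stage.

```python
import numpy as np


def bilinear_sample(img, x, y):
    """Bilinearly sample img (H, W) or (H, W, C) at float coords (x, y).

    Coordinates outside the image are clamped to the nearest border pixel.
    """
    h, w = img.shape[:2]
    x = np.clip(x, 0, w - 1)
    y = np.clip(y, 0, h - 1)
    x0 = np.floor(x).astype(int)
    y0 = np.floor(y).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    y1 = np.minimum(y0 + 1, h - 1)
    wx, wy = x - x0, y - y0
    if img.ndim == 3:  # broadcast weights over the channel axis
        wx, wy = wx[..., None], wy[..., None]
    top = img[y0, x0] * (1 - wx) + img[y0, x1] * wx
    bottom = img[y1, x0] * (1 - wx) + img[y1, x1] * wx
    return top * (1 - wy) + bottom * wy


def identity_map(h, w):
    """Backward map (H, W, 2) where each output pixel samples itself as (x, y)."""
    ys, xs = np.mgrid[0:h, 0:w].astype(np.float64)
    return np.stack([xs, ys], axis=-1)


def affine_map(A, h, w):
    """Stage 1 (hypothetical): 2x3 matrix A maps output (x, y, 1) to input (x, y)."""
    homog = np.concatenate([identity_map(h, w), np.ones((h, w, 1))], axis=-1)
    return homog @ np.asarray(A, dtype=np.float64).T


def flow_map(flow):
    """Stages 2-3 (hypothetical): a displacement field (H, W, 2) added to the grid."""
    h, w = flow.shape[:2]
    return identity_map(h, w) + flow


def compose(total, stage):
    """Chain a new backward map after the accumulated one: total(stage(p))."""
    return bilinear_sample(total, stage[..., 0], stage[..., 1])


def rectify(image, A, flows):
    """Apply affine stage, then each residual flow, then resample the image once."""
    h, w = image.shape[:2]
    total = affine_map(A, h, w)
    for flow in flows:  # e.g. one deformation flow plus iterative refinements
        total = compose(total, flow_map(flow))
    return bilinear_sample(image, total[..., 0], total[..., 1]), total
```

Because `bilinear_sample` clamps coordinates to the border, pixels whose composed map falls outside the source image repeat the nearest edge pixel; a practical pipeline might mask such pixels instead.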