The advent of generative models has dramatically improved the accuracy of image inpainting. In particular, reconstructing the original image after removing specific text from a document image is extremely important for industrial applications. However, most existing text-removal methods focus on deleting simple scene text that appears in images captured by a camera in outdoor environments, and little research addresses complex, practical images with dense text. We therefore created a benchmark dataset for text removal from images containing a large amount of text. Using this data, we found that text-removal performance is vulnerable to perturbations of the mask profile, so precise tuning of the mask shape is essential for practical text-removal tasks. This study develops a method that models highly flexible mask profiles and learns their parameters using Bayesian optimization. The learned profiles turned out to be character-wise masks, and we also found that the minimum cover of a text region is not optimal. We expect this research to pave the way for user-friendly guidelines for manual masking.