Learning Zero-Shot Material States Segmentation, by Implanting Natural Image Patterns in Synthetic Data

Visual understanding and segmentation of materials and their states is fundamental to understanding the physical world. The myriad textures, shapes, and often blurry boundaries formed by materials make this task particularly hard to generalize. Whether it's identifying wet regions of a surface, minerals in rocks, infected regions in plants, or pollution in water, each material state has its own unique form. For neural nets to learn general class-agnostic material segmentation, it is necessary to first collect and annotate data that captures this complexity. Collecting and manually annotating real-world images is limited by the cost and precision of manual labor. In contrast, synthetic CGI data is highly accurate and almost cost-free, but fails to replicate the vast diversity of the material world. This work offers a method to bridge this crucial gap by implanting patterns extracted from real-world images in synthetic data. Specifically, patterns automatically collected from natural images are used to map materials into synthetic scenes. This unsupervised approach allows the generated data to capture the vast complexity of the real world while maintaining the precision and scale of synthetic data. We also present the first general benchmark for zero-shot material state segmentation. The benchmark contains a wide range of real-world images of materials, such as food, rocks, construction, plants, liquids, and many others, each in various states (wet/dry/stained/cooked/burned/worn/rusted/sediment/foam, etc.). The annotation includes both partial similarity between regions with similar but not identical materials, and hard segmentation of only points in the exact same material state. We show that nets trained on MatSeg significantly outperform existing state-of-the-art methods on this task. The dataset, code, and trained model are available.
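
As a rough illustration of the idea of implanting natural patterns, the sketch below turns one channel of an image into a soft map and uses that map to blend two material textures per pixel. This is a simplified 2D stand-in under stated assumptions, not the pipeline used in this work: the function names `extract_pattern` and `implant`, the choice of a single color channel, and the min-max normalization are all hypothetical, and random arrays stand in for a real photograph and real material textures.

```python
import numpy as np


def extract_pattern(image: np.ndarray, channel: int = 0) -> np.ndarray:
    """Turn one channel of an (H, W, 3) image into a soft map in [0, 1].

    Hypothetical helper: the channel choice and min-max scaling are
    assumptions made for illustration only.
    """
    chan = image[..., channel].astype(np.float64)
    lo, hi = chan.min(), chan.max()
    if hi == lo:
        # A constant channel carries no pattern; return an all-zero map.
        return np.zeros_like(chan)
    return (chan - lo) / (hi - lo)


def implant(pattern: np.ndarray,
            material_a: np.ndarray,
            material_b: np.ndarray) -> np.ndarray:
    """Blend two (H, W, 3) material textures using the pattern as a per-pixel weight."""
    weight = pattern[..., None]  # add a channel axis so it broadcasts over RGB
    return weight * material_a + (1.0 - weight) * material_b


# Stand-in data: a random array plays the role of a natural image.
rng = np.random.default_rng(0)
natural_image = rng.integers(0, 256, size=(4, 4, 3), dtype=np.uint8)

pattern = extract_pattern(natural_image)
material_a = np.full((4, 4, 3), 0.9)  # placeholder for a bright material
material_b = np.full((4, 4, 3), 0.1)  # placeholder for a dark material
blended = implant(pattern, material_a, material_b)
```

In this sketch the same `pattern` array that drives the blend also serves as a soft ground-truth map of how much of `material_a` is present at each pixel, which is why such generated data needs no manual annotation.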
