Satellite-based slum segmentation holds significant promise for generating global estimates of urban poverty. However, the morphological heterogeneity of informal settlements is a major challenge: models trained on specific regions often fail to generalize to unseen locations. To address this, we introduce a large-scale, high-resolution dataset and propose GRAM (Generalized Region-Aware Mixture-of-Experts), a two-phase test-time adaptation framework that enables robust slum segmentation without requiring labeled data from target regions. For source training, we compile a million-scale satellite imagery dataset from 12 cities across four continents. Trained on this dataset, GRAM employs a Mixture-of-Experts architecture in which region-specific experts capture local slum characteristics while a shared backbone learns universal features. During adaptation, prediction consistency across experts is used to filter out unreliable pseudo-labels, allowing the model to generalize to previously unseen regions. GRAM outperforms state-of-the-art baselines in low-resource settings such as African cities, offering a scalable and label-efficient solution for global slum mapping and data-driven urban planning.
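The consistency-based pseudo-label filtering described above can be illustrated with a minimal sketch. It is not the GRAM implementation: the function name, the `IGNORE` marker, the 0.5 probability threshold, and the `min_agreement` parameter are all assumptions made for illustration, and the abstract does not specify how consistency is actually measured. Here, a pixel keeps its pseudo-label only if a sufficient fraction of experts vote for the same class.

```python
import numpy as np

IGNORE = -1  # hypothetical marker for pixels excluded from the adaptation loss


def consistency_filtered_pseudo_labels(expert_probs, min_agreement=1.0, threshold=0.5):
    """Build pseudo-labels from per-expert slum probabilities.

    expert_probs: array of shape (num_experts, H, W) with values in [0, 1].
    min_agreement: assumed fraction of experts that must share the majority vote.
    Returns an (H, W) integer array with 1 (slum), 0 (non-slum), or IGNORE.
    """
    expert_probs = np.asarray(expert_probs, dtype=float)
    votes = expert_probs >= threshold            # each expert's binary prediction
    slum_fraction = votes.mean(axis=0)           # share of experts voting "slum"
    majority = slum_fraction >= 0.5              # majority-vote label per pixel
    # Agreement is the share of experts that side with the majority label.
    agreement = np.maximum(slum_fraction, 1.0 - slum_fraction)
    return np.where(agreement >= min_agreement, majority.astype(int), IGNORE)


# Toy example: three experts on a 2x2 tile of an unlabeled target region.
probs = np.array([
    [[0.9, 0.2], [0.6, 0.1]],
    [[0.8, 0.3], [0.4, 0.2]],
    [[0.7, 0.1], [0.7, 0.6]],
])
pseudo = consistency_filtered_pseudo_labels(probs)
print(pseudo)
```

In this toy case the top row, where all three experts agree, keeps its labels, while the bottom row, where the experts disagree, is marked `IGNORE` and would not contribute to the adaptation loss. Lowering `min_agreement` trades pseudo-label purity for coverage.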