Aerial object detection faces significant challenges in real-world scenarios, such as small objects and extensive background interference. Under these conditions, RGB-based detectors lack sufficient discriminative information, which limits their performance. Multispectral images (MSIs) capture additional spectral cues across multiple bands, offering a promising alternative. However, the lack of training data has been the primary bottleneck to exploiting the potential of MSIs. To address this gap, we introduce the first large-scale dataset for Multispectral Object Detection in Aerial images (MODA), which comprises 14,041 MSIs and 330,191 annotations across diverse, challenging scenarios, providing a comprehensive data foundation for this field. Furthermore, to overcome challenges inherent to aerial object detection using MSIs, we propose OSSDet, a framework that integrates spectral and spatial information with object-aware cues. OSSDet employs a cascaded spectral-spatial modulation structure to optimize target perception, aggregates spectrally related features by exploiting spectral similarities to reinforce intra-object correlations, and suppresses irrelevant background via object-aware masking. In addition, cross-spectral attention refines object-related representations under explicit object-aware guidance. Extensive experiments demonstrate that OSSDet outperforms existing methods at comparable parameter counts and efficiency.
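As a rough illustration of the two ideas named above (aggregating features by spectral similarity and suppressing background with an object-aware mask), the following is a minimal NumPy sketch. It is not the OSSDet implementation: the function name `spectral_similarity_aggregate`, the use of cosine similarity with a softmax, the `temperature` parameter, and the assumption that an object mask is already available are all hypothetical choices made for illustration.

```python
import numpy as np


def spectral_similarity_aggregate(feats, mask, temperature=0.1):
    """Illustrative sketch only; not the method from the paper.

    feats: array of shape (C, H, W), one spectral feature vector per pixel.
    mask:  array of shape (H, W) with values in [0, 1]; 1 marks object pixels.
           (Assumed to be given; how such a mask is predicted is not covered.)
    """
    c, h, w = feats.shape
    x = feats.reshape(c, h * w).T                     # (N, C), N = H * W pixels

    # Cosine similarity between the spectral vectors of every pixel pair.
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    unit = x / np.maximum(norms, 1e-8)
    sim = unit @ unit.T                               # (N, N)

    # Row-wise softmax turns similarities into aggregation weights.
    logits = sim / temperature
    logits = logits - logits.max(axis=1, keepdims=True)
    weights = np.exp(logits)
    weights = weights / weights.sum(axis=1, keepdims=True)

    # Each pixel becomes a weighted average of spectrally similar pixels.
    aggregated = (weights @ x).T.reshape(c, h, w)

    # Object-aware masking: zero out responses at background locations.
    return aggregated * mask[None, :, :]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats = rng.normal(size=(8, 4, 4))                # 8 hypothetical bands
    mask = np.zeros((4, 4))
    mask[1:3, 1:3] = 1.0                              # toy 2x2 "object"
    out = spectral_similarity_aggregate(feats, mask)
    print(out.shape)
```

In this sketch, pixels with similar spectral signatures reinforce one another, which loosely mirrors the stated goal of strengthening intra-object correlations, while the mask multiplication removes background responses. The dense N x N similarity matrix is only practical for small feature maps; a real detector would need a cheaper formulation.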