Decoding with Structured Awareness: Integrating Directional, Frequency-Spatial, and Structural Attention for Medical Image Segmentation

To address the limitations of Transformer decoders in capturing edge details, recognizing local textures, and modeling spatial continuity, this paper proposes a novel decoder framework designed specifically for medical image segmentation. The framework comprises three core modules. First, the Adaptive Cross-Fusion Attention (ACFA) module combines channel feature enhancement with spatial attention and introduces learnable guidance in three directions (planar, horizontal, and vertical) to improve responsiveness to key regions and structural orientations. Second, the Triple Feature Fusion Attention (TFFA) module fuses features from the spatial, Fourier, and wavelet domains into a joint frequency-spatial representation that strengthens global dependency and structural modeling while preserving local information such as edges and textures, which makes it particularly effective in scenes with complex or blurred boundaries. Finally, the Structural-aware Multi-scale Masking Module (SMMM) optimizes the skip connections between encoder and decoder through multi-scale context and structural saliency filtering, reducing feature redundancy and improving the quality of semantic interaction. Working together, these modules address the shortcomings of traditional decoders and improve both segmentation accuracy and model generalization in high-precision tasks such as tumor segmentation and organ boundary extraction. Experimental results demonstrate that the framework provides an efficient and practical solution for medical image segmentation.
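The abstract does not specify how TFFA computes or combines its three branches, so the following is only a minimal NumPy sketch of the general idea of fusing spatial, Fourier, and wavelet views of a single 2D feature map. Everything in it is an assumption made for illustration: the function names (`spatial_branch`, `fourier_branch`, `wavelet_branch`, `tffa_sketch`), the 3x3 mean filter, the low-pass mask, the single-level Haar transform, and the softmax-weighted sum are hypothetical stand-ins, not the authors' implementation.

```python
import numpy as np


def haar_dwt2(x):
    """Single-level 2D Haar transform; x must have even height and width."""
    a, b = x[0::2, 0::2], x[0::2, 1::2]
    c, d = x[1::2, 0::2], x[1::2, 1::2]
    ll = (a + b + c + d) / 2.0  # coarse approximation
    lh = (a + b - c - d) / 2.0  # difference between the two rows of each 2x2 block
    hl = (a - b + c - d) / 2.0  # difference between the two columns of each block
    hh = (a - b - c + d) / 2.0  # diagonal difference
    return ll, lh, hl, hh


def haar_idwt2(ll, lh, hl, hh):
    """Exact inverse of haar_dwt2."""
    h, w = ll.shape
    x = np.empty((2 * h, 2 * w), dtype=float)
    x[0::2, 0::2] = (ll + lh + hl + hh) / 2.0
    x[0::2, 1::2] = (ll + lh - hl - hh) / 2.0
    x[1::2, 0::2] = (ll - lh + hl - hh) / 2.0
    x[1::2, 1::2] = (ll - lh - hl + hh) / 2.0
    return x


def spatial_branch(x):
    """Assumed spatial branch: 3x3 mean filter with edge padding (local context)."""
    h, w = x.shape
    p = np.pad(x, 1, mode="edge")
    return sum(p[i:i + h, j:j + w] for i in range(3) for j in range(3)) / 9.0


def fourier_branch(x, keep=0.25):
    """Assumed Fourier branch: keep only low frequencies (global structure)."""
    h, w = x.shape
    spec = np.fft.fftshift(np.fft.fft2(x))  # zero frequency moved to the center
    yy, xx = np.ogrid[:h, :w]
    mask = (np.abs(yy - h // 2) <= keep * h / 2) & (np.abs(xx - w // 2) <= keep * w / 2)
    return np.fft.ifft2(np.fft.ifftshift(spec * mask)).real


def wavelet_branch(x, gain=1.5):
    """Assumed wavelet branch: amplify Haar detail bands (edges and textures)."""
    ll, lh, hl, hh = haar_dwt2(x)
    return haar_idwt2(ll, gain * lh, gain * hl, gain * hh)


def tffa_sketch(x, logits=(0.0, 0.0, 0.0), gain=1.5, keep=0.25):
    """Fuse the three branches with softmax weights.

    `logits` stands in for learnable fusion parameters (an assumption).
    """
    branches = np.stack([spatial_branch(x), fourier_branch(x, keep), wavelet_branch(x, gain)])
    weights = np.exp(np.asarray(logits, dtype=float))
    weights = weights / weights.sum()
    return np.tensordot(weights, branches, axes=1)  # weighted sum over the branch axis


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feature = rng.normal(size=(8, 8))
    fused = tffa_sketch(feature)
    print(fused.shape)
```

In this sketch the Fourier branch carries smooth, image-wide structure while the wavelet branch emphasizes localized detail, which mirrors the division of labor the abstract describes. A trained module would presumably learn its filters and fusion weights rather than fix them as done here.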
