Multi-modal learning has made significant advances across diverse pattern recognition applications. However, handling missing modalities, especially under imbalanced missing rates, remains a major challenge. This imbalance triggers a vicious cycle: modalities with higher missing rates receive fewer updates, leading to inconsistent learning progress and representational degradation that further diminishes their contribution. Existing methods typically focus on global dataset-level balancing, often overlooking critical sample-level variations in modality utility and the underlying issue of degraded feature quality. We propose Modality Capability Enhancement (MCE) to tackle these limitations. MCE includes two synergistic components: i) Learning Capability Enhancement (LCE), which introduces multi-level factors to dynamically balance modality-specific learning progress, and ii) Representation Capability Enhancement (RCE), which improves feature semantics and robustness through subset prediction and cross-modal completion tasks. Comprehensive evaluations on four multi-modal benchmarks show that MCE consistently outperforms state-of-the-art methods under various missing configurations. The final published version is now available at https://doi.org/10.1016/j.patcog.2025.112591. Our code is available at https://github.com/byzhaoAI/MCE.
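The abstract describes LCE's dynamic balancing only at a high level. Purely as a hedged illustration of the general idea, and not the paper's actual method, the sketch below re-weights per-modality losses by their magnitude relative to the mean, so that modalities lagging behind (e.g., those starved of updates by high missing rates) contribute more to the shared gradient. The function name, the ratio-based rule, and the example losses are all assumptions made for illustration; the paper's multi-level factors are defined in the full text.

```python
import torch

def lce_style_weights(modality_losses, eps=1e-8):
    """Up-weight modalities whose loss is still high relative to the
    average, so slower-learning modalities receive larger updates.
    This ratio rule is a hypothetical stand-in for LCE's factors."""
    losses = torch.stack([l.detach() for l in modality_losses])
    return losses / (losses.mean() + eps)

# Hypothetical per-modality losses from one training step.
loss_a = torch.tensor(0.9)  # modality with a high missing rate, lagging
loss_b = torch.tensor(0.3)  # well-observed modality, learning faster
w = lce_style_weights([loss_a, loss_b])
total_loss = w[0] * loss_a + w[1] * loss_b  # backpropagate this instead
```

Detaching the losses before computing the weights keeps the balancing factors out of the gradient path, so they rescale each modality's updates without being optimized themselves.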