Aligning molecular sequence representations (e.g., SMILES notations) with textual descriptions is critical for applications spanning drug discovery, materials design, and automated chemical literature analysis. Existing methodologies typically treat molecular captioning (molecule-to-text) and text-based molecular design (text-to-molecule) as separate tasks, relying on supervised fine-tuning or contrastive learning pipelines. These approaches face three key limitations: (i) conventional metrics such as BLEU prioritize linguistic fluency over chemical accuracy, (ii) training datasets frequently contain chemically ambiguous descriptions with incomplete specifications, and (iii) optimizing the two generation directions independently leads to bidirectional inconsistency. To address these issues, we propose RTMol, a bidirectional alignment framework that unifies molecular captioning and text-to-SMILES generation through self-supervised round-trip learning. The framework introduces novel round-trip evaluation metrics and enables unsupervised training for molecular captioning without requiring paired molecule-text corpora. Experiments demonstrate that RTMol improves bidirectional alignment performance by up to 47% across various large language models (LLMs), establishing an effective paradigm for joint molecule-text understanding and generation.
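The round-trip idea above can be made concrete as a simple consistency metric: caption a molecule, regenerate a SMILES string from that caption, and check whether the original molecule is recovered. The sketch below is illustrative only; `caption_model`, `design_model`, and `canonicalize` are hypothetical placeholders, not RTMol's actual API, and a real pipeline would canonicalize SMILES with a cheminformatics toolkit such as RDKit.

```python
def canonicalize(smiles: str) -> str:
    # Placeholder canonicalization: a real implementation would parse the
    # SMILES and re-emit a canonical form (e.g., RDKit's MolToSmiles).
    return smiles.strip()

def round_trip_accuracy(smiles_list, caption_model, design_model) -> float:
    """Fraction of molecules recovered after SMILES -> text -> SMILES."""
    hits = 0
    for s in smiles_list:
        caption = caption_model(s)           # molecule-to-text direction
        regenerated = design_model(caption)  # text-to-molecule direction
        hits += canonicalize(regenerated) == canonicalize(s)
    return hits / len(smiles_list)

# Toy demo with stub models that trivially invert each other:
caption = lambda s: f"molecule with SMILES {s}"
design = lambda t: t.split()[-1]
print(round_trip_accuracy(["CCO", "c1ccccc1"], caption, design))  # → 1.0
```

Because the metric only asks whether the composition of the two directions is the identity on canonical SMILES, it needs no paired molecule-text labels, which is what enables the unsupervised training described in the abstract.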