The rapid growth of open-access (OA) publications has intensified the challenge of identifying relevant scientific papers. Due to privacy constraints and limited access to user interaction data, recent efforts have shifted toward content-based recommendation, which relies solely on textual information. However, existing models typically treat papers as unstructured text, neglecting their discourse organization and thereby limiting semantic completeness and interpretability. To address these limitations, we propose OMRC-MR, a hierarchical framework that integrates QA-style OMRC (Objective, Method, Result, Conclusion) summarization, multi-level contrastive learning, and structure-aware re-ranking for scholarly recommendation. The QA-style summarization module converts raw papers into structured and discourse-consistent representations, while multi-level contrastive objectives align semantic representations across metadata, section, and document levels. The final re-ranking stage further refines retrieval precision through contextual similarity calibration. Experiments on DBLP, S2ORC, and the newly constructed Sci-OMRC dataset demonstrate that OMRC-MR consistently surpasses state-of-the-art baselines, achieving up to 7.2% and 3.8% improvements in Precision@10 and Recall@10, respectively. Additional evaluations confirm that QA-style summarization produces more coherent and factually complete representations. Overall, OMRC-MR provides a unified and interpretable content-based paradigm for scientific paper recommendation, advancing trustworthy and privacy-aware scholarly information retrieval.
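For readers unfamiliar with the reported metrics, the following is a minimal sketch of how Precision@10 and Recall@10 are commonly computed for a single query paper. It uses the standard textbook definitions and is not taken from the OMRC-MR evaluation code. The function names, the paper identifiers, and the toy data are hypothetical, and the sketch assumes the ranked list contains no duplicates. The abstract does not state whether the 7.2% and 3.8% gains are absolute or relative, so the example makes no assumption about that.

```python
from typing import Hashable, Sequence, Set


def precision_at_k(ranked: Sequence[Hashable], relevant: Set[Hashable], k: int = 10) -> float:
    """Fraction of the top-k recommended papers that are relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for paper_id in ranked[:k] if paper_id in relevant)
    # Convention assumed here: divide by k even if fewer than k items were returned.
    return hits / k


def recall_at_k(ranked: Sequence[Hashable], relevant: Set[Hashable], k: int = 10) -> float:
    """Fraction of all relevant papers that appear in the top-k recommendations."""
    if k <= 0:
        raise ValueError("k must be positive")
    if not relevant:
        return 0.0  # No relevant papers: recall is defined as 0 in this sketch.
    hits = sum(1 for paper_id in ranked[:k] if paper_id in relevant)
    return hits / len(relevant)


# Hypothetical toy data: a ranked recommendation list and the ground-truth relevant set.
ranked = ["p3", "p7", "p1", "p9", "p4", "p8", "p2", "p6", "p5", "p0"]
relevant = {"p3", "p1", "p2", "p11"}

print(precision_at_k(ranked, relevant, k=10))  # 3 hits out of 10 recommendations
print(recall_at_k(ranked, relevant, k=10))     # 3 hits out of 4 relevant papers
```

In practice these per-query values are averaged over all query papers in the test set. The averaging scheme used in the paper is not specified in the abstract.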