Machine learning (ML) offers a powerful path toward discovering sustainable polymer materials, but progress has been limited by the lack of large, high-quality, and openly accessible polymer datasets. The Open Polymer Challenge (OPC) addresses this gap by releasing the first community-developed benchmark for polymer informatics, featuring a dataset with 10K polymers and 5 properties: thermal conductivity, radius of gyration, density, fractional free volume, and glass transition temperature. The challenge centers on multi-task polymer property prediction, a core step in virtual screening pipelines for materials discovery. Participants developed models under realistic constraints that include small data, label imbalance, and heterogeneous simulation sources, using techniques such as feature-based augmentation, transfer learning, self-supervised pretraining, and targeted ensemble strategies. The competition also revealed important lessons about data preparation, distribution shifts, and cross-group simulation consistency, informing best practices for future large-scale polymer datasets. The resulting models, analysis, and released data create a new foundation for molecular AI in polymer science and are expected to accelerate the development of sustainable and energy-efficient materials. Along with the competition, we release the test dataset at https://www.kaggle.com/datasets/alexliu99/neurips-open-polymer-prediction-2025-test-data. We also release the data generation pipeline at https://github.com/sobinalosious/ADEPT, which simulates more than 25 properties, including thermal conductivity, radius of gyration, and density.
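To illustrate what multi-task property prediction under label imbalance can look like in practice, the following is a minimal sketch of a per-property error metric that skips missing labels. It assumes that each polymer may carry labels for only some of the five properties. The property keys, the function names `masked_mae` and `macro_average`, and the choice of mean absolute error are illustrative assumptions, not the challenge's official evaluation metric.

```python
import math

# Hypothetical keys for the five properties named in the abstract.
PROPERTIES = [
    "thermal_conductivity",
    "radius_of_gyration",
    "density",
    "fractional_free_volume",
    "glass_transition_temperature",
]


def masked_mae(y_true, y_pred):
    """Mean absolute error per property, skipping missing labels (None or NaN).

    y_true, y_pred: lists of dicts keyed by property name, one dict per polymer.
    Returns a dict mapping each property to its MAE, or None if it has no labels.
    """
    scores = {}
    for prop in PROPERTIES:
        errors = []
        for truth, pred in zip(y_true, y_pred):
            label = truth.get(prop)
            if label is None or math.isnan(label):
                continue  # this polymer has no label for this property
            errors.append(abs(label - pred[prop]))
        scores[prop] = sum(errors) / len(errors) if errors else None
    return scores


def macro_average(scores):
    """Unweighted mean over the properties that have at least one label."""
    valid = [s for s in scores.values() if s is not None]
    return sum(valid) / len(valid) if valid else None
```

Because the five properties have different units and ranges, an unweighted average like this would be dominated by the property with the largest numeric scale. A realistic evaluation would likely normalize or weight each property's error before averaging.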