Open Polymer Challenge: Post-Competition Report

Source: arXiv. This is the report for the competition "NeurIPS - Open Polymer Prediction 2025". Kaggle page: https://www.kaggle.com/competitions/neurips-open-polymer-prediction-2025. Website: https://open-polymer-challenge.github.io

Machine learning (ML) offers a powerful path toward discovering sustainable polymer materials, but progress has been limited by the lack of large, high-quality, and openly accessible polymer datasets. The Open Polymer Challenge (OPC) addresses this gap by releasing the first community-developed benchmark for polymer informatics, featuring a dataset with 10K polymers and 5 properties: thermal conductivity, radius of gyration, density, fractional free volume, and glass transition temperature. The challenge centers on multi-task polymer property prediction, a core step in virtual screening pipelines for materials discovery. Participants developed models under realistic constraints that include small data, label imbalance, and heterogeneous simulation sources, using techniques such as feature-based augmentation, transfer learning, self-supervised pretraining, and targeted ensemble strategies. The competition also revealed important lessons about data preparation, distribution shifts, and cross-group simulation consistency, informing best practices for future large-scale polymer datasets. The resulting models, analysis, and released data create a new foundation for molecular AI in polymer science and are expected to accelerate the development of sustainable and energy-efficient materials. Along with the competition, we release the test dataset at https://www.kaggle.com/datasets/alexliu99/neurips-open-polymer-prediction-2025-test-data. We also release the data generation pipeline at https://github.com/sobinalosious/ADEPT, which simulates more than 25 properties, including thermal conductivity, radius of gyration, and density.
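As a rough illustration of what multi-task prediction under label imbalance involves, the sketch below scores predictions for the five properties when each polymer has labels for only some of them. It is a minimal sketch under stated assumptions, not the competition's official metric: the function names, the use of `None` for a missing label, and the unweighted average across tasks are all hypothetical choices made for this example.

```python
from typing import Dict, List, Optional, Sequence

# Property names follow the abstract; the identifiers themselves are illustrative.
PROPERTIES: List[str] = [
    "thermal_conductivity",
    "radius_of_gyration",
    "density",
    "fractional_free_volume",
    "glass_transition_temperature",
]


def masked_mae_per_task(
    y_true: Sequence[Sequence[Optional[float]]],
    y_pred: Sequence[Sequence[float]],
) -> Dict[str, Optional[float]]:
    """Mean absolute error per property, skipping missing labels (None)."""
    totals = [0.0] * len(PROPERTIES)
    counts = [0] * len(PROPERTIES)
    for true_row, pred_row in zip(y_true, y_pred):
        for t, label in enumerate(true_row):
            if label is None:
                continue  # this polymer has no label for property t
            totals[t] += abs(label - pred_row[t])
            counts[t] += 1
    # A property with no labels at all gets None rather than a misleading 0.0.
    return {
        name: (totals[t] / counts[t] if counts[t] else None)
        for t, name in enumerate(PROPERTIES)
    }


def mean_over_tasks(per_task: Dict[str, Optional[float]]) -> float:
    """Unweighted average over the properties that have at least one label."""
    values = [v for v in per_task.values() if v is not None]
    if not values:
        raise ValueError("no labelled property to average over")
    return sum(values) / len(values)


# Toy data: two polymers, each with labels for only some properties.
toy_true = [
    [0.2, None, 1.0, None, 100.0],
    [None, 15.0, 1.2, None, None],
]
toy_pred = [
    [0.3, 14.0, 1.1, 0.35, 90.0],
    [0.25, 16.0, 1.0, 0.36, 120.0],
]
toy_scores = masked_mae_per_task(toy_true, toy_pred)
toy_mean = mean_over_tasks(toy_scores)
```

Because the properties have very different scales and label counts, a plain average like this is dominated by the property with the largest errors (glass transition temperature in the toy data). A practical metric would likely rescale or reweight each task, and that choice is left open here.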
